Another interesting topic that I usually talk about on my presentations is statistics on computed columns so I will use this post to show you how they work and how they can help you to improve the performance of your queries.
A problem faced by some queries using scalar expressions is that they usually cannot benefit from statistics and, without them, the Query Optimizer will use the 30% selectivity guess on inequality comparisons. A solution to this problem can be the use of computed columns, as SQL Server can automatically create and update statistics on these columns which can help the Query Optimizer to create better execution plans. An additional benefit of this solution is that you don’t need to specify the name of the computed column in your queries for SQL Server to use its statistics. The Query Optimizer automatically matches the computed column definition to an existing scalar expression in a query, so your applications do not need to be changed. Although computed columns have been available in previous versions of SQL Server, the automatic matching feature was only introduced with SQL Server 2005.
To see an example, run this query, which creates the plan shown next:
SELECT * FROM Sales.SalesOrderDetail WHERE OrderQty * UnitPrice > 25000
The estimated number of rows is 36,395.1, which is 30% of the total number of rows, 121,317, although the query returns only 5 records. SQL Server is obviously using a selectivity guess, as it cannot estimate the selectivity of the expression OrderQty * UnitPrice > 25000.
Now create a computed column:
ALTER TABLE Sales.SalesOrderDetail ADD cc AS OrderQty * UnitPrice
Run the previous SELECT statement again and note that, this time, the estimated number of rows has changed to 84.3101 which is very close to the actual number of rows returned by the query, as shown in the following plan:
You can optionally test replacing the 25,000 in the query with some other values, like 1,000, 10,000, or 20,000 and verify that the estimated again will be close to the actual number of rows returned.
Note that creating the computed column does not create statistics; these statistics are created the first time that the query is optimized, and you can run the next query to display the information about the statistics objects for the Sales.SalesOrderDetail table:
SELECT * FROM sys.stats WHERE object_id = object_id('Sales.SalesOrderDetail')
The newly created statistics object will most likely be at the end of the list. Copy its name and use the following command to display the details about the statistics object (I’ve used the name of my local object, but you should replace that as appropriate). You can also use "cc" as the name of the object to get the same results. In both cases, the "cc" column should be shown on the Columns field in the density section.
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', _WA_Sys_0000000C_2645B050)
Unfortunately, for the automatic matching feature to work, the expression must be exactly the same as the computed column definition. So, if I change the query to UnitPrice * OrderQty, instead of OrderQty * UnitPrice, the execution plan will show an estimated number of rows of 30% again, as this query will demonstrate:
SELECT * FROM Sales.SalesOrderDetail WHERE UnitPrice * OrderQty > 25000
As mentioned, the computed column provides statistics so the Query Optimizer can try to get you a better execution plan. In addition, you can create an index on the existing computed column to provide a better navigational alternative. Create the following index
CREATE INDEX IX_cc on Sales.SalesOrderDetail(cc)
By running the original SELECT statement again the Query Optimizer will now choose the newly created index and will produce a more efficient plan using an Index Seek/Key Lookup instead of a Clustered Index Scan, as shown next.
DROP INDEX Sales.SalesOrderDetail.IX_cc GO ALTER TABLE Sales.SalesOrderDetail DROP COLUMN cc