 Cost Estimation

An Introduction to Cost Estimation

Last year when I presented my session regarding the Query Optimizer at the PASS Summit, I was asked how the estimated CPU and I/O costs in an execution plan are calculated, that is, where a specific value like 1.13256 is coming from. All I was able to say at the moment was that Microsoft does not publish how these costs are calculated.

Since this time I am working on a related project, I thought that perhaps I could look into this question again and show one example. But since there are dozens of operators, I decided to take a look at a simple one: the Clustered Index Scan operator. So I captured dozens of XML plans, used XQuery to extract their cost information and after some analysis I was able to obtain a basic formula for this specific operator.

But first a quick introduction to cost estimation: the cost of each operator depends on its algorithm, each operator is associated with a CPU cost, and some of them will also have an I/O cost. The total cost of the operator is the sum of these two costs. An operator like a Clustered Index Scan has both CPU and I/O costs. Some other operators, like a Stream Aggregate, will have only CPU cost. It is interesting to note that this cost used to mean the estimated time in seconds that a query or operator would take to execute on a particular reference machine. In recent versions of SQL Server this cost should no longer be interpreted as seconds, milliseconds, or any other unit.

To show the example, let us look at the largest table in AdventureWorks, Sales.SalesOrderDetail. Run the following query and look at the estimated CPU and I/O costs for the Clustered Index Scan operator as shown on the next figure.

SELECT * FROM Sales.SalesOrderDetail

WHERE LineTotal = 35 For a Clustered Index Scan operator, I observed that the CPU cost is 0.0001581 for the first record, plus 0.0000011 for any additional record after that. In this specific case we have an estimated number of records of 121,317, as shown on the picture above, so we can use 0.0001581 + 0.0000011 * (121317 – 1) or 0.133606 which is the value shown as Estimated CPU Cost. In a similar way, I noticed that the minimum I/O cost is 0.003125 for the first database page and then it grows in increments of 0.00074074 for every additional page. Since the Clustered Index Scan operator scans the entire table, I can use the following query to find the number of database pages, which returns 1,234.

SELECT in_row_data_page_count, row_count

FROM sys.dm_db_partition_stats

WHERE object_id = object_id(‘Sales.SalesOrderDetail’)

AND index_id = 1

In this case we have 0.003125 + 0.00074074 * (1234 – 1) or 0.916458 which is the value shown as estimated I/O Cost.

Finally, we add both costs, 0.133606 + 0.916458 to get 1.05006 which is the total estimated cost of the operator. In the same way, adding the cost of all the operators will give the total cost of the plan. In this case, the cost of the Clustered Index Scan, 1.05006, plus the cost of the first Compute Scalar operator, 0.01214, the second Compute Scalar operator, 0.01213, and the cost of the Filter operator, 0.0582322, will give the total cost of the plan, 1.13256, as shown next. 