

The Parameter Sniffing Problem

SQL Server can use the histogram of a statistics object to estimate the cardinality of a predicate and use this information to produce an optimal execution plan. The Query Optimizer accomplishes this by first inspecting the values of the query parameters. This behavior is called parameter sniffing, and it is a good thing: getting an execution plan tailored to the current parameters of a query improves the performance of your applications. We also know that the plan cache can store these execution plans so they can be reused the next time the same query needs to be executed. This saves optimization time and CPU resources, as the query does not need to be compiled again.
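To see the kind of histogram the Query Optimizer relies on for these estimates, you can inspect the statistics of the index used throughout this post. The following is just a sketch; it assumes the nonclustered index on ProductID carries the standard AdventureWorks name IX_SalesOrderDetail_ProductID.

-- Display the histogram for the statistics on ProductID
-- (assumes the standard AdventureWorks index name IX_SalesOrderDetail_ProductID)
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
WITH HISTOGRAM

The RANGE_HI_KEY and EQ_ROWS columns show how many rows are expected for a given ProductID value, which is the estimate behind the cardinalities you will see in the plans below.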

However, although the Query Optimizer and the plan cache work fine together most of the time, some performance problems can occasionally appear. Since the Query Optimizer can produce more than one execution plan for the same query, depending on the value of its parameters, caching and reusing only one of these plans may be a performance problem for an instance of this query that could benefit from a better plan. This is a known problem with queries using explicit parameterization, as is the case, for example, with stored procedures. Let me show you an example of the problem and a few recommendations on how to fix it. Create the following stored procedure on the AdventureWorks database

CREATE PROCEDURE test (@pid int)

AS

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = @pid

Run the following statement to execute the stored procedure and display its execution plan

EXEC test @pid = 897

The Query Optimizer estimates that only a few records will be returned by this query and produces the following execution plan, which uses an index seek operator to quickly find the records using an existing nonclustered index, and a key lookup operator to search the base table for the remaining columns requested by the query.

[Execution plan with an Index Seek and a Key Lookup]
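If you prefer to confirm the estimate in text form rather than in the graphical plan, one option, shown here only as a sketch, is SET STATISTICS PROFILE, which executes the procedure and returns the plan rows together with their EstimateRows and Rows columns so you can compare the estimated and actual cardinalities.

SET STATISTICS PROFILE ON
GO
EXEC test @pid = 897
GO
SET STATISTICS PROFILE OFF
GO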

This index seek-key lookup combination is a good choice in this case because the query is highly selective. Let us run the same stored procedure again with a different ProductID value. For the following test, include a SET STATISTICS IO ON statement to display the amount of disk activity generated by the query

SET STATISTICS IO ON

GO

EXEC test @pid = 870

GO

On this second execution SQL Server is performing more than 14,000 reads, even though the SalesOrderDetail table has only around 1,240 pages; that is more than 10 times the I/O of simply scanning the entire table. So why is the second execution performing so poorly? When the stored procedure was executed the second time, SQL Server reused the execution plan that had been generated for the first execution. For this particular case, the plan was optimal for the parameter used in the first query but not for the one in the second query. Performing both an index seek and a key lookup, which together are called a bookmark lookup, is an expensive operation that has performance benefits only for a limited number of records.
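To verify that the second execution reused the cached plan instead of compiling a new one, you can look at the plan cache. Here is a minimal sketch using the execution-related dynamic management views available since SQL Server 2005, assuming the procedure was created in the dbo schema.

-- usecounts increases every time the cached plan for the procedure is reused
SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.objectid = OBJECT_ID('dbo.test')

After the two executions above, usecounts for the compiled plan of the procedure should show that the same plan was used both times.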

Now clear the plan cache to remove the current execution plan from memory and run the stored procedure again using the same parameter as before (note that there are better ways to remove a single plan from the plan cache, as shown later, so use this statement only in a test environment).

DBCC FREEPROCCACHE

GO

EXEC test @pid = 870

GO

Since this time there was no plan available for this stored procedure in the plan cache, SQL Server optimized it using the ProductID 870 and created a new execution plan, optimal for this value. The new plan uses a clustered index scan, as shown next.

[Execution plan with a Clustered Index Scan]

If you look at the I/O information you will notice that this time it performs only around 1,240 reads, which matches the total number of pages in the table. This is far less than the more than 14,000 reads of the previous execution with the same parameter.
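Going back to the earlier note that there are better ways to remove a single plan from the plan cache, here is a sketch of one alternative: find the plan_handle of the cached plan for the procedure and pass it to DBCC FREEPROCCACHE, which in SQL Server 2008 and later accepts a plan_handle and removes only that plan instead of clearing the entire cache.

-- Find the plan_handle for the cached plan of the test procedure
SELECT cp.plan_handle, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.objectid = OBJECT_ID('dbo.test')
GO
-- Then remove only that plan, substituting the plan_handle value returned above
-- DBCC FREEPROCCACHE (0x05000600...)  -- SQL Server 2008 and later
GO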

Of course this does not mean that there is something wrong with using stored procedures. This is a problem that you need to be aware of and research, especially if you have queries where performance changes dramatically when different parameters are used. If you happen to have this problem, there are a few choices available.

 

Optimize for a Typical Parameter

There might be cases when most of the executions of a query can benefit from the same execution plan and you want to minimize the cost of optimizations. For these cases you can use the OPTIMIZE FOR hint, which was introduced with SQL Server 2005. Use this hint when an optimal plan can be generated for the majority of the possible values. Only the few executions using an atypical parameter will not have an optimal plan.

Suppose that almost all of the executions of our stored procedure would benefit from the plan that uses both an index seek and a key lookup operator. You could write the stored procedure this way

ALTER PROCEDURE test (@pid int)

AS

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = @pid

OPTION (OPTIMIZE FOR (@pid = 897))

When you run the stored procedure for the first time, it will be optimized for the value 897, no matter what parameter value was specified for the execution. The plan will be stored in the plan cache and used for every execution of the stored procedure after that. Test the following case

EXEC test @pid = 870

You can find the following entry near the end of the XML plan, which shows that the value 897 was in fact used during the optimization

<ParameterList>

<ColumnReference Column="@pid" ParameterCompiledValue="(897)" ParameterRuntimeValue="(870)" />

</ParameterList>
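If you want to retrieve this XML plan directly from the plan cache instead of from Management Studio, a sketch like the following, again assuming the procedure is dbo.test, returns the cached plan that contains this ParameterList element.

SELECT qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.objectid = OBJECT_ID('dbo.test')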

The benefit of using this hint is that you have full control over which plan is stored in the plan cache and reused later. With the original problem you did not have control over which of the two execution plans would be stored in the plan cache.

 

Optimize on Every Execution

If you cannot find a typical parameter or plan for your queries, you can optimize for every execution. You will get an optimal plan on every execution but will pay the optimization cost each time. To do this, use the RECOMPILE hint as shown here.

ALTER PROCEDURE test (@pid int)

AS

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = @pid

OPTION (RECOMPILE)

Run the stored procedure using

EXEC test @pid = 897

The XML plan for this execution will show

<ParameterList>

<ColumnReference Column="@pid" ParameterCompiledValue="(897)" ParameterRuntimeValue="(897)" />

</ParameterList>

This time the query will be optimized on each execution and will get an optimal plan depending on the current value of its parameters.

Another solution that has traditionally been implemented, especially in older versions of SQL Server, is to simply avoid the parameter sniffing behavior by using local variables. Next, I will explain the pros and cons of using local variables along with the OPTIMIZE FOR UNKNOWN hint, which is new in SQL Server 2008. But since I need to go to bed now so I can wake up very early in the morning to see the first game of the 2010 World Cup, I will leave these topics for my next post. By the way, who is your favorite team?

Halloween Protection

Halloween protection refers to the solution to a problem found in update operations more than thirty years ago at the IBM Almaden Research Center in San Jose, California. Researchers were testing an optimizer when they ran a query to update the salary column of an Employee table. The query was supposed to give a 10 percent raise to every employee with a salary of less than $25,000, but to their surprise no employee had a salary under $25,000 after the update query was completed. They noticed that the optimizer had selected the salary index and had updated some records more than once, until they reached the $25,000 salary. Since the salary index was used to scan the records, when the salary column was updated, some records moved within the index and were scanned again later, updating those records more than once. The problem was named the Halloween problem because it was discovered on Halloween, around 1976 or 1977.

Update operations have a read section followed by an update section. To avoid the Halloween problem the read and update sections must be separated; the read section must be completed in its entirety before running the write section. I will show you how SQL Server avoids this problem in the next example. Run the following statement to create a new table on the AdventureWorks database

SELECT * INTO dbo.Product

FROM Production.Product

Run the following UPDATE statement, which produces the execution plan shown next

UPDATE dbo.Product

SET ListPrice = ListPrice * 1.2

[Execution plan without Halloween protection]

In this case no Halloween protection is needed; the statement updates the ListPrice column, which is not part of any index, so updating the data does not move any rows. Now, to show the problem, I will create a clustered index on the ListPrice column

CREATE CLUSTERED INDEX cix ON dbo.Product(ListPrice)

Run the same UPDATE statement again. The query will show a similar plan, but this time including a table spool, which is a blocking operator, to separate the read section from the write section. A blocking operator is needed because it has to read all of the rows before producing any output row for the next operator. In this example the table spool separates the clustered index scan from the clustered index update, as shown in the next execution plan. The spool operator scans the original data and saves a copy of it in a hidden spool table in tempdb before it is updated. Usually a table spool operator is used to avoid the Halloween problem because it is a cheap operator; however, if the plan already includes another blocking operator that can be used, for example a sort, the table spool is not needed and the sort can perform the same job instead.

[Execution plan with a Table Spool providing Halloween protection]
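If you want to confirm the presence of the table spool without actually updating the data again, one option, sketched here, is SHOWPLAN_TEXT, which returns the estimated plan as text and does not execute the statement; look for the Table Spool operator in the output.

SET SHOWPLAN_TEXT ON
GO
UPDATE dbo.Product
SET ListPrice = ListPrice * 1.2
GO
SET SHOWPLAN_TEXT OFF
GO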

Finally, drop the table you have just created

DROP TABLE dbo.Product