Benjamin Nevarez

Parameter Sniffing and Plan-reuse-affecting SET Options

One interesting problem I am sometimes asked to troubleshoot is a stored procedure that times out or takes too long to execute from a web application but returns immediately when executed directly in Management Studio, even with the same parameters. Although there can be a few reasons for this behavior, including blocking, the most frequent one is that the plan used by the web application was optimized for a combination of parameters that produces a “bad” plan for other executions of the same stored procedure with different parameters. Although you may be tempted to just run sp_recompile to force a new optimization and allow the application to continue working, this does not really fix the problem, and it may eventually come back. You may also have seen similar scenarios where you updated statistics, rebuilt an index, or changed something else, only to find that the problem suddenly seems to be fixed. It is not. Those changes probably just forced a new optimization with the “good” parameter you were testing. Obviously, the best thing to do for this kind of problem is to capture the “bad” plan for further analysis in order to provide a permanent solution. In this post I will show you how to do that.

But first, a little bit of background. Remember that, in general, query optimization is an expensive operation, so, to avoid this optimization cost, the plan cache tries to keep the generated execution plans in memory so they can be reused. If a stored procedure is executed thousands of times, only one optimization is needed. However, if a new connection running the same stored procedure has different SET options, it may generate a new plan instead of reusing the one already in the plan cache. This new plan can then be reused by later executions of the same stored procedure with the same connection settings. A new plan is needed because these SET options can impact the choice of an execution plan: they affect the results of evaluating constant expressions during the optimization process (a process known as constant folding and explained here). Another connection setting, FORCEPLAN, acts in a similar way to a hint, requesting the Query Optimizer both to preserve the join order as specified in the query syntax and to use nested loop joins only. As indicated in the Microsoft white paper Plan Caching in SQL Server 2008, the following SET options will affect the reuse of execution plans.

ANSI_NULL_DFLT_OFF
ANSI_NULL_DFLT_ON
ANSI_NULLS
ANSI_PADDING
ANSI_WARNINGS
ARITHABORT
CONCAT_NULL_YIELDS_NULL
DATEFIRST
DATEFORMAT
FORCEPLAN
LANGUAGE
NO_BROWSETABLE
NUMERIC_ROUNDABORT
QUOTED_IDENTIFIER
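
To see why these options can change an execution plan, consider CONCAT_NULL_YIELDS_NULL: the same constant expression folds to different results depending on the setting, so a plan compiled under one setting cannot safely be reused under the other. A quick sketch you can try yourself:

```sql
-- The same constant expression produces different results under different
-- settings, which is why plans compiled under different SET options
-- cannot be shared between connections
SET CONCAT_NULL_YIELDS_NULL ON
SELECT 'a' + NULL    -- returns NULL
GO
SET CONCAT_NULL_YIELDS_NULL OFF
SELECT 'a' + NULL    -- returns 'a'
GO
```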

Unfortunately, different management or development tools, like Management Studio, ADO.NET, or even sqlcmd, may have different SET options in their default configuration. Most often the problem is that one of these options, ARITHABORT, is OFF in ADO.NET and ON in Management Studio. So it is possible that, in our example, Management Studio and the web application are using distinct cached plans, and that the web application initially got a good plan for the parameters used during the optimization, but this plan is not good for some other executions of the same stored procedure with different parameters.
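
Before extracting anything from the plan cache, it can help to confirm the settings difference directly. The following sketch, run from both Management Studio and (adapted) from the application connection, shows the options in effect for the current session:

```sql
-- List all SET options in effect for the current session
DBCC USEROPTIONS
GO
-- Or check the usual suspect directly; 1 means ON, 0 means OFF
SELECT SESSIONPROPERTY('ARITHABORT') AS arithabort_enabled
```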

But now let us see how to prove that parameter sniffing is in fact the problem for your specific instance of the issue, and how to extract the plans to inspect both the parameters and the SET options used during optimization. Since AdventureWorks does not have the default SET options of a new database, let us create our own and copy some data from AdventureWorks

CREATE DATABASE Test
GO

Create a new table and a stored procedure to test

USE Test
GO
SELECT * INTO dbo.SalesOrderDetail
FROM AdventureWorks.Sales.SalesOrderDetail
GO
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID 
ON dbo.SalesOrderDetail(ProductID)
GO
CREATE PROCEDURE test (@pid int)
AS
SELECT * FROM dbo.SalesOrderDetail
WHERE ProductID = @pid

Let us test two different applications, executing the stored procedure from a .NET application (C# code included at the end) and from Management Studio. For the purpose of this test we will assume that a plan with a table scan is a bad plan and a plan using an index seek/RID lookup is the optimal one.

Start with a clean plan cache by running

DBCC FREEPROCCACHE

Run the .NET application from a command prompt window and provide the value 870 as a parameter (note that this application is only running the test stored procedure)

C:\TestApp\test
Enter ProductID: 870

At this moment we can start inspecting the plan cache to see the plans available in memory. Run the following script from the Test database (we will be running this script again later during this exercise)

SELECT plan_handle, usecounts, pvt.set_options
FROM (
    SELECT plan_handle, usecounts, epa.attribute, epa.value 
    FROM sys.dm_exec_cached_plans 
        OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
    WHERE cacheobjtype = 'Compiled Plan') AS ecpa 
PIVOT (MAX(ecpa.value) FOR ecpa.attribute IN ("set_options", "objectid")) AS pvt
WHERE pvt.objectid = object_id('dbo.test')

You should get an output similar to this

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    1            251

The output shows that we have one execution plan in the plan cache, that it has been used once (as indicated by the usecounts value), and that its set_options value, taken from the sys.dm_exec_plan_attributes DMF, is 251. Since this was the first execution of the stored procedure, it was optimized using the parameter 870, which in this case produced a plan using a table scan (our “bad” plan for this exercise). Now run the application again using a parameter that returns only a few records and will benefit from an index seek/RID lookup plan:

C:\TestApp\test
Enter ProductID: 898

If you inspect the plan cache again you will notice that the plan has been used twice and, unfortunately, this time it was not a good plan for the second parameter used

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    2            251

At this moment the developer may try to troubleshoot this problem by running the stored procedure in Management Studio using something like this

EXEC test @pid = 898

Now the developer is surprised to find that SQL Server is returning a good execution plan and the query is returning immediately. Inspecting the plan cache again will show something similar to this

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    2            251
0x05000700210F0207B860650B000000000000000000000000    1            4347

You can see that a new plan was added for the Management Studio execution, with a different value for set_options.

What to do next? It is time to inspect the plans and look at the SET options and parameters used during the optimization. Select the plan_handle of the first plan created (the one with set_options 251 in your own example) and use it to run the following query

SELECT * FROM sys.dm_exec_query_plan
(0x05000700210F0207B8C09007000000000000000000000000)

You can find the SET options at the beginning of the plan

<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" 
CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" 
ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />

And the parameters used during optimization at the end

<ParameterList>
    <ColumnReference Column="@pid" ParameterCompiledValue="(870)" />
</ParameterList>

Do the same for the second plan and you will get the following information for the SET options

<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="true" 
CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" 
ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />

and the following parameter information

<ParameterList>
    <ColumnReference Column="@pid" ParameterCompiledValue="(898)" />
</ParameterList>

This information shows that the ARITHABORT SET option has a different value on these plans and that the parameter used to optimize the query for the web application was 870. (The same information is available from the Properties window of a graphical plan.) You can also verify the operators used in each plan: the first one uses a table scan and the second one an index seek/RID lookup combination.

Now that you have captured the plans you can force a new optimization so the application can use a better plan immediately (keeping in mind that this is not a permanent solution). Try this

sp_recompile test
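
Keep in mind that sp_recompile marks every plan for the procedure for recompilation. Since you already captured the plan_handle of the “bad” plan, on SQL Server 2008 and later you can instead evict just that one plan, which avoids recompiling everything else. A sketch (the handle below is the example value from this exercise; yours will differ):

```sql
-- Remove a single plan from the plan cache by its plan_handle
DBCC FREEPROCCACHE (0x05000700210F0207B8C09007000000000000000000000000)
```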

So now you know that you have a problem related to parameter sniffing. What to do next? I have a few recommendations on previous posts here and here. I have another one here but usually you should not be doing this.

Finally, you can use the following script to display SET options for a specific set_options value

declare @set_options int = 251
if ((1 & @set_options) = 1) print 'ANSI_PADDING'
if ((4 & @set_options) = 4) print 'FORCEPLAN'
if ((8 & @set_options) = 8) print 'CONCAT_NULL_YIELDS_NULL'
if ((16 & @set_options) = 16) print 'ANSI_WARNINGS'
if ((32 & @set_options) = 32) print 'ANSI_NULLS'
if ((64 & @set_options) = 64) print 'QUOTED_IDENTIFIER'
if ((128 & @set_options) = 128) print 'ANSI_NULL_DFLT_ON'
if ((256 & @set_options) = 256) print 'ANSI_NULL_DFLT_OFF'
if ((512 & @set_options) = 512) print 'NoBrowseTable'
if ((4096 & @set_options) = 4096) print 'ARITH_ABORT'
if ((8192 & @set_options) = 8192) print 'NUMERIC_ROUNDABORT'
if ((16384 & @set_options) = 16384) print 'DATEFIRST'
if ((32768 & @set_options) = 32768) print 'DATEFORMAT'
if ((65536 & @set_options) = 65536) print 'LanguageID'

C# Code

using System;
using System.Data;
using System.Data.SqlClient;

class Test
{
    static void Main()
    {
        SqlConnection cnn = null;
        SqlDataReader reader = null;
        
        try
        {
            Console.Write("Enter ProductID: ");
            string pid = Console.ReadLine();

            cnn = new SqlConnection("Data Source=(local);Initial Catalog=Test;" +
                "Integrated Security=SSPI");
            SqlCommand cmd = new SqlCommand();
            cmd.Connection = cnn;
            cmd.CommandText = "dbo.test";
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@pid", SqlDbType.Int).Value = int.Parse(pid);
            cnn.Open();
            reader = cmd.ExecuteReader();
            while (reader.Read())
            {
                Console.WriteLine(reader[0]);
            }
            return;
        }
        catch (Exception)
        {
            // rethrow without resetting the stack trace
            throw;
        }
        finally
        {
            if (cnn != null)
            {
                if (cnn.State != ConnectionState.Closed)
                    cnn.Close();
            }
        }
    }
}

Speaking at the PASS Summit and other Southern California events

I am currently working on the two sessions that I will be presenting at the PASS Summit: Inside the SQL Server Query Optimizer and Parameter Sniffing: the Query Optimizer vs. the Plan Cache. In addition, I will be presenting these two new sessions at other SQL Server events in Southern California, including SQLSaturday #95.

First, I will be speaking at the Los Angeles SQL Server Professionals Group on Thursday September 15th. The meeting will be hosted at the UCLA Anderson School of Management and will start at 6:30 PM. I will present only one session, Inside the SQL Server Query Optimizer, in this meeting. You can find additional information about the meeting and directions on their website.

Two days later, on September 17th, I will be speaking at SQLSaturday #95 in San Diego, CA. Of course, this SQLSaturday will also have many other great speakers and the final schedule is already posted here. In addition to presenting both of my sessions described before I will be participating in the Ask the Experts – SQL Server Q&A session coordinated by Thomas Mueller. For more details and directions for SQLSaturday #95 please go to their website here.

On October 7th I will be presenting my Query Optimizer session at the Orange County SQL Server Professionals User Group in Mission Viejo, CA. Details and directions will be posted soon on their website here.

Then it is time for the PASS Summit, the largest SQL Server and BI conference in the world. The PASS Summit is hosted again this year in Seattle, WA and it is scheduled for October 11-14. The schedule for my two sessions is not final at the moment of writing this but so far it looks like I will be speaking on Wednesday and Friday.

I am flying back from the PASS Summit on Saturday and planning to present my two sessions at the SoCal Code Camp the following day, Sunday October 16th. The SoCal Code Camp is a community driven event for developers to come and learn from their peers. At this moment they are still accepting sessions so no schedule has been created yet. You can register, find additional information and directions on their website here.

Finally, although I am not going to be speaking, I will be attending SQL in the City in Los Angeles, CA on October 28th. SQL in the City is a one day SQL Server training event which will include several SQL Server MVPs and you can look at their site here for more details and information.

I look forward to meeting lots of SQL Server professionals at these events.

Disabling Parameter Sniffing?

As I mentioned in a previous post, parameter sniffing is a good thing: it allows you to get an execution plan tailored to the current parameters of your query. Of course, sometimes it can also be a problem but there are some solutions available. Some of these solutions are covered in my posts here and here.

However, Microsoft recently released a cumulative update which provides a trace flag to disable parameter sniffing at the instance level. This cumulative update is available for the latest versions of SQL Server as described on the knowledge base article 980653.

Basically, this trace flag, 4136, disables the use of histograms, a behavior similar to the use of the OPTIMIZE FOR UNKNOWN hint. As described in the same knowledge base article, there are three cases where this trace flag has no effect: queries using the OPTIMIZE FOR query hint, queries using the RECOMPILE query hint, and stored procedures created using the WITH RECOMPILE option.
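
If you want to experiment with the trace flag before committing it to the startup parameters, I believe you can also turn it on and off globally from a query window; only the -T4136 startup parameter (or a restart with it, as used later in this post) makes it permanent. A sketch:

```sql
-- Assumption: like most trace flags, 4136 can also be enabled globally
-- with DBCC TRACEON; enabled this way it will not survive a restart
DBCC TRACEON (4136, -1)
GO
-- Verify which trace flags are currently active
DBCC TRACESTATUS (-1)
GO
-- Turn it off again when you finish testing
DBCC TRACEOFF (4136, -1)
```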

In general I would not recommend using this trace flag and would ask you to try the other available solutions instead. But it is good to know that this choice exists and can be used in cases when you really need it. It should be used carefully and only when enough testing shows that it in fact improves the performance of your application.

But let us test it to see how it works. I am testing it with SQL Server 2008 R2. My original build is 10.50.1600. After the cumulative update is installed the build is 10.50.1720.

Let us use the same example described on my OPTIMIZE FOR UNKNOWN post so perhaps you want to refer to it to better understand the details. Create the following stored procedure on the AdventureWorks database.

CREATE PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid

Executing the stored procedure before installing the cumulative update, or after the cumulative update but without the trace flag enabled

EXEC test @pid = 709

shows the following plan

[Execution plan using an index seek and a key lookup, estimated number of rows 188]

In this case, since the trace flag is not yet in effect, SQL Server uses the statistics histogram to estimate the number of rows, which in this case is 188. After I enable the trace flag, restart my SQL Server instance, and run the same stored procedure again, I get the following plan, where the estimated number of rows is now 456.079. Again, how these values are obtained was explained in my previous post.

[Execution plan using a clustered index scan, estimated number of rows 456.079]

Let us try a test using the OPTIMIZE FOR query hint, which ignores the 4136 trace flag (note that it is not the same as the OPTIMIZE FOR UNKNOWN hint) by using the following code.

ALTER PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid
OPTION (OPTIMIZE FOR (@pid = 709))

If you try this version of the stored procedure, even with the trace flag enabled, it will use the histogram again and will create a plan using the estimated number of rows of 188.

Finally, if you followed this exercise, do not forget to remove the trace flag and restart your SQL Server service.

How OPTIMIZE FOR UNKNOWN Works

One of the questions I have been asked several times is how OPTIMIZE FOR UNKNOWN works. OPTIMIZE FOR is a query hint introduced with SQL Server 2005 that can help with the parameter sniffing problem and requires you to specify a value for a parameter. For an introduction to the parameter sniffing problem you can see my previous post here. OPTIMIZE FOR UNKNOWN, on the other hand, which was introduced in SQL Server 2008, does not require you to specify a value for a parameter.

A traditional way to avoid the parameter sniffing problem, especially in previous versions of SQL Server, was to use local variables. But entirely avoiding parameter sniffing does not mean that doing so is always a good solution. As I mentioned in my previous article, from the point of view of query optimization, parameter sniffing is a good thing. When the Query Optimizer knows the value of a parameter, it can use the statistics histogram to estimate the number of records that can be returned by a query, and using the histogram will give you the best estimate possible. But when you use local variables SQL Server is not able to use the histogram anymore. Instead, it uses the information in the density vector of the statistics object. OPTIMIZE FOR UNKNOWN works in pretty much the same way.

To better understand how OPTIMIZE FOR UNKNOWN works let us first see the case when a parameter value is known. Create the following stored procedure

CREATE PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid

Running this stored procedure and requesting a plan shows an estimate of 188 records, as can be seen in the following execution plan, which uses both an index seek and a key lookup operator.

EXEC test @pid = 709

[Execution plan using an index seek and a key lookup, estimated number of rows 188]

In this case SQL Server is able to use the histogram and estimate that 188 records would be returned. The Query Optimizer uses that estimate to decide which plan to generate. Use the following statement to inspect the statistics object used by this stored procedure.

DBCC SHOW_STATISTICS('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)

Running the statement shows the following information (for space limitations, only the information needed for this post is displayed, including the first line of the density vector and the first three lines of the histogram).

All density   Average Length Columns
0.003759399   4              ProductID

RANGE_HI_KEY RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS
707          0             3083          0
708          0             3007          0
709          0             188           0

In this case SQL Server used the histogram to find the value of ProductID 709, defined as RANGE_HI_KEY, and found the estimated number of rows, 188, defined as EQ_ROWS.

Let us now change the stored procedure to use local variables

ALTER PROCEDURE test (@pid int)
AS
DECLARE @lpid int
SET @lpid = @pid
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @lpid

Run the procedure again using the same value

EXEC test @pid = 709

This time we get a different plan, using a clustered index scan, as shown next

[Execution plan using a clustered index scan, estimated number of rows 456.079]

The value of a local variable is not known at optimization time, so the Query Optimizer is not able to use the value 709 for the optimization, as it did before. Actually, this time it does not matter which value you use; you will always get the same estimated number of rows and the same plan. The estimated number of rows is 456.079.

Now use the version of the stored procedure with OPTIMIZE FOR UNKNOWN

ALTER PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid
OPTION (OPTIMIZE FOR UNKNOWN)

You will notice that it behaves the same as with local variables: it gets an estimated number of rows of 456.079 and the same plan, using a clustered index scan.

But now let us see how SQL Server is obtaining the value 456.079 and what the reasoning behind this is.

Density is defined as 1 / number of distinct values. The SalesOrderDetail table has 266 distinct values for ProductID, so the density is calculated as 1 / 266, or 0.003759399, as shown before in the statistics object. One assumption in the statistics mathematical model used by SQL Server is the uniformity assumption: since in this case SQL Server cannot use the histogram, it assumes that the data is distributed uniformly across all distinct values. To obtain the estimated number of records, SQL Server multiplies the density by the current total number of records: 0.003759399 * 121,317, or 456.079, as shown in the plan. This is the same as dividing the total number of records by the number of distinct values: 121,317 / 266 also gives 456.079.
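
You can reproduce this calculation directly against the table; the following query is just a sketch to confirm that the density-based estimate matches the numbers above:

```sql
-- density = 1 / distinct values; estimate = density * total rows
-- With the AdventureWorks data used here this should return 266 distinct
-- values, 121317 rows, and an estimate of about 456.079
SELECT COUNT(DISTINCT ProductID) AS distinct_values,
       COUNT(*) AS total_rows,
       (1.0 / COUNT(DISTINCT ProductID)) * COUNT(*) AS estimated_rows
FROM Sales.SalesOrderDetail
```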

Finally, I would conclude by saying that the benefit of using OPTIMIZE FOR UNKNOWN is that you always get the same execution plan; just make sure that the chosen plan benefits most of the instances of your query.

The Parameter Sniffing Problem

SQL Server can use the histogram of a statistics object to estimate the cardinality of a predicate and use this information to produce an optimal execution plan. The Query Optimizer accomplishes this by first inspecting the values of the query parameters. This behavior is called parameter sniffing and it is a good thing: getting an execution plan tailored to the current parameters of a query improves the performance of your applications. We also know that the plan cache can store these execution plans so they can be reused the next time the same query needs to be executed again. This saves optimization time and CPU resources as the query does not need to be compiled again.

However, although the Query Optimizer and the plan cache work fine together most of the time, some performance problems can occasionally appear. Since the Query Optimizer can produce more than one execution plan for the same query, depending on the value of its parameters, caching and reusing only one of these plans may be a performance issue for an instance of this query that could benefit from a better plan. This is a known problem with queries using explicit parameterization, which is, for example, the case with stored procedures. Let me show you an example of the problem and a few recommendations on how to fix it. Create the following stored procedure on the AdventureWorks database

CREATE PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid

Run the following statement to execute the stored procedure and request to display its execution plan

EXEC test @pid = 897

The Query Optimizer estimates that only a few records will be returned by this query and produces the following execution plan, which uses an index seek operator, to quickly find the records using an existing nonclustered index, and a key lookup operator, to search the base table for the remaining columns requested by the query.

[Execution plan using an index seek and a key lookup]

This index seek/key lookup combination is a good choice in this case because the query is highly selective. Let us run the same stored procedure again with a different ProductID value. For the following test, include a SET STATISTICS IO ON statement to display the amount of disk activity generated by the query

SET STATISTICS IO ON
GO
EXEC test @pid = 870
GO

On this second execution SQL Server performs more than 14,000 reads even though the SalesOrderDetail table has only around 1,240 pages; that is more than 10 times the I/O of simply scanning the entire table. So why is the second execution performing poorly? When the stored procedure was executed the second time, it used the same execution plan that was generated for the first execution. And for this particular case, the plan was optimal for the parameter used in the first query but not for the one in the second query. Performing both an index seek and a key lookup, which together are called a bookmark lookup, is an expensive operation which can have performance benefits, but only for a limited number of records.

Now clear the plan cache to remove the current execution plan from memory and run the stored procedure again using the same previous parameter (note that there are better ways to remove a plan from the plan cache so use this statement only on a test environment).

DBCC FREEPROCCACHE
GO
EXEC test @pid = 870
GO

Since this time there was no plan for this stored procedure available in the plan cache, SQL Server optimized it using the ProductID 870 and created a new execution plan, optimal for this value. The new plan uses a clustered index scan, as shown next.

[Execution plan using a clustered index scan]

If you look at the I/O information you will notice that this time only around 1,240 pages are read, which is the total number of pages in the table. This is a lot smaller than the more than 14,000 pages read in the previous execution with the same parameter.

Of course, this does not mean that there is something wrong with using stored procedures. This is simply a problem you need to be aware of and research, especially if you have queries whose performance changes dramatically when different parameters are used. If you happen to have this problem, there are a few choices available.

Optimize for a Typical Parameter

There might be cases when most of the executions of a query can benefit from the same execution plan and you want to minimize the cost of optimization. For these cases you can use the OPTIMIZE FOR hint, introduced with SQL Server 2005. Use this hint when an optimal plan can be generated for the majority of the possible values; only the few executions using an atypical parameter will not have an optimal plan.

Suppose that almost all of the executions of our stored procedure would benefit from the plan that uses both an index seek and a key lookup operator. You could write the stored procedure this way

ALTER PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid
OPTION (OPTIMIZE FOR (@pid = 897))

When you run the stored procedure for the first time, it will be optimized for the value 897, no matter what parameter value was specified for the execution. The plan will be stored in the plan cache and used for every execution of the stored procedure after that. Test the following case

EXEC test @pid = 870

You can find the following entry close to the end of the XML plan, which shows that the value 897 was in fact used during the optimization

<ParameterList>
    <ColumnReference Column="@pid" ParameterCompiledValue="(897)" ParameterRuntimeValue="(870)" />
</ParameterList>

The benefit of using this hint is that you have total control over which plan is stored in the plan cache and reused later. With the original problem you did not have control over which of the two execution plans would be stored in the plan cache.

Optimize on Every Execution

If you cannot find a typical parameter or plan for your queries, you can optimize for every execution. You will get an optimal plan on every execution, but will pay the optimization cost each time. To do this, use the RECOMPILE hint as shown here.

ALTER PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid
OPTION (RECOMPILE)

Run the stored procedure using

EXEC test @pid = 897

The XML plan for this execution will show

<ParameterList>
    <ColumnReference Column="@pid" ParameterCompiledValue="(897)" ParameterRuntimeValue="(897)" />
</ParameterList>

This time the query will be optimized on each execution and will get an optimal plan depending on the current value of its parameters.

Another solution that has traditionally been implemented, especially in older versions of SQL Server, is to simply avoid the parameter sniffing behavior by using local variables. Next, I will explain the pros and cons of using local variables, along with the OPTIMIZE FOR UNKNOWN hint, which is new in SQL Server 2008. But since I need to go to bed now so I can wake up very early in the morning to watch the first game of the 2010 World Cup, I will leave these topics for my next post. By the way, who is your favorite team?