Benjamin Nevarez Rotating Header Image

How OPTIMIZE FOR UNKNOWN Works

One of the questions I have been asked several times is about how OPTIMIZE FOR UNKNOWN works. OPTIMIZE FOR is a query hint introduced with SQL Server 2005 that can help with the parameter sniffing problem and requires from you to specify a value for a parameter. For an introduction to the parameter sniffing problem you can see my previous post here. On the other hand, OPTIMIZE FOR UNKNOWN, which was introduced in SQL Server 2008, does not require from you to specify a value for a parameter.

A traditional way to avoid the parameter sniffing problem, especially in previous versions of SQL Server, was by using local variables. But entirely avoiding parameter sniffing does not mean that it is always a good solution. As I mentioned in my previous article, from the point of view of query optimization, parameter sniffing is a good thing. When the Query Optimizer knows the value of a parameter it can use the statistics histogram to estimate the number of records that can be returned by a query. Using the histogram will give you the best estimate possible. But when you use local variables SQL Server is not able to use the histogram anymore. Instead it uses the information on the density vector of the statistics object. OPTIMIZE FOR UNKNOWN works pretty much in the same way.

To better understand how OPTIMIZE FOR UNKNOWN works let us first see the case when a parameter value is known. Create the following stored procedure

CREATE PROCEDURE test (@pid int)

AS

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = @pid

Running this stored procedure and requesting a plan shows 188 estimated records which can be seen on the following execution plan which uses both an index seek and a key lookup operators.

EXEC test @pid = 709

clip_image002

In this case SQL Server is able to use the histogram and estimate that 188 records would be returned. The Query Optimizer uses that estimate to take a decision about the plan to generate. Use the following statement to inspect the statistics object used by this stored procedure.

DBCC SHOW_STATISTICS(‘Sales.SalesOrderDetail’, IX_SalesOrderDetail_ProductID)

Running the statement shows the following information (for space limitations, only the information needed for this post is displayed, including the first line of the density vector and the first three lines of the histogram).

All density   Average Length Columns

0.003759399   4              ProductID

 

RANGE_HI_KEY RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS 

707          0             3083          0                   

708          0             3007          0                   

709          0             188           0                   

In this case SQL Server used the histogram to find the value of ProductID 709, defined as RANGE_HI_KEY, and finds the estimated number of rows 188, defined as EQ_ROWS.

Let us now change the stored procedure to use local variables

ALTER PROCEDURE test (@pid int)

AS

DECLARE @lpid int

SET @lpid = @pid

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = @lpid

Run the procedure again using the same value

EXEC test @pid = 709

This time we get a different plan, using a clustered index scan, as shown next

clip_image004

Local variables are not known at optimization time so the Query Optimizer is not able to use the value 709 for the optimization, as it did before. Actually, this time it does not matter which value you use, you always will get the same estimated number of rows and the same plan. The estimated number of rows is 456.079.

Now use the version of the stored procedure with OPTIMIZE FOR UNKNOWN

ALTER PROCEDURE test (@pid int)

AS

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = @pid

OPTION (OPTIMIZE FOR UNKNOWN)

You will notice that it is behaving the same as with local variables, it is getting an estimated number of rows of 456.079 and getting the same plan, using a clustered index scan.

But now let us see how SQL Server is obtaining the value 456.079 and what the reasoning behind this is.

Density is defined as 1 / number of distinct values. The SalesOrderDetail table has 266 distinct values for ProductID, so the density is calculated as 1 / 266 or 0.003759399 as shown before on the statistics object. One assumption in the statistics mathematical model used by SQL Server is the uniformity assumption. Since in this case SQL Server can not use the histogram, the uniformity assumption tells that for any given value the data distribution is the same. To obtain the estimated number of records SQL Server will multiply the density by the current total number of records, 0.003759399 * 121,317 or 456.079, as shown on the plan. This is also the same as to divide the total number of records by the number of distinct values, 121,317 / 266, which also gives 456.079.

Finally, I would conclude saying that the benefit of using OPTIMIZE FOR UNKNOWN is that you always get the same execution plan, just make sure that the chosen plan benefits most of the instances of your query.

About the author

Benjamin Nevarez Benjamin Nevarez is a SQL Server MVP and independent consultant based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of "SQL Server 2014 Query Tuning & Optimization" and "Inside the SQL Server Query Optimizer" and co-author of "SQL Server 2012 Internals". With more than 20 years of experience in relational databases, Benjamin has also been a speaker at many SQL Server conferences, including the PASS Summit, SQL Server Connections and SQLBits. Benjamin’s blog can be found at http://www.benjaminnevarez.com and he can also be reached by e-mail at admin at benjaminnevarez dot com and on twitter at @BenjaminNevarez.

14 Comments

  1. […] This post was mentioned on Twitter by Benjamin Nevarez, Ben Nevarez. Ben Nevarez said: New blog post: How OPTIMIZE FOR UNKNOWN Works http://www.benjaminnevarez.com/2010/06/how-optimize-for-unknown-works […]

  2. Sankar Reddy says:

    Ben,

    This is excellent and thanks for sharing with a good explanation.

  3. Benjamin Nevarez says:

    Thanks Sankar. Nice meeting you at the SQLSaturday back in April.

    Regards,

    Ben

  4. Great information! I’ve been looking for something like this for a while now. Thanks!

  5. […] As I mentioned in a previous post, parameter sniffing is a good thing: it allows you to get an execution plan tailored to the current parameters of your query. Of course, sometimes it can also be a problem but there are some solutions available. Some of these solutions are covered in my posts here and here. […]

  6. mark lancour says:

    I’m having trouble understanding the difference between “RECOMPILE” AND “OPTIMIZE FOR UNKNOWN”. Can you give me a high overview of when to use one over the other?

  7. Benjamin Nevarez says:

    Hi Mark,

    I cover that topic in my previous post about parameter sniffing. Please take a look at it here

    http://www.benjaminnevarez.com/2010/06/the-parameter-sniffing-problem/

    Then this post is the second part

    http://www.benjaminnevarez.com/2010/06/how-optimize-for-unknown-works/

    And feel free to post any additional question you may have,

    Thanks,

    Ben

  8. […] to parameter sniffing. What to do next? I have a few recommendations on previous posts here and here. I have another one here but usually you should not be doing […]

  9. […] it’s a batch, a stored procedure, a user defined function or a trigger, the optimizer doesn’t know the values of local variables during query […]

  10. […] to get a measure of how the values are distributed in the index– this is called the “density vector” — read more about it here in a great post by Benjamin Nevarez. It then multipled the “density vector” by the number of rows in the table to create a […]

  11. […] to get a measure of how the values are distributed in the index– this is called the “density vector” — read more about it here in a great post by Benjamin Nevarez. It then multipled the “density vector” by the number of rows in the table to create a […]

  12. DBAWaffle says:

    Excellent explanation :D

    Do you think it would ever be worth wild having the optimize for unknown enabled using a switch – so something like interrogating the query parameter and ascertaining what would be the best way to direct the query optimiser.

  13. chandrasekar says:

    i have 30 Sp’s on my DB, only one sp causes parameter sniffing issue, after i fixed it with the use of local variables, then it started working fine, is it good to have “OPTION (OPTIMIZE FOR UNKNOWN)” in all SP’s or only we have to implement it when ever the sp comes under parameter sniffing issue, Please clarify my doubt.

  14. Banana says:

    @chandrasekar: Yes. Microsoft provided plan caches, parameter sniffing etc. just to add code and bloat up SQL Server, so it’s a good idea to override all that by stuffing Optimise for unknown into every single SP

Add Comment Register



Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>