Benjamin Nevarez

Parameter Sniffing and Plan-reuse-affecting SET Options

One interesting problem I am sometimes asked to troubleshoot is when a developer tells me that a stored procedure is timing out or taking too long to execute on a web application, but returns immediately when executed directly in Management Studio, even for the same parameters. Although there could be a few reasons for a problem like this, including blocking, the most frequent one is a behavior in which the plan used by the web application was optimized for a combination of parameters which produced a “bad” plan for some other executions of the same stored procedure with different parameters. Although you may be tempted to just run sp_recompile to force a new optimization and allow the application to continue working, this does not really fix the problem, and it may eventually come back. You may also have seen similar scenarios where you updated statistics, rebuilt an index, or changed something else, only to find that suddenly the problem seems to be fixed. It is not. Those changes probably just forced a new optimization with the “good” parameter you were testing. Obviously the best thing to do for this kind of problem is to capture the “bad” plan for further analysis in order to provide a permanent solution. In this post I will show you how to do that.

But first, a little bit of background. Remember that, in general, query optimization is an expensive operation and, in order to avoid this optimization cost, the plan cache will try to keep the generated execution plans in memory so they can be reused. So, if a stored procedure is executed thousands of times, only one optimization may be needed. However, if a new connection running the same stored procedure has different SET options, it may generate a new plan instead of reusing the one already in the plan cache. This new plan can then be reused by later executions of the same stored procedure with the same connection settings. A new plan is needed because these SET options can impact the choice of an execution plan: they affect the results of evaluating constant expressions during the optimization process (a process known as constant folding and explained here). Another connection setting, FORCEPLAN, acts in a similar way to a hint, requesting the Query Optimizer both to preserve the join order as specified in the query syntax and to use nested loop joins only. As indicated in the Microsoft white paper Plan Caching in SQL Server 2008, the following SET options will affect the reuse of execution plans.


Unfortunately different management or development tools, like Management Studio, ADO.NET, or even sqlcmd, may have different SET options in their default configuration. Most of the time the problem is that one of these options, ARITHABORT, is OFF in ADO.NET and ON in Management Studio. So it may be possible that, in our example, Management Studio and the web application are using distinct cached plans, and that the web application initially got a good plan for the parameters used during the optimization, but this plan is not good for some other executions of the same stored procedure with different parameters.
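To picture this behavior, here is a minimal sketch in Python (an illustration only, not SQL Server's actual implementation; all names are hypothetical) of a plan cache whose lookup key includes the set_options value, so connections with different SET options cannot share a plan:

```python
# Hypothetical sketch: the cache key includes the set_options value, so two
# connections running the same procedure with different SET options get two
# separate cache entries, each compiled with its own sniffed parameters.
plan_cache = {}
compilations = 0

def get_plan(object_name, set_options):
    global compilations
    key = (object_name, set_options)
    if key not in plan_cache:
        compilations += 1                      # a new optimization is needed
        plan_cache[key] = "plan for " + object_name
    return plan_cache[key]

get_plan("dbo.test", 251)    # ADO.NET connection: compiles a new plan
get_plan("dbo.test", 251)    # same settings: reuses the cached plan
get_plan("dbo.test", 4347)   # Management Studio: different set_options, new plan
print(len(plan_cache), compilations)   # 2 cache entries, 2 compilations
```

The values 251 and 4347 are the actual set_options values we will see later in this post for ADO.NET and Management Studio connections.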

But now let us see how to prove that parameter sniffing is in fact the problem for your specific instance of the issue, and how to extract the plans to inspect both the parameters and the SET options used during optimization. Since AdventureWorks does not have the default SET options of a new database, let us create our own database and copy some data from AdventureWorks:


Create a new table and a stored procedure to test

CREATE DATABASE Test
GO
USE Test
GO
-- copy some data from AdventureWorks
SELECT * INTO dbo.SalesOrderDetail
FROM AdventureWorks.Sales.SalesOrderDetail
GO
-- create an index on ProductID (the index name is assumed)
CREATE NONCLUSTERED INDEX IX_ProductID
ON dbo.SalesOrderDetail(ProductID)
GO
-- create the stored procedure to test
CREATE PROCEDURE test (@pid int)
AS
SELECT * FROM dbo.SalesOrderDetail
WHERE ProductID = @pid
GO

Let us test two different applications, executing the stored procedure from a .NET application (C# code included at the end) and from Management Studio. For the purpose of this test we want to assume that a plan with a table scan is a bad plan and a plan using an index seek/RID lookup is the optimal one.

Start with a clean plan cache by running

DBCC FREEPROCCACHE
Run the .NET application from a command prompt window and provide the value 870 as a parameter (note that this application is only running the test stored procedure)

Enter ProductID: 870

At this moment we can start inspecting the plan cache to see the plans available in memory. Run the following script from the Test database (we will be running this script again later during this exercise)

SELECT plan_handle, usecounts, pvt.set_options
FROM (
    SELECT plan_handle, usecounts, epa.attribute, epa.value 
    FROM sys.dm_exec_cached_plans 
        OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
    WHERE cacheobjtype = 'Compiled Plan') AS ecpa 
PIVOT (MAX(ecpa.value) FOR ecpa.attribute IN ("set_options", "objectid")) AS pvt
WHERE pvt.objectid = OBJECT_ID('dbo.test')

You should get an output similar to this

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    1            251

The output shows that we have one execution plan in the plan cache, it has been used once (as indicated by the usecounts value), and the set_options value, taken from the sys.dm_exec_plan_attributes DMF, is 251. Since this was the first execution of the stored procedure, it was optimized using the parameter 870 which in this case created a plan using a table scan (consider here a “bad” plan). Now run the application again using a parameter that returns only a few records and will benefit from an index seek/RID lookup plan:

Enter ProductID: 898

If you inspect the plan cache again you will notice that the plan has been used twice and unfortunately this time it was not good for the second parameter used

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    2            251

At this moment the developer may try to troubleshoot this problem by running the stored procedure in Management Studio using something like this

EXEC test @pid = 898

Now the developer is surprised to find that SQL Server is returning a good execution plan and the query is returning immediately. Inspecting the plan cache again will show something similar to this

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    2            251
0x05000700210F0207B860650B000000000000000000000000    1            4347

You can see that a new plan was added for the Management Studio execution, with a different value for set_options.

What to do next? It is time to inspect the plans and look at the SET options and parameters used during the optimization. Select the plan_handle of the first plan created (the one with set_options 251 in your own example) and use it to run the following query

SELECT * FROM sys.dm_exec_query_plan(0x05000700210F0207B8C09007000000000000000000000000)

You can find the SET options at the beginning of the plan

<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" 

And the used parameters at the end

    <ColumnReference Column="@pid" ParameterCompiledValue="(870)" />

Do the same for the second plan and you will get the following information for the SET options

<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="true" 

and the following parameter information

    <ColumnReference Column="@pid" ParameterCompiledValue="(898)" />

This information shows that the ARITHABORT SET option has a different value in these plans and that the parameter used to optimize the query on the web application was 870. (The same information is available from the Properties window of a graphical plan.) You can also verify the operators used in each plan, the first one using a table scan and the second one an index seek/RID lookup combination.

Now that you have captured the plans you can force a new optimization so the application can use a better plan immediately (keeping in mind that this is not a permanent solution). Try this

sp_recompile test

So now you know that you have a problem related to parameter sniffing. What to do next? I have a few recommendations on previous posts here and here. I have another one here but usually you should not be doing this.

Finally, you can use the following script to display SET options for a specific set_options value

declare @set_options int = 251
if ((1 & @set_options) = 1) print 'ANSI_PADDING'
if ((4 & @set_options) = 4) print 'FORCEPLAN'
if ((8 & @set_options) = 8) print 'CONCAT_NULL_YIELDS_NULL'
if ((16 & @set_options) = 16) print 'ANSI_WARNINGS'
if ((32 & @set_options) = 32) print 'ANSI_NULLS'
if ((64 & @set_options) = 64) print 'QUOTED_IDENTIFIER'
if ((128 & @set_options) = 128) print 'ANSI_NULL_DFLT_ON'
if ((256 & @set_options) = 256) print 'ANSI_NULL_DFLT_OFF'
if ((512 & @set_options) = 512) print 'NoBrowseTable'
if ((4096 & @set_options) = 4096) print 'ARITH_ABORT'
if ((8192 & @set_options) = 8192) print 'NUMERIC_ROUNDABORT'
if ((16384 & @set_options) = 16384) print 'DATEFIRST'
if ((32768 & @set_options) = 32768) print 'DATEFORMAT'
if ((65536 & @set_options) = 65536) print 'LanguageID'
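As a quick cross-check of the script above, the same bit test can be sketched in Python. The bit values are taken directly from the T-SQL script, and 251 and 4347 are the set_options values we captured from the plan cache earlier:

```python
# Bit values copied from the T-SQL script above; any bit not named by the
# script (for example bit 2, which is set in 251) is simply ignored here.
OPTIONS = {
    1: 'ANSI_PADDING', 4: 'FORCEPLAN', 8: 'CONCAT_NULL_YIELDS_NULL',
    16: 'ANSI_WARNINGS', 32: 'ANSI_NULLS', 64: 'QUOTED_IDENTIFIER',
    128: 'ANSI_NULL_DFLT_ON', 256: 'ANSI_NULL_DFLT_OFF', 512: 'NoBrowseTable',
    4096: 'ARITH_ABORT', 8192: 'NUMERIC_ROUNDABORT', 16384: 'DATEFIRST',
    32768: 'DATEFORMAT', 65536: 'LanguageID',
}

def decode(set_options):
    return {name for bit, name in OPTIONS.items() if set_options & bit}

# The only named difference between the two cached plans is ARITH_ABORT
print(decode(4347) - decode(251))   # {'ARITH_ABORT'}
```

This confirms what we saw in the plan XML: 4347 − 251 = 4096, which is exactly the ARITH_ABORT bit.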

C# Code

using System;
using System.Data;
using System.Data.SqlClient;

class Test
{
    static void Main()
    {
        SqlConnection cnn = null;
        SqlDataReader reader = null;
        try
        {
            Console.Write("Enter ProductID: ");
            string pid = Console.ReadLine();

            cnn = new SqlConnection(
                "Data Source=(local);Initial Catalog=Test;Integrated Security=SSPI");
            SqlCommand cmd = new SqlCommand();
            cmd.Connection = cnn;
            cmd.CommandText = "dbo.test";
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@pid", SqlDbType.Int).Value = int.Parse(pid);
            cnn.Open();
            reader = cmd.ExecuteReader();
            while (reader.Read())
            {
                // consume the rows; we only care about the cached plan
            }
        }
        catch (Exception)
        {
            throw;
        }
        finally
        {
            if (reader != null)
                reader.Close();
            if (cnn != null)
                if (cnn.State != ConnectionState.Closed)
                    cnn.Close();
        }
    }
}

Speaking at the PASS Summit and other Southern California events

I am currently working on the two sessions that I will be presenting at the PASS Summit: Inside the SQL Server Query Optimizer and Parameter Sniffing: the Query Optimizer vs. the Plan Cache. In addition, I will be presenting these two new sessions at other SQL Server events in Southern California, including SQLSaturday #95.

First, I will be speaking at the Los Angeles SQL Server Professionals Group on Thursday September 15th. The meeting will be hosted at the UCLA Anderson School of Management and will start at 6:30 PM. I will present only one session, Inside the SQL Server Query Optimizer, in this meeting. You can find additional information about the meeting and directions on their website.

Two days later, on September 17th, I will be speaking at SQLSaturday #95 in San Diego, CA. Of course, this SQLSaturday will also have many other great speakers and the final schedule is already posted here. In addition to presenting both of my sessions described before I will be participating in the Ask the Experts – SQL Server Q&A session coordinated by Thomas Mueller. For more details and directions for SQLSaturday #95 please go to their website here.

On October 7th I will be presenting my Query Optimizer session at the Orange County SQL Server Professionals User Group in Mission Viejo, CA. Details and directions will be posted soon on their website here.

Then it is time for the PASS Summit, the largest SQL Server and BI conference in the world. The PASS Summit is hosted again this year in Seattle, WA and it is scheduled for October 11-14. The schedule for my two sessions is not final at the moment of writing this but so far it looks like I will be speaking on Wednesday and Friday.

I am flying back from the PASS Summit on Saturday and planning to present my two sessions at the SoCal Code Camp the following day, Sunday October 16th. The SoCal Code Camp is a community driven event for developers to come and learn from their peers. At this moment they are still accepting sessions so no schedule has been created yet. You can register, find additional information and directions on their website here.

Finally, although I am not going to be speaking, I will be attending SQL in the City in Los Angeles, CA on October 28th. SQL in the City is a one day SQL Server training event which will include several SQL Server MVPs and you can look at their site here for more details and information.

I look forward to meeting lots of SQL Server professionals on these events.

Code from my book Inside the SQL Server Query Optimizer

Recently I’ve been asked for the code from my book Inside the SQL Server Query Optimizer, so I am including it in this post. The book contains a large number of example SQL queries, all of which are based on the AdventureWorks database, and Chapter 6 additionally uses the AdventureWorksDW database. All code has been tested on both SQL Server 2008 and SQL Server 2008 R2. Note that these sample databases are not included by default in your SQL Server installation, but can be downloaded from the CodePlex website.

Inside the SQL Server Query Optimizer code – InsideQueryOptimizerCode.txt

Query Optimization with Denali Columnstore Indexes

In a previous post I talked about the new columnstore indexes and their related processing algorithms which are available in SQL Server code-named Denali. In this post I will cover the query processing part of the technology in more detail and will show you some examples that you can test on the recently released CTP3 (Community Technology Preview) of the product.

Same as with previous versions of SQL Server, in Denali the query optimizer can choose between the available access methods, which now also include columnstore indexes, and as always, this will be a cost-based decision. A new choice the query optimizer will have to make is the selection of an execution mode. The new query processing algorithms mentioned in my previous post will run in what is called a batch execution mode, which is different from the traditional processing mode, now called row mode.

In the row execution mode operators process data one row at a time. The new batch execution mode processes data in batches, which is more efficient for large amounts of data, like the workloads present in data warehouse queries. Each operator in an execution plan can use the row execution mode and, when columnstore indexes are available, some operators can also use the batch mode. There is both an estimated and an actual execution mode, and this information is displayed on the query execution plan as I will show later. It is also worth mentioning that, although columnstore indexes can speed up the performance of data warehouse queries, they are not a good choice for very selective queries returning only a few records. In this case the query optimizer may have to rely on row stores, like clustered or regular nonclustered indexes, to find those records quickly. There are no seeks on columnstore indexes.
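The benefit of batching can be caricatured with a short Python sketch (an illustration of the idea only; the function names and the batch size of 1000 are my own, and SQL Server's real implementation is far more sophisticated): row mode pays the operator call overhead once per row, while batch mode pays it once per batch:

```python
# Illustration only: row mode hands one row at a time between operators,
# batch mode hands a whole batch in a single call, so the per-call
# overhead is paid far less often.
rows = list(range(10_000))

def row_mode(rows):
    calls, total = 0, 0
    for r in rows:                              # one operator call per row
        calls += 1
        total += r
    return calls, total

def batch_mode(rows, batch_size=1000):
    calls, total = 0, 0
    for i in range(0, len(rows), batch_size):   # one call per batch
        calls += 1
        total += sum(rows[i:i + batch_size])
    return calls, total

print(row_mode(rows)[0], batch_mode(rows)[0])   # 10000 calls vs 10 calls
```

Both modes compute the same aggregate; the batch version simply amortizes the per-call cost, which is why batch mode pays off for large scans but not for selective queries returning a few rows.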

Same as with previous versions of SQL Server, you still have the choice to use a hint to force any index in the cases where the query optimizer is not giving you a good execution plan. This can happen for example when the query optimizer is choosing a columnstore index when it shouldn’t or when you want to force a columnstore index when it is not being selected. You can also use the new IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint to ask the query optimizer to avoid using any columnstore index.

Let me show you an example which you can test on SQL Server Denali CTP3, currently available for download here. To follow this example you will also need the AdventureWorksDWDenali database, available on CodePlex. I will use the same example as BOL to skip the basics and go directly to analyzing the batch processing mode. (By the way, the BOL example did not work directly with the AdventureWorksDWDenali database, so I had to add a few more columns at the end of the CREATE TABLE statement.)

First, use the following BOL code to create a partition function, a partition scheme and a new partitioned table with a columnstore index

USE AdventureWorksDWDenali;

-- Create a partition function
CREATE PARTITION FUNCTION [ByOrderDateMonthPF](int) AS RANGE RIGHT 
FOR VALUES (
    20050701, 20050801, 20050901, 20051001, 20051101, 20051201, 
    20060101, 20060201, 20060301, 20060401, 20060501, 20060601, 
    20060701, 20060801, 20060901, 20061001, 20061101, 20061201, 
    20070101, 20070201, 20070301, 20070401, 20070501, 20070601, 
    20070701, 20070801, 20070901, 20071001, 20071101, 20071201, 
    20080101, 20080201, 20080301, 20080401, 20080501, 20080601, 
    20080701, 20080801, 20080901, 20081001, 20081101, 20081201
);

-- Create a partition scheme
CREATE PARTITION SCHEME [ByOrderDateMonthRange] 
AS PARTITION [ByOrderDateMonthPF] 
ALL TO ([PRIMARY]);

-- Create a partitioned version of the FactResellerSales table
CREATE TABLE [dbo].[FactResellerSalesPtnd]( 
    [ProductKey] [int] NOT NULL, 
    [OrderDateKey] [int] NOT NULL, 
    [DueDateKey] [int] NOT NULL, 
    [ShipDateKey] [int] NOT NULL, 
    [ResellerKey] [int] NOT NULL, 
    [EmployeeKey] [int] NOT NULL, 
    [PromotionKey] [int] NOT NULL, 
    [CurrencyKey] [int] NOT NULL, 
    [SalesTerritoryKey] [int] NOT NULL, 
    [SalesOrderNumber] [nvarchar](20) NOT NULL, 
    [SalesOrderLineNumber] [tinyint] NOT NULL, 
    [RevisionNumber] [tinyint] NULL, 
    [OrderQuantity] [smallint] NULL, 
    [UnitPrice] [money] NULL, 
    [ExtendedAmount] [money] NULL, 
    [UnitPriceDiscountPct] [float] NULL, 
    [DiscountAmount] [float] NULL, 
    [ProductStandardCost] [money] NULL, 
    [TotalProductCost] [money] NULL, 
    [SalesAmount] [money] NULL, 
    [TaxAmt] [money] NULL, 
    [Freight] [money] NULL, 
    [CarrierTrackingNumber] [nvarchar](25) NULL, 
    [CustomerPONumber] [nvarchar](25) NULL,
    [OrderDate] datetime NULL,
    [DueDate] datetime NULL,
    [ShipDate] datetime NULL
) ON ByOrderDateMonthRange(OrderDateKey);

-- Copy the data from the FactResellerSales into the new table
INSERT INTO dbo.FactResellerSalesPtnd WITH(TABLOCK)
SELECT * FROM dbo.FactResellerSales;

-- Create the columnstore index
CREATE NONCLUSTERED COLUMNSTORE INDEX [csindx_FactResellerSalesPtnd]
ON [FactResellerSalesPtnd]
(
    [ProductKey], [OrderDateKey], [DueDateKey], [ShipDateKey], 
    [ResellerKey], [EmployeeKey], [PromotionKey], [CurrencyKey], 
    [SalesTerritoryKey], [SalesOrderNumber], [SalesOrderLineNumber], 
    [RevisionNumber], [OrderQuantity], [UnitPrice], [ExtendedAmount], 
    [UnitPriceDiscountPct], [DiscountAmount], [ProductStandardCost], 
    [TotalProductCost], [SalesAmount], [TaxAmt], [Freight], 
    [CarrierTrackingNumber], [CustomerPONumber], [OrderDate], [DueDate], [ShipDate]
);

Now run the following query

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
FROM FactResellerSalesPtnd
GROUP BY SalesTerritoryKey;

This will create the following plan where you can see the new Columnstore Index Scan operator


The properties of the Columnstore Index Scan operator are shown next


You may notice that the actual and estimated execution mode is Row (lines 3 and 4 on the list of properties). Row execution mode was selected because the table is not large enough to require the batch execution mode. We can use the undocumented ROWCOUNT and PAGECOUNT options of the UPDATE STATISTICS statement to simulate a larger table as shown next (for more information about how this works see my post about the DTA here)

UPDATE STATISTICS FactResellerSalesPtnd WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000

Removing the existing plan (using for example DBCC FREEPROCCACHE) and running the same query again will now show the following plan (only part is shown), which this time is using parallelism.


In addition, by looking at the properties of the Columnstore Index Scan you can notice that this time it is using the batch execution mode


You can also use the new IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint to disallow the use of a columnstore index. Run the following code

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
FROM FactResellerSalesPtnd
GROUP BY SalesTerritoryKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

This will show you the following plan which, as you can see, is now accessing the FactResellerSalesPtnd table directly, without using the columnstore index.


Finally, since the number of records and pages of the FactResellerSalesPtnd table was altered for this test, perhaps you want to drop it and create a new copy if you need to do some additional testing:

DROP TABLE FactResellerSalesPtnd

Speaking at the PASS Summit 2011

I am honored to be speaking at the PASS Summit again this year. I’ve been attending this SQL Server conference every year since 2003 and this will be my fourth year speaking. Same as last year I will again be presenting two sessions.

In my first session, Inside the SQL Server Query Optimizer, I will go into the internals of the Query Optimizer and show you the steps that it performs in the background, covering everything from the time a query is submitted to SQL Server until an execution plan is generated. In my second session, Parameter Sniffing: the Query Optimizer vs. the Plan Cache, I will show you how the Query Optimizer uses parameter sniffing to produce a plan tailored to the current parameters of a query, why in some cases this could be a performance problem, and how to troubleshoot and solve these cases.

The PASS Summit is less than four months away and you can register here. I look forward to meeting lots of SQL Server professionals, including those whom I only know via twitter. See you in Seattle in October.

Statistics on Computed Columns

Another interesting topic that I usually cover in my presentations is statistics on computed columns, so I will use this post to show you how they work and how they can help you improve the performance of your queries.

A problem faced by some queries using scalar expressions is that they usually cannot benefit from statistics and, without them, the Query Optimizer will use the 30% selectivity guess on inequality comparisons. A solution to this problem can be the use of computed columns, as SQL Server can automatically create and update statistics on these columns which can help the Query Optimizer to create better execution plans. An additional benefit of this solution is that you don’t need to specify the name of the computed column in your queries for SQL Server to use its statistics. The Query Optimizer automatically matches the computed column definition to an existing scalar expression in a query, so your applications do not need to be changed. Although computed columns have been available in previous versions of SQL Server, the automatic matching feature was only introduced with SQL Server 2005.

To see an example, run this query, which creates the plan shown next:

SELECT * FROM Sales.SalesOrderDetail
WHERE OrderQty * UnitPrice > 25000

The estimated number of rows is 36,395.1, which is 30% of the total number of rows, 121,317, although the query returns only 5 records. SQL Server is obviously using a selectivity guess, as it cannot estimate the selectivity of the expression OrderQty * UnitPrice > 25000.
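The 30% figure is easy to verify with a quick calculation (the row count is the one reported in the plan above):

```python
# The inequality guess is 30% of the table cardinality
rows = 121317
print(round(0.30 * rows, 1))   # 36395.1, the estimate shown in the plan
```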

Now create a computed column:

ALTER TABLE Sales.SalesOrderDetail
ADD cc AS OrderQty * UnitPrice

Run the previous SELECT statement again and note that, this time, the estimated number of rows has changed to 84.3101 which is very close to the actual number of rows returned by the query, as shown in the following plan:

You can optionally test replacing the 25,000 in the query with some other values, like 1,000, 10,000, or 20,000, and verify that the estimate will again be close to the actual number of rows returned.

Note that creating the computed column does not create statistics; these statistics are created the first time that the query is optimized, and you can run the next query to display the information about the statistics objects for the Sales.SalesOrderDetail table:

SELECT * FROM sys.stats
WHERE object_id = object_id('Sales.SalesOrderDetail')

The newly created statistics object will most likely be at the end of the list. Copy its name and use the following command to display the details about the statistics object (I’ve used the name of my local object, but you should replace that as appropriate). You can also use "cc" as the name of the object to get the same results. In both cases, the "cc" column should be shown on the Columns field in the density section.

DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', _WA_Sys_0000000C_2645B050)

Unfortunately, for the automatic matching feature to work, the expression must be exactly the same as the computed column definition. So, if I change the query to UnitPrice * OrderQty, instead of OrderQty * UnitPrice, the execution plan will show an estimated number of rows of 30% again, as this query will demonstrate:

SELECT * FROM Sales.SalesOrderDetail
WHERE UnitPrice * OrderQty > 25000
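The exact-match requirement can be caricatured as a plain dictionary lookup (a Python sketch with hypothetical names; SQL Server actually matches against the internal expression tree, but it is similarly strict about the expression being written the same way):

```python
# Hypothetical sketch: the computed-column definition only matches a
# predicate whose expression is written exactly the same way; even a
# commuted form fails to match.
computed_columns = {"OrderQty * UnitPrice": "cc"}

def match(expression):
    return computed_columns.get(expression)    # None if no exact match

print(match("OrderQty * UnitPrice"))   # 'cc'  -> statistics can be used
print(match("UnitPrice * OrderQty"))   # None  -> back to the 30% guess
```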

As mentioned, the computed column provides statistics so the Query Optimizer can try to get you a better execution plan. In addition, you can create an index on the existing computed column to provide a better navigational alternative. Create the following index

CREATE INDEX IX_cc on Sales.SalesOrderDetail(cc)

By running the original SELECT statement again the Query Optimizer will now choose the newly created index and will produce a more efficient plan using an Index Seek/Key Lookup instead of a Clustered Index Scan, as shown next.

Finally, drop the index and computed column you’ve just created:

DROP INDEX Sales.SalesOrderDetail.IX_cc
ALTER TABLE Sales.SalesOrderDetail
DROP COLUMN cc

The Query Optimizer and Contradiction Detection

As covered in my book Inside the SQL Server Query Optimizer, contradiction detection is a query rewrite (or, more exactly, a tree rewrite) performed at the simplification phase of the optimization process, in which query contradictions are detected and removed. Since these parts of the query are not executed at all, SQL Server saves resources like I/O, locks, memory and CPU, making the query execute faster. For example, the Query Optimizer may know that no records can satisfy a predicate even before touching any page of data. A contradiction may be related to a check constraint, or may be related to the way the query is written. I will show you examples of both cases next.

First, I need to find a table with a check constraint in AdventureWorks and, handily, the Employee table has the following check constraint definition:

([VacationHours]>=(-40) AND [VacationHours]<=(240))

This check constraint makes sure that the number of vacation hours is a number between –40 and 240, so when I request

SELECT * FROM HumanResources.Employee
WHERE VacationHours > 80

… SQL Server uses a Clustered Index Scan operator, as shown next

However, if I request all of the employees with more than 300 vacation hours then, because of this check constraint, the Query Optimizer must immediately know that no records can qualify for this predicate. Run the following code

SELECT * FROM HumanResources.Employee
WHERE VacationHours > 300

As expected, the query will return no records, but this time it will show the following execution plan. Note that, this time, instead of a Clustered Index Scan, SQL Server is using a Constant Scan operator. Since there is no need to access the table at all, SQL Server saves resources like I/O, locks, memory and CPU, making the query execute faster. Now, let’s see what happens if I disable the check constraint

ALTER TABLE HumanResources.Employee NOCHECK CONSTRAINT CK_Employee_VacationHours

This time, running the last query once again uses a Clustered Index Scan operator, as the Query Optimizer can no longer use the check constraint to guide its decisions. Don’t forget to enable the constraint again by running the following statement (WITH CHECK is required so the constraint is validated and trusted again by the Query Optimizer):

ALTER TABLE HumanResources.Employee WITH CHECK CHECK CONSTRAINT CK_Employee_VacationHours
The second type of contradiction case is when the query itself explicitly contains a contradiction. Take a look at the next query

SELECT * FROM HumanResources.Employee
WHERE ManagerID > 10 AND ManagerID < 5

In this case there is no check constraint involved; both predicates are valid and each will individually return records, but they contradict each other when they are run together. As a result, the query returns no records and the plan shows again a Constant Scan operator similar to the plan shown previously. This may just look like a badly written query, but remember that some predicates may already be included in, for example, view definitions, and the developer of the query may be unaware of those. For example, in our last query, a view may include the predicate ManagerID > 10 and a developer may call this view using the predicate ManagerID < 5. Since both predicates contradict each other a Constant Scan operator will be used again instead.
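For a single column, this kind of contradiction detection can be sketched as an interval intersection (a Python illustration only; the `intersect` helper and the open-interval simplification are my own, not SQL Server internals):

```python
# Sketch of contradiction detection over one column: intersect the intervals
# implied by each predicate (and by any check constraint); an empty
# intersection means no row can possibly qualify, so a Constant Scan
# can replace any table access.
def intersect(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None       # None = contradiction

INF = float("inf")
check_constraint = (-40, 240)                  # the VacationHours check constraint
print(intersect(check_constraint, (300, INF)))   # None: VacationHours > 300
print(intersect((10, INF), (-INF, 5)))           # None: ManagerID > 10 AND < 5
print(intersect((10, INF), (-INF, 50)))          # (10, 50): satisfiable
```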

Database Engine Tuning Advisor and the Query Optimizer – Part 2

One of the most interesting and perhaps not well known features of the Database Engine Tuning Advisor (DTA) is that you can use it with a test server to tune the workload of a production server. As I mentioned in the first part of this post, the DTA relies on the Query Optimizer to make its tuning recommendations, and you can have it make these optimizer calls against a test server instance without impacting the performance of the production server.

Information Required by the Query Optimizer

To better understand how this works let us first review what kind of information the Query Optimizer needs to tune a workload. Basically the most important information it needs to perform an optimization is:

1) The database metadata (i.e. table and column definitions, indexes, constraints, etc.)

2) Optimizer statistics (index and column statistics)

3) Table size (number of rows and pages)

4) Available memory and number of processors

The DTA can gather the database metadata and statistics from the production server and use them to create a similar database, with no data, on a different server. This is called a shell database. The DTA can also obtain the available memory and number of processors on the production server, by using the extended stored procedure xp_msver, and use this information for the optimization process. It is important to note that no data is needed for the optimization process. This process is summarized in the following figure taken from Books Online:

This process provides the following benefits:

1) There is no need to run an expensive optimization on the production server, which could impact its resource usage. The production server is only used to gather the initial metadata and the required statistics.

2) There is no need to copy the entire database to a test server either, which is especially important for big databases, saving disk space and the time needed to copy the database.

3) It does not matter if the test server is not as powerful as the production server, as the DTA tuning session will consider the available memory and number of processors of the production server.

Running a Tuning Session

Now I am going to show an example of how to run a tuning session. First of all, the use of a test server is not supported by the DTA graphical user interface so the use of the dta utility, the command prompt version of DTA, is required. Configuring a test server also requires an XML input file containing the dta input information. I am using the following input file for this example

<?xml version="1.0" encoding="utf-16" ?>
<DTAXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 

The Server and Database elements of the XML file include the production SQL Server instance and database. The Workload element includes the definition of a script containing the workload to tune. TuningOptions includes the TestServer subelement which is used to include the name of the test SQL Server instance.

Create the workload.sql file containing a simple query like this

SELECT * FROM AdventureWorks.Sales.SalesOrderDetail
WHERE ProductID = 898

Run the following command

dta -ix input.xml -S production_instance -s session1

A successful execution will show an output similar to this

Microsoft (R) SQL Server Microsoft SQL Server Database Engine Tuning Advisor command line utility
Version 9.00.5000.00
Copyright (c) Microsoft Corporation. All rights reserved.

Tuning session successfully created. Session ID is 26.

Total time used: 00:00:03
Workload consumed: 100%, Estimated improvement: 96%

Tuning process finished.

This example creates an entire copy of AdventureWorks (with no data) and performs the requested optimization. The shell database is automatically deleted after the tuning session is completed. Optionally you can keep the shell database, for example if you want to use it again on another tuning exercise, by using the RetainShellDB in the TuningOptions element like in the following XML fragment.

<TuningOptions>
  <TestServer>test_instance</TestServer>
  <RetainShellDB>1</RetainShellDB>
</TuningOptions>
If the shell database already exists when you request a tuning session, the database creation process will be skipped. However, you will have to manually delete this database when it is no longer needed.
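Deleting the retained shell database is a regular DROP DATABASE operation. A minimal sketch; the shell database name used below is only an example, so check sys.databases first for the actual name the DTA created:

```sql
-- Find the shell database created by the DTA
SELECT name FROM sys.databases;

-- Drop it once it is no longer needed
-- (the name below is a placeholder; use the name returned above)
DROP DATABASE [AdventureWorksShell];
```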

Once the tuning session is completed you can use the DTA graphical user interface as usual to see the recommendations. To do this open the DTA, open the session you used by double-clicking its session name (session1 in our example) and choose the Recommendations tab if it is not already selected.

Scripting Statistics

Although the DTA automatically gathers the metadata and statistics to build the shell database, I am going to show you how to script the required objects and statistics to tune a simple query. This can be helpful in cases where you don’t want to script the entire database. Scripting database objects is a fairly simple process well known by SQL Server professionals. Something that may be new for many, though, is how to script the statistics. The created scripts make use of the undocumented STATS_STREAM, ROWCOUNT and PAGECOUNT options of the CREATE/UPDATE STATISTICS statement.

As an example, to script the objects needed to optimize the simple query shown previously, try the following in Management Studio: select Databases, right-click the AdventureWorks database, select Tasks, Generate Scripts …, click Next, select “Select specific database objects”, expand Tables, select Sales.SalesOrderDetail, click Next, click Advanced, look for the “Script Statistics” choice and select “Script statistics and histograms”. Finally, choose True on “Script Indexes”. Your Advanced Scripting Options window should look similar to this:


Click OK and finish the wizard to generate the scripts. You will get a script with a few UPDATE STATISTICS statements similar to this (with the STATS_STREAM value shortened to fit on this page).

UPDATE STATISTICS [Sales].[SalesOrderDetail]([IX_SalesOrderDetail_ProductID]) 
WITH STATS_STREAM = 0x010000000300000000000000000000004036000 , 
ROWCOUNT = 121317, PAGECOUNT = 227

These UPDATE STATISTICS statements are used to update the statistics of existing indexes (obviously, the related CREATE INDEX statements were scripted as well). If the table also has column statistics, the script will include the corresponding CREATE STATISTICS statements.
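For reference, a scripted column statistic looks similar to the following sketch; the statistic name and the STATS_STREAM value shown here are made-up placeholders, not actual scripted output:

```sql
-- Hypothetical example of a scripted column statistic; the statistic name
-- and the STATS_STREAM value are placeholders only
CREATE STATISTICS [_WA_Sys_00000004_0CBAE877]
ON [Sales].[SalesOrderDetail]([OrderQty])
WITH STATS_STREAM = 0x0100000001000000, ROWCOUNT = 121317, PAGECOUNT = 227
```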

Testing Scripted Statistics

Finally, I will show you an example of how to use the scripted statistics to obtain plans and cost estimates on an empty table. Running the following query against the regular AdventureWorks database produces a plan with an estimated number of rows of 9 and a cost of 0.0296835.

SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 898


Let us produce the same plan on an empty database. Following the procedure described before, you can script the Sales.SalesOrderDetail table. You will end up with multiple statements, including the following (again shortened to fit on this post).

CREATE TABLE [Sales].[SalesOrderDetail](
    [SalesOrderID] [int] NOT NULL, 
    [ProductID] [int] NOT NULL, 
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail]([ProductID] ASC)
UPDATE STATISTICS [Sales].[SalesOrderDetail]([IX_SalesOrderDetail_ProductID]) 
WITH STATS_STREAM = 0x010000000300000000000, ROWCOUNT = 121317, PAGECOUNT = 227
UPDATE STATISTICS [Sales].[SalesOrderDetail]
WITH STATS_STREAM = 0x010000000200000000000000000000003C2F68F6, ROWCOUNT = 121317, 

Create a new database and run at least the previous four statements using the scripts you got in the previous step (or you can use the script attached to this post, which contains the statements needed to reproduce the example). After implementing the script on an empty database and running the sample query, you will again get the plan with a cost of 0.0296835 and an estimated number of rows of 9.
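One convenient way to verify these estimates on the empty database, without executing the query at all, is to request the estimated plan using the documented SET SHOWPLAN_XML option; a quick sketch:

```sql
-- Request the estimated execution plan without running the query
SET SHOWPLAN_XML ON;
GO
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 898;
GO
SET SHOWPLAN_XML OFF;
GO
-- The returned showplan XML contains EstimateRows and StatementSubTreeCost
-- attributes, which should match the values from the original database
```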

My PASS Summit 2011 Submissions


I’ve been attending the PASS Summit every year since 2003 and been fortunate enough to be speaking there a few times too. Of course I am planning to attend again this year and I’ve also submitted a few sessions this time.

Something new this year, though, is that the session selection has been opened to the SQL Server community, and we can all vote for the sessions we would like to see at the Summit. Voting closes this Friday, May 20th, and we can choose from any of the 646 sessions submitted. So, if you like any of my sessions, which are described next, please vote for them here. I look forward to seeing you at the PASS Summit in October.

My submitted abstracts, all of which are for regular sessions of 75 minutes, are:

Inside the SQL Server Query Optimizer [400]

The SQL Server Query Optimizer is a cost-based optimizer: it analyzes a number of candidate execution plans for a given query, estimates the cost of each of these plans, and selects the plan with the lowest cost. In this session I will go into the internals of the Query Optimizer and will show you the steps that it performs in the background covering everything from the time a query is submitted to SQL Server until an execution plan is generated. I’ll show you how the Query Optimizer generates possible alternative execution plans, how these plans are stored for the duration of the optimization process, how heuristics are used to limit the number of alternative plans considered, how each candidate plan is also costed, and finally how the best alternative is chosen based on those costs. I will also cover why query optimization is an inherently complex problem and why challenges in some of its most fundamental areas are still being addressed today.

Query Optimizer Statistics for Better Performance [300]

The SQL Server Query Optimizer is a cost-based optimizer: the quality of the execution plans it generates is directly related to the accuracy of its cost estimations. In the same way, the estimated cost of a plan is based on the query’s cardinality estimation and the algorithms or operators it uses. In this session I will show you how the Query Optimizer uses statistics to estimate this cardinality and why having good quality statistics is extremely important for the performance of your queries. I will also cover how to diagnose and troubleshoot cardinality estimation problems in the cases when you are not getting a good execution plan and will provide solutions to these problems. Finally, existing challenges regarding statistics still faced by query optimizers today will also be discussed.

Partitioned Tables and Indexes: Management and Query Processing [300]

You’ve decided to implement partitioning because you want easier management of very large tables and indexes and want to improve your data loading, deletion and archival operations. But you also need to understand the query processing implications and how partitioning will impact your existing queries. On the other hand, many SQL Server users still believe that the primary purpose of partitioning is to make your queries run faster by querying only a specific partition. In this session I will show you the reality and will focus on both how to implement partitioning to achieve better maintenance and data availability and how it will impact query processing, including benefits, potential issues, and recommendations.

Parameter Sniffing: the Query Optimizer vs. the Plan Cache [300]

Parameter sniffing is a good thing: it is used by the Query Optimizer to produce an execution plan tailored to the current parameters of a query. However, due to the way the plan cache stores these plans in memory, it can sometimes also be a performance problem. This session will show you how parameter sniffing works and in which cases it could be a problem. How to diagnose and troubleshoot parameter sniffing problems, along with their solutions, will be discussed as well. The session will also include details on how the Query Optimizer uses the histogram and density components of the statistics object, and some other advanced topics.

Database Engine Tuning Advisor and the Query Optimizer

Since I am planning to publish a couple of DTA-related posts later this week I thought it would be a good idea to start with this article which explains how the DTA relies on the Query Optimizer to make its tuning recommendations. This article was previously published in my book Inside the SQL Server Query Optimizer.
Currently, all major commercial database vendors include a physical database design tool to help with the creation of indexes. However, when these tools were first developed, there were just two main architectural approaches considered for how these tools should recommend indexes. The first approach was to build a stand-alone tool with its own cost model and design rules. The second approach was to build a tool that could use the Query Optimizer cost model.

A problem with building a stand-alone tool is the requirement for duplicating the cost model. On top of that, having a tool with its own cost model, even if it’s better than the optimizer’s cost model, may not be a good idea because the optimizer still chooses its plan based on its own model.

The second approach, using the Query Optimizer to help with physical database design, was proposed in the database research community as far back as 1988. Since it’s the optimizer which chooses the indexes for an execution plan, it makes sense to use the optimizer itself to help find which missing indexes would benefit existing queries. In this scenario, the physical design tool would use the optimizer to evaluate the cost of queries given a set of candidate indexes. An additional benefit of this approach is that, as the optimizer cost model evolves, any tool using it can automatically benefit from the improvements.

SQL Server was the first commercial database product to include a physical design tool, in the shape of the Index Tuning Wizard which shipped with SQL Server 7.0, and which was later replaced by the Database Engine Tuning Advisor (DTA) in SQL Server 2005. Both tools use the Query Optimizer cost model approach and were created as part of the AutoAdmin project at Microsoft, the goal of which was to reduce the total cost of ownership (TCO) of databases by making them self-tuning and self-managing. In addition to indexes, the DTA can help with the creation of indexed views and table partitioning.

However, creating real indexes in a DTA tuning session is not feasible; its overhead could impact operational queries and degrade the performance of your database. So how does the DTA estimate the cost of using an index that does not yet exist? Actually, even during a regular query optimization, the Query Optimizer does not use indexes to estimate the cost of a query. The decision of whether to use an index or not relies only on some metadata and the statistical information regarding the columns of the index. Index data itself is not needed during query optimization but will, of course, be required during query execution if the index is chosen for the execution plan.

So, to avoid creating indexes during a DTA session, SQL Server uses a special kind of index called a hypothetical index, which was also used by the Index Tuning Wizard. As the name implies, hypothetical indexes are not real indexes; they only contain statistics, and they can be created with the undocumented WITH STATISTICS_ONLY option of the CREATE INDEX statement. You may not be able to see these indexes during a DTA session because they are dropped automatically when they are no longer needed. However, you can see the CREATE INDEX WITH STATISTICS_ONLY and DROP INDEX statements if you run a SQL Server Profiler session to see what the DTA is doing.

Let’s take a quick tour of some of these concepts. To get started, create a new table in the AdventureWorks database:

SELECT *
INTO dbo.SalesOrderDetail
FROM Sales.SalesOrderDetail

Copy the following query and save it to a file:

SELECT * FROM dbo.SalesOrderDetail
WHERE ProductID = 897

Open a new DTA session, and you can optionally run a SQL Server Profiler session if you want to inspect what the DTA is doing. On the Workload File option, select the file containing the SQL statement that you just created and specify AdventureWorks as both the database to tune and the database for workload analysis. Click the Start Analysis button and, when the DTA analysis finishes, run this query to inspect the contents of the msdb..DTA_reports_query table:

SELECT * FROM msdb..DTA_reports_query

Running that query shows the following output (edited for space):

StatementString                             CurrentCost RecommendedCost
SELECT * FROM dbo.SalesOrderDetail WHERE Pr 1.2434      0.00328799

Notice that the query returns information like the query that was tuned, as well as the current and recommended cost. The current cost, 1.2434, is easy to obtain by directly requesting an estimated execution plan for the query as shown next.
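If you prefer not to use the graphical plan, you can also inspect the cached plan for this query through the plan cache DMVs; a sketch (the LIKE filter below is only illustrative):

```sql
-- Retrieve the cached plans for our test query; the showplan XML returned
-- by sys.dm_exec_query_plan contains the StatementSubTreeCost attribute,
-- which is the estimated cost reported by the DTA as CurrentCost
SELECT st.text, qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE st.text LIKE '%dbo.SalesOrderDetail%';
```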


Since the DTA analysis has been completed, the hypothetical indexes it created have already been dropped. To obtain the indexes recommended by the DTA, click the Recommendations tab and look at the Index Recommendations section, where you can find the code to create any recommended index by clicking the Definition column. In our example, it will show the following code:

CREATE CLUSTERED INDEX [_dta_index_SalesOrderDetail_c_5_1915153868__K5]
ON [dbo].[SalesOrderDetail]
(
    [ProductID] ASC
)

In the next statement, and for demonstration purposes only, I will go ahead and create the index recommended by the DTA but, instead of a regular index, I will create it as a hypothetical index by adding the WITH STATISTICS_ONLY clause:

CREATE CLUSTERED INDEX cix_ProductID ON dbo.SalesOrderDetail(ProductID)
WITH STATISTICS_ONLY

You can validate that a hypothetical index was created by running the next query:

SELECT * FROM sys.indexes
WHERE object_id = object_id('dbo.SalesOrderDetail')
AND name = 'cix_ProductID'

The output is shown next; note that the is_hypothetical field shows that this is, in fact, just a hypothetical index:

object_id  name          index_id type type_desc is_hypothetical
1915153868 cix_ProductID 3        1    CLUSTERED 1

Remove the hypothetical index by running this statement:

DROP INDEX dbo.SalesOrderDetail.cix_ProductID

Now implement the DTA recommendation, this time as a regular clustered index:

CREATE CLUSTERED INDEX cix_ProductID ON dbo.SalesOrderDetail(ProductID)

After implementing the recommendation and running the query again, the clustered index is in fact now being chosen by the Query Optimizer. This time, the plan shows a Clustered Index Seek operator and an estimated cost of 0.0033652, which is very close to the recommended cost listed previously when querying the msdb..DTA_reports_query table.
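You can confirm the operator choice without opening the graphical plan by requesting the textual plan with SET SHOWPLAN_TEXT; a quick sketch:

```sql
-- Request the textual estimated plan without executing the query
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM dbo.SalesOrderDetail
WHERE ProductID = 897;
GO
SET SHOWPLAN_TEXT OFF;
GO
-- The plan output should now show a Clustered Index Seek on cix_ProductID
```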

Finally, drop the table you just created by running the following statement:

DROP TABLE dbo.SalesOrderDetail