Benjamin Nevarez

PASS Summit 2011 Recap

This is my late review of the PASS Summit 2011, which was hosted once again at the Washington State Convention & Trade Center in Seattle a couple of weeks ago. The PASS Summit is the largest SQL Server event in the world, and this year it was scheduled to run 189 sessions with 204 speakers from all over the world. This was my ninth year attending the conference, and I was also excited that it was my fourth year speaking there.

So the week started for me on Monday morning, flying from Los Angeles to Seattle. Later in the afternoon I went to the registration area, where I started meeting many people from the SQL Server community, and I ended the day with some SQL Karaoke at the Bush Garden. I was originally scheduled to attend Dr. David DeWitt’s pre-con “A Peek Inside an RDBMS” on Tuesday, but unfortunately it was cancelled and I didn’t attend any other pre-con that day. The Welcome Reception was held on Tuesday night and included the traditional Quiz Bowl, which I show in the next picture. From the Welcome Reception some of us went to the Speakers and Volunteers Party hosted at The Garage.

[Image: the Quiz Bowl at the Welcome Reception]

On Wednesday, the first keynote of the conference started with Rushabh Mehta, PASS President, and was followed by Ted Kummert, Microsoft Senior Vice President, Business Platform Division. Not a big surprise, but Ted made it official that SQL Server code-named Denali will now be named SQL Server 2012 and will be released in the first half of the next calendar year. Among other things, Ted talked about Big Data and announced that Microsoft will be supporting Hadoop and is planning to deliver Apache Hadoop-based distributions for both Windows Server and Windows Azure. He also mentioned that SQL Server and SQL Server Parallel Data Warehouse connectors for Apache Hadoop had been released just the previous week. By the way, all three keynotes of the conference were broadcast live and you can still watch them on demand at the PASS website.

I had to leave the keynote early to find the room and prepare everything for my first session at the conference. My session, “Inside the SQL Server Query Optimizer”, was scheduled for a room with capacity for 520 people and next is a picture taken just a few minutes before I started presenting. I was even asked to sign a copy of my book just before my session :-)

[Image: the session room a few minutes before my presentation]

After lunch on Wednesday I went to a couple of sessions. With 15 sessions running concurrently, choosing which ones to attend was not an easy task. I attended Bob Ward’s half-day session “Inside Tempdb” and later went to Aaron Bertrand’s “What’s new in SQL Server code-named Denali – Engine and Tools”. Half-day sessions were new at the PASS Summit this year. On Wednesday night I went to the SolidQ Party, which I’ve also attended in previous years.

I was not presenting any session on Thursday, so I went to the keynote and spent the rest of the day attending sessions. The keynote included Bill Graziano, PASS Executive Vice President, Finance, and Quentin Clark, Microsoft Corporate Vice President. Quentin started talking about SQL Server 2012 and warned that it would be impossible to cover all the hundreds of new features and improvements in this new version, so instead he explained his 12 favorite areas of value, which he called the “Fantastic 12 of SQL Server 2012”. He spent the rest of the keynote talking about the most important new SQL Server 2012 features grouped into these 12 areas of value. After the keynote, my choice for the morning was Itzik Ben-Gan’s session “Bug or Feature?”.

During lunch I was at the same table as Eric Hanson, Principal Program Manager Lead, Query Processing and Storage at Microsoft, so I used the opportunity to talk to him about the new columnstore indexes feature. After lunch I went to the session “Physical Join Operators” by Ami Levin, which was very entertaining. I continued with Rob Farley’s “Joins, Sargability and the Evils of Residualiciousness” and closed the day with “SQLCAT: SQL Server HA and DR Design Architectures and Best Practices” with Sanjay Mishra, Justin Erickson and Mike Weiner.

I spent Thursday night mostly at the Community Appreciation Party, hosted again this year at GameWorks. I met many people there and ended up at the same table as Lubor Kollar, who mentioned that he is back working with the SQL Server Core Engine development team, responsible for query optimization, query execution, and data warehousing.

The keynote on Friday started with Rick Heighes, PASS Vice President, Marketing, followed by David DeWitt, Microsoft Technical Fellow, Data and Storage Platform Division. Dr. DeWitt’s keynote, entitled "Big Data: What’s the Big Deal?", was one of the most anticipated sessions of the conference and focused on Hadoop and its ecosystem of software tools. He concluded by saying that relational databases and Hadoop are designed to meet different needs and can complement each other, so database professionals need to make sure that both technologies work together as well as possible. After the keynote I attended Adam Machanic’s “Query Tuning Mastery: Zen and the Art of Workspace Memory”.

Something amazing for me during the conference was seeing my book Inside the SQL Server Query Optimizer available at the PASS Bookstore. Next I am including a picture taken on Friday, when there were only a few copies left :-) It is the one with the beacon on the cover. I also saw my book at the Red Gate booth and, of course, I gave away a couple of copies at each of my sessions.

[Image: my book at the PASS Bookstore on Friday]

Just after lunch on Friday I went to see Susan Price and Murshed Zaman presenting “Project Apollo: How to use Columnstore Indexes to Revolutionize Query Performance on your Data Warehouse”. Susan presented a similar session last year but I was not able to attend it as I was also presenting a session at the same time. I stayed in the same room to learn more about the columnstore indexes with Wayne Snyder’s session “Using Columnstore/Vertipaq indexes in SQL Server code-named Denali” but again I had to leave early as I was speaking next.

Then it was time to present my last session, “Parameter Sniffing: the Query Optimizer vs. the Plan Cache”. Since this was the last round of sessions at the conference I was wondering, like other speakers I met in the Speaker Ready Room, whether nobody or very few people would show up. Fortunately the attendance for my session was good and the presentation went really well. Finally, I spent Friday night having dinner at the Tap House with several people from the SQL Server community.

Something interesting I saw on Saturday afternoon, while going to lunch at the Hard Rock Cafe, was the Occupy Seattle movement on the streets of Seattle. I took several pictures there; one is shown next.

[Image: Occupy Seattle on the streets of Seattle]

I flew back to Los Angeles on Saturday night and interestingly enough I was scheduled to speak again at the SoCal Code Camp at the University of Southern California the next day, where I presented the same two sessions I did at the PASS Summit.

In summary, this was another excellent PASS Summit and I can’t wait for the next one, which is already scheduled for November 6-9, 2012, in Seattle. See you then.

Parameter Sniffing and Plan-reuse-affecting SET Options

One interesting problem I am sometimes asked to troubleshoot is when a developer tells me that a stored procedure is timing out or taking too long to execute from a web application but returns immediately when executed directly in Management Studio, even with the same parameters. Although there could be a few reasons for a problem like this, including blocking, the most frequent one is that the plan used by the web application was optimized for a combination of parameters that produced a “bad” plan for other executions of the same stored procedure with different parameters. Although you may be tempted to just run sp_recompile to force a new optimization and allow the application to continue working, this does not really fix the problem, and it may eventually come back. You may also have seen similar scenarios where you updated statistics, rebuilt an index or changed something else, only to find that the problem suddenly seemed to be fixed. It is not. Those changes probably just forced a new optimization with the “good” parameter you were testing. Obviously the best thing to do for this kind of problem is to capture the “bad” plan for further analysis in order to provide a permanent solution. In this post I will show you how to do that.

But first, a little bit of background. Remember that, in general, query optimization is an expensive operation and, in order to avoid this optimization cost, the plan cache will try to keep the generated execution plans in memory so they can be reused. So, if the stored procedure is executed thousands of times, only one optimization is needed. However, if a new connection running the same stored procedure has different SET options, it may generate a new plan instead of reusing the one already in the plan cache. This new plan can be reused by later executions of the same stored procedure with the same connection settings. A new plan is needed because these SET options can impact the choice of an execution plan, as they affect the results of evaluating constant expressions during the optimization process (a process known as constant folding and explained here). Another connection setting, FORCEPLAN, acts in a similar way to a hint, requesting the Query Optimizer both to preserve the join order as specified in the query syntax and to use nested loop joins only. As indicated in the Microsoft white paper Plan Caching in SQL Server 2008, the following SET options will affect the reuse of execution plans.

ANSI_NULL_DFLT_OFF
ANSI_NULL_DFLT_ON
ANSI_NULLS
ANSI_PADDING
ANSI_WARNINGS
ARITHABORT
CONCAT_NULL_YIELDS_NULL
DATEFIRST
DATEFORMAT
FORCEPLAN
LANGUAGE
NO_BROWSETABLE
NUMERIC_ROUNDABORT
QUOTED_IDENTIFIER

Unfortunately, different management or development tools, like Management Studio, ADO.NET, or even sqlcmd, may have different SET options in their default configuration. Most often you will find that the difference is in one option, ARITHABORT, which is OFF in ADO.NET and ON in Management Studio. So it is possible that, in our example, Management Studio and the web application are using distinct cached plans, and that the web application initially got a plan that was good for the parameters used during its optimization but is not good for other executions of the same stored procedure with different parameters.
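One way to check whether two applications really connect with different settings is to compare the SET options of their current sessions. The following is a minimal sketch. It assumes both the web application and Management Studio have an open connection while you run it (the C# test program at the end of this post closes its connection almost immediately). The columns come from the sys.dm_exec_sessions DMV, and the is_user_process filter simply hides system sessions.

```sql
-- Compare plan-affecting SET options of the current user sessions.
-- An ADO.NET session showing arithabort = 0 next to a Management Studio
-- session showing arithabort = 1 would match the scenario described above.
SELECT session_id,
       program_name,
       arithabort,
       ansi_nulls,
       ansi_padding,
       ansi_warnings,
       concat_null_yields_null,
       quoted_identifier,
       date_first,
       date_format,
       [language]
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
ORDER BY program_name;
```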

But now let us see how to prove that parameter sniffing is in fact the problem in your specific case, and how to extract the plans to inspect both the parameters and the SET options used during optimization. Since AdventureWorks does not have the default SET options of a new database, let us create our own database and copy some data from AdventureWorks:

CREATE DATABASE Test
GO

Create a new table and a stored procedure to test

USE Test
GO
SELECT * INTO dbo.SalesOrderDetail
FROM AdventureWorks.Sales.SalesOrderDetail
GO
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID
ON dbo.SalesOrderDetail(ProductID)
GO
CREATE PROCEDURE test (@pid int)
AS
SELECT * FROM dbo.SalesOrderDetail
WHERE ProductID = @pid

Let us test two different applications, executing the stored procedure from a .NET application (C# code included at the end) and from Management Studio. For the purpose of this test we want to assume that a plan with a table scan is a bad plan and a plan using an index seek/RID lookup is the optimal one.

Start with a clean plan cache by running

DBCC FREEPROCCACHE

Run the .NET application from a command prompt window and provide the value 870 as a parameter (note that this application is only running the test stored procedure)

C:\TestApp\test
Enter ProductID: 870

At this moment we can start inspecting the plan cache to see the plans available in memory. Run the following script from the Test database (we will be running this script again later during this exercise)

SELECT plan_handle, usecounts, pvt.set_options
FROM (
    SELECT plan_handle, usecounts, epa.attribute, epa.value
    FROM sys.dm_exec_cached_plans
        OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
    WHERE cacheobjtype = 'Compiled Plan') AS ecpa
PIVOT (MAX(ecpa.value) FOR ecpa.attribute IN ("set_options", "objectid")) AS pvt
where pvt.objectid = object_id('dbo.test')

You should get an output similar to this

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    1            251

The output shows that we have one execution plan in the plan cache, that it has been used once (as indicated by the usecounts value), and that its set_options value, taken from the sys.dm_exec_plan_attributes DMF, is 251. Since this was the first execution of the stored procedure, it was optimized using the parameter 870, which in this case created a plan using a table scan (which we consider the “bad” plan here). Now run the application again using a parameter that returns only a few records and would benefit from an index seek/RID lookup plan:
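set_options is only one of the attributes that sys.dm_exec_plan_attributes exposes. The attributes flagged with is_cache_key are the ones that must match for a plan to be reused. The sketch below lists them for a single plan. It uses the plan_handle from the example output above, so replace it with a value from your own query.

```sql
-- List the attributes that make up the cache key of one cached plan.
-- Replace the handle with a plan_handle value from your own output.
DECLARE @plan_handle varbinary(64) =
    0x05000700210F0207B8C09007000000000000000000000000;

SELECT attribute, value
FROM sys.dm_exec_plan_attributes(@plan_handle)
WHERE is_cache_key = 1
ORDER BY attribute;
```

If two connections differ in any of these values (set_options in this example), they cannot share the plan.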

C:\TestApp\test
Enter ProductID: 898

If you inspect the plan cache again, you will notice that the plan has been used twice and, unfortunately, this time it was not a good plan for the parameter used:

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    2            251

At this moment the developer may try to troubleshoot this problem by running the stored procedure in Management Studio using something like this

EXEC test @pid = 898

Now the developer is surprised to find that SQL Server is returning a good execution plan and the query is returning immediately. Inspecting the plan cache again will show something similar to this

plan_handle                                           usecounts    set_options
0x05000700210F0207B8C09007000000000000000000000000    2            251
0x05000700210F0207B860650B000000000000000000000000    1            4347

You can see that a new plan was added for the Management Studio execution, with a different value for set_options.

What to do next? It is time to inspect the plans and look at the SET options and parameters used during the optimization. Select the plan_handle of the first plan created (the one with set_options 251 in your own example) and use it to run the following query

select * from sys.dm_exec_query_plan
(0x05000700210F0207B8C09007000000000000000000000000)
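Instead of copying each plan_handle by hand, you can read both values straight from the plan XML with XQuery. The sketch below is one possible approach. It assumes it is run from the Test database, so that OBJECT_ID resolves dbo.test, and it relies on the standard showplan XML namespace. It returns one row per compiled parameter for every cached plan of the procedure.

```sql
-- For every cached plan of dbo.test, show the ARITHABORT setting and the
-- parameter values the plan was compiled with.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT cp.plan_handle,
       cp.usecounts,
       qp.query_plan.value('(//StatementSetOptions/@ARITHABORT)[1]', 'varchar(5)') AS arithabort,
       p.n.value('@Column', 'sysname') AS parameter_name,
       p.n.value('@ParameterCompiledValue', 'nvarchar(4000)') AS compiled_value
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS p(n)
WHERE qp.objectid = OBJECT_ID('dbo.test')
  AND qp.dbid = DB_ID();
```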

You can find the SET options at the beginning of the plan

<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false"
CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true"
ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />

And the parameters used during optimization at the end

<ParameterList>
    <ColumnReference Column="@pid" ParameterCompiledValue="(870)" />
</ParameterList>

Do the same for the second plan and you will get the following information for the SET options

<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="true"
CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true"
ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />

and the following parameter information

<ParameterList>
    <ColumnReference Column="@pid" ParameterCompiledValue="(898)" />
</ParameterList>

This information shows that the ARITHABORT SET option has a different value in each plan and that the parameter used to optimize the query for the web application was 870. (The same information is available from the Properties window of a graphical plan.) You can also verify the operators used in each plan: the first one uses a table scan and the second one an index seek/RID lookup combination.
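To confirm that ARITHABORT alone explains the two plans, you can make Management Studio use the same setting as ADO.NET. This is a minimal sketch. It assumes the remaining cache-key attributes of both connections also match, for example the user and the language. With ARITHABORT OFF, the session's set_options value should become 251. The call should then reuse the application's table scan plan and reproduce the slow execution, and that plan's usecounts value should go up when you rerun the plan cache query.

```sql
-- Match the ADO.NET default so this session can reuse the application's plan
SET ARITHABORT OFF;
EXEC dbo.test @pid = 898;
-- Restore the Management Studio default for the rest of the session
SET ARITHABORT ON;
```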

Now that you have captured the plans you can force a new optimization so the application can use a better plan immediately (keeping in mind that this is not a permanent solution). Try this

EXEC sp_recompile 'dbo.test'
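sp_recompile affects every cached plan of the procedure. If you would rather remove only the plan used by the application, SQL Server 2008 and later also accept a plan handle in DBCC FREEPROCCACHE. The sketch below uses the plan handle from the example output above, so replace it with the value from your own plan cache query. The Management Studio plan (set_options 4347) should stay in the cache.

```sql
-- Remove only the plan created by the application (set_options 251)
-- instead of clearing the whole plan cache.
DBCC FREEPROCCACHE (0x05000700210F0207B8C09007000000000000000000000000);
```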

So now you know that you have a problem related to parameter sniffing. What to do next? I have a few recommendations in previous posts here and here. I have another one here, but usually you should not be doing this.

Finally, you can use the following script to display SET options for a specific set_options value

declare @set_options int = 251
if ((1 & @set_options) = 1) print 'ANSI_PADDING'
if ((4 & @set_options) = 4) print 'FORCEPLAN'
if ((8 & @set_options) = 8) print 'CONCAT_NULL_YIELDS_NULL'
if ((16 & @set_options) = 16) print 'ANSI_WARNINGS'
if ((32 & @set_options) = 32) print 'ANSI_NULLS'
if ((64 & @set_options) = 64) print 'QUOTED_IDENTIFIER'
if ((128 & @set_options) = 128) print 'ANSI_NULL_DFLT_ON'
if ((256 & @set_options) = 256) print 'ANSI_NULL_DFLT_OFF'
if ((512 & @set_options) = 512) print 'NoBrowseTable'
if ((4096 & @set_options) = 4096) print 'ARITH_ABORT'
if ((8192 & @set_options) = 8192) print 'NUMERIC_ROUNDABORT'
if ((16384 & @set_options) = 16384) print 'DATEFIRST'
if ((32768 & @set_options) = 32768) print 'DATEFORMAT'
if ((65536 & @set_options) = 65536) print 'LanguageID'
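When you have two plans, as in this example, it can be quicker to compare their set_options values directly than to decode each one separately. The following set-based sketch uses the same flag values as the script above and returns only the options whose state differs between the two values. For 251 and 4347 it should return a single row, ARITH_ABORT, which is OFF in the application plan and ON in the Management Studio plan, because 4347 - 251 = 4096.

```sql
-- Decode two set_options values side by side and keep only the flags that differ.
DECLARE @app_options int = 251,   -- plan created by the ADO.NET application
        @ssms_options int = 4347; -- plan created by Management Studio

SELECT o.flag,
       o.name,
       CASE WHEN (@app_options & o.flag) <> 0 THEN 'ON' ELSE 'OFF' END AS app_plan,
       CASE WHEN (@ssms_options & o.flag) <> 0 THEN 'ON' ELSE 'OFF' END AS ssms_plan
FROM (VALUES
        (1, 'ANSI_PADDING'), (4, 'FORCEPLAN'), (8, 'CONCAT_NULL_YIELDS_NULL'),
        (16, 'ANSI_WARNINGS'), (32, 'ANSI_NULLS'), (64, 'QUOTED_IDENTIFIER'),
        (128, 'ANSI_NULL_DFLT_ON'), (256, 'ANSI_NULL_DFLT_OFF'), (512, 'NoBrowseTable'),
        (4096, 'ARITH_ABORT'), (8192, 'NUMERIC_ROUNDABORT'), (16384, 'DATEFIRST'),
        (32768, 'DATEFORMAT'), (65536, 'LanguageID')
     ) AS o(flag, name)
WHERE (@app_options & o.flag) <> (@ssms_options & o.flag)
ORDER BY o.flag;
```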

C# Code

using System;
using System.Data;
using System.Data.SqlClient;

class Test
{
    static void Main()
    {
        Console.Write("Enter ProductID: ");
        // Parse the input so the parameter value matches SqlDbType.Int
        int pid = int.Parse(Console.ReadLine());

        // A regular C# string literal cannot span lines, so keep it on one line
        string connectionString =
            "Data Source=(local);Initial Catalog=Test;Integrated Security=SSPI";

        // using blocks close the reader and the connection even if an exception is thrown
        using (SqlConnection cnn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.test", cnn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@pid", SqlDbType.Int).Value = pid;
            cnn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Print the first column of each returned row
                    Console.WriteLine(reader[0]);
                }
            }
        }
    }
}

Speaking at the PASS Summit and other Southern California events

I am currently working on the two sessions that I will be presenting at the PASS Summit: Inside the SQL Server Query Optimizer and Parameter Sniffing: the Query Optimizer vs. the Plan Cache. In addition, I will be presenting these two new sessions at other SQL Server events in Southern California, including SQLSaturday #95.

First, I will be speaking at the Los Angeles SQL Server Professionals Group on Thursday September 15th. The meeting will be hosted at the UCLA Anderson School of Management and will start at 6:30 PM. I will present only one session, Inside the SQL Server Query Optimizer, in this meeting. You can find additional information about the meeting and directions on their website.

Two days later, on September 17th, I will be speaking at SQLSaturday #95 in San Diego, CA. Of course, this SQLSaturday will also have many other great speakers and the final schedule is already posted here. In addition to presenting both of my sessions described before I will be participating in the Ask the Experts – SQL Server Q&A session coordinated by Thomas Mueller. For more details and directions for SQLSaturday #95 please go to their website here.

On October 7th I will be presenting my Query Optimizer session at the Orange County SQL Server Professionals User Group in Mission Viejo, CA. Details and directions will be posted soon on their website here.

Then it is time for the PASS Summit, the largest SQL Server and BI conference in the world. The PASS Summit is hosted again this year in Seattle, WA and it is scheduled for October 11-14. The schedule for my two sessions is not final at the moment of writing this but so far it looks like I will be speaking on Wednesday and Friday.

I am flying back from the PASS Summit on Saturday and planning to present my two sessions at the SoCal Code Camp the following day, Sunday October 16th. The SoCal Code Camp is a community driven event for developers to come and learn from their peers. At this moment they are still accepting sessions so no schedule has been created yet. You can register, find additional information and directions on their website here.

Finally, although I am not going to be speaking, I will be attending SQL in the City in Los Angeles, CA on October 28th. SQL in the City is a one-day SQL Server training event that will include several SQL Server MVPs; you can look at their site here for more details and information.

I look forward to meeting lots of SQL Server professionals at these events.

Code from my book Inside the SQL Server Query Optimizer

Recently I have been asked for the code of my book Inside the SQL Server Query Optimizer, so I am including it in this post. The book contains a large number of example SQL queries, all of which are based on the AdventureWorks database; Chapter 6 additionally uses the AdventureWorksDW database. All code has been tested on both SQL Server 2008 and SQL Server 2008 R2. Note that these sample databases are not included by default in your SQL Server installation, but can be downloaded from the CodePlex website.

Inside the SQL Server Query Optimizer code – InsideQueryOptimizerCode.txt

Query Optimization with Denali Columnstore Indexes

In a previous post I talked about the new columnstore indexes and their related processing algorithms which are available in SQL Server code-named Denali. In this post I will cover the query processing part of the technology in more detail and will show you some examples that you can test on the recently released CTP3 (Community Technology Preview) of the product.

As with previous versions of SQL Server, in Denali the query optimizer can choose between the available access methods, which now also include columnstore indexes, and, as always, this will be a cost-based decision. A new choice the query optimizer will have to make is the selection of an execution mode. The new query processing algorithms mentioned in my previous post will run in what is called batch execution mode, which is different from the traditional processing mode, now called row mode.

In row execution mode, operators process data one row at a time. The new batch execution mode processes data in batches, which is more efficient for large amounts of data, such as data warehouse workloads. Each operator in an execution plan can use row execution mode and, when columnstore indexes are available, some operators can also use batch mode. There is both an estimated and an actual execution mode, and this information is displayed in the query execution plan, as I will show later. It is also worth mentioning that, although columnstore indexes can speed up data warehouse queries, they are not a good choice for very selective queries returning only a few records. In this case the query optimizer may have to rely on row stores, like clustered or regular nonclustered indexes, to find those records quickly. There are no seeks on columnstore indexes.

As with previous versions of SQL Server, you still have the choice of using a hint to force any index in cases where the query optimizer is not giving you a good execution plan. This can happen, for example, when the query optimizer chooses a columnstore index when it shouldn’t, or when you want to force a columnstore index that is not being selected. You can also use the new IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint to ask the query optimizer to avoid using any columnstore index.

Let me show you an example which you can test on SQL Server Denali CTP3, currently available for download here. To follow this example you will also need the AdventureWorksDWDenali database, available at CodePlex. I will use the same example as BOL to skip the basics and go directly to analyzing the batch processing mode. (By the way, the BOL example didn’t work directly with the AdventureWorksDWDenali database, so I had to add a few more columns at the end of the CREATE TABLE statement.)

First, use the following BOL code to create a partition function, a partition scheme and a new partitioned table with a columnstore index

USE AdventureWorksDWDenali;
GO

CREATE PARTITION FUNCTION [ByOrderDateMonthPF](int) AS RANGE RIGHT
FOR VALUES (
    20050701, 20050801, 20050901, 20051001, 20051101, 20051201,
    20060101, 20060201, 20060301, 20060401, 20060501, 20060601,
    20060701, 20060801, 20060901, 20061001, 20061101, 20061201,
    20070101, 20070201, 20070301, 20070401, 20070501, 20070601,
    20070701, 20070801, 20070901, 20071001, 20071101, 20071201,
    20080101, 20080201, 20080301, 20080401, 20080501, 20080601,
    20080701, 20080801, 20080901, 20081001, 20081101, 20081201
)
GO

CREATE PARTITION SCHEME [ByOrderDateMonthRange]
AS PARTITION [ByOrderDateMonthPF]
ALL TO ([PRIMARY])
GO

-- Create a partitioned version of the FactResellerSales table
CREATE TABLE [dbo].[FactResellerSalesPtnd](
    [ProductKey] [int] NOT NULL,
    [OrderDateKey] [int] NOT NULL,
    [DueDateKey] [int] NOT NULL,
    [ShipDateKey] [int] NOT NULL,
    [ResellerKey] [int] NOT NULL,
    [EmployeeKey] [int] NOT NULL,
    [PromotionKey] [int] NOT NULL,
    [CurrencyKey] [int] NOT NULL,
    [SalesTerritoryKey] [int] NOT NULL,
    [SalesOrderNumber] [nvarchar](20) NOT NULL,
    [SalesOrderLineNumber] [tinyint] NOT NULL,
    [RevisionNumber] [tinyint] NULL,
    [OrderQuantity] [smallint] NULL,
    [UnitPrice] [money] NULL,
    [ExtendedAmount] [money] NULL,
    [UnitPriceDiscountPct] [float] NULL,
    [DiscountAmount] [float] NULL,
    [ProductStandardCost] [money] NULL,
    [TotalProductCost] [money] NULL,
    [SalesAmount] [money] NULL,
    [TaxAmt] [money] NULL,
    [Freight] [money] NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [CustomerPONumber] [nvarchar](25) NULL,
    [OrderDate] datetime NULL,
    [DueDate] datetime NULL,
    [ShipDate] datetime NULL
) ON ByOrderDateMonthRange(OrderDateKey);
GO

-- Copy the data from the FactResellerSales into the new table
INSERT INTO dbo.FactResellerSalesPtnd WITH(TABLOCK)
SELECT * FROM dbo.FactResellerSales;
GO

-- Create the columnstore index
CREATE NONCLUSTERED COLUMNSTORE INDEX [csindx_FactResellerSalesPtnd]
ON [FactResellerSalesPtnd]
(
    [ProductKey],
    [OrderDateKey],
    [DueDateKey],
    [ShipDateKey],
    [ResellerKey],
    [EmployeeKey],
    [PromotionKey],
    [CurrencyKey],
    [SalesTerritoryKey],
    [SalesOrderNumber],
    [SalesOrderLineNumber],
    [RevisionNumber],
    [OrderQuantity],
    [UnitPrice],
    [ExtendedAmount],
    [UnitPriceDiscountPct],
    [DiscountAmount],
    [ProductStandardCost],
    [TotalProductCost],
    [SalesAmount],
    [TaxAmt],
    [Freight],
    [CarrierTrackingNumber],
    [CustomerPONumber]
);

Now run the following query

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
FROM FactResellerSalesPtnd
GROUP BY SalesTerritoryKey;

This will create the following plan where you can see the new Columnstore Index Scan operator

[Image: execution plan with the Columnstore Index Scan operator]

The properties of the Columnstore Index Scan operator are shown next

[Image: Columnstore Index Scan properties, row execution mode]

You may notice that the actual and estimated execution mode is Row (lines 3 and 4 on the list of properties). Row execution mode was selected because the table is not large enough to require the batch execution mode. We can use the undocumented ROWCOUNT and PAGECOUNT options of the UPDATE STATISTICS statement to simulate a larger table as shown next (for more information about how this works see my post about the DTA here)

UPDATE STATISTICS FactResellerSalesPtnd WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000

Removing the existing plan (using for example DBCC FREEPROCCACHE) and running the same query again will now show the following plan (only part is shown), which this time is using parallelism.

[Image: partial parallel execution plan]

In addition, by looking at the properties of the Columnstore Index Scan you can notice that this time it is using the batch execution mode

[Image: Columnstore Index Scan properties, batch execution mode]
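Clearing the entire plan cache with DBCC FREEPROCCACHE, as suggested above, also discards every other cached plan on the instance. On a shared test server, a narrower alternative is to request a fresh plan for this one statement. The sketch below reuses the query from this post. OPTION (RECOMPILE) should compile a new plan against the altered row and page counts without removing any other plans.

```sql
-- Compile a fresh plan for this statement only, leaving the rest of the plan cache intact
SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
FROM FactResellerSalesPtnd
GROUP BY SalesTerritoryKey
OPTION (RECOMPILE);
```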

You can also use the new IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint to disallow the use of a columnstore index. Run the following code

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
FROM FactResellerSalesPtnd
GROUP BY SalesTerritoryKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);

This will show you the following plan which, as you can see, now reads the FactResellerSalesPtnd table directly, without using the columnstore index.

[Image: execution plan reading the FactResellerSalesPtnd table without the columnstore index]
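The opposite case mentioned earlier also applies: when the query optimizer does not select the columnstore index, you can force it with a regular table hint. This minimal sketch uses the index name created in this example and should produce a plan with the Columnstore Index Scan operator.

```sql
-- Force the query optimizer to use the columnstore index
SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
FROM FactResellerSalesPtnd WITH (INDEX (csindx_FactResellerSalesPtnd))
GROUP BY SalesTerritoryKey;
```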

Finally, since the number of records and pages of the FactResellerSalesPtnd table was altered for this test, you may want to drop it and create a new copy if you need to do some additional testing:

DROP TABLE FactResellerSalesPtnd

Speaking at the PASS Summit 2011

I am honored to be speaking at the PASS Summit again this year. I’ve been attending this SQL Server conference every year since 2003 and this will be my fourth year speaking. As last year, I will be presenting two sessions.

In my first session, Inside the SQL Server Query Optimizer, I will go into the internals of the Query Optimizer and show you the steps it performs in the background, covering everything from the time a query is submitted to SQL Server until an execution plan is generated. In my second session, Parameter Sniffing: the Query Optimizer vs. the Plan Cache, I will show you how the Query Optimizer uses parameter sniffing to produce a plan tailored to the current parameters of a query and why in some cases this can be a performance problem, including troubleshooting and solutions for these cases.

The PASS Summit is less than four months away and you can register here. I look forward to meeting lots of SQL Server professionals, including those whom I only know via Twitter. See you in Seattle in October.

Statistics on Computed Columns

Another interesting topic that I usually talk about in my presentations is statistics on computed columns, so I will use this post to show you how they work and how they can help you improve the performance of your queries.

A problem faced by some queries using scalar expressions is that they usually cannot benefit from statistics and, without them, the Query Optimizer will use the 30% selectivity guess on inequality comparisons. A solution to this problem can be the use of computed columns, as SQL Server can automatically create and update statistics on these columns which can help the Query Optimizer to create better execution plans. An additional benefit of this solution is that you don’t need to specify the name of the computed column in your queries for SQL Server to use its statistics. The Query Optimizer automatically matches the computed column definition to an existing scalar expression in a query, so your applications do not need to be changed. Although computed columns have been available in previous versions of SQL Server, the automatic matching feature was only introduced with SQL Server 2005.

To see an example, run this query, which creates the plan shown next:

SELECT * FROM Sales.SalesOrderDetail
WHERE OrderQty * UnitPrice > 25000

[Image: execution plan showing the 30% selectivity guess]

The estimated number of rows is 36,395.1, which is 30% of the total number of rows, 121,317, although the query returns only 5 records. SQL Server is obviously using a selectivity guess, as it cannot estimate the selectivity of the expression OrderQty * UnitPrice > 25000.
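You can check the arithmetic behind this estimate with a single query that computes the total row count, the 30% guess, and the actual number of qualifying rows. This is a sketch. On the AdventureWorks version used in this post it should return 121,317, 36,395.10 and 5, as described above, but other versions of the sample database may differ.

```sql
-- Total rows, the 30% selectivity guess, and the rows that actually qualify
SELECT COUNT(*) AS total_rows,
       COUNT(*) * 0.30 AS guessed_rows,
       SUM(CASE WHEN OrderQty * UnitPrice > 25000 THEN 1 ELSE 0 END) AS actual_rows
FROM Sales.SalesOrderDetail;
```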

Now create a computed column:

ALTER TABLE Sales.SalesOrderDetail
ADD cc AS OrderQty * UnitPrice

Run the previous SELECT statement again and note that, this time, the estimated number of rows has changed to 84.3101 which is very close to the actual number of rows returned by the query, as shown in the following plan:

You can optionally replace the 25,000 in the query with other values, such as 1,000, 10,000, or 20,000, and verify that the estimate is again close to the actual number of rows returned.

Note that creating the computed column does not create statistics; these statistics are created the first time the query is optimized. You can run the next query to display information about the statistics objects for the Sales.SalesOrderDetail table:

SELECT * FROM sys.stats
WHERE object_id = object_id('Sales.SalesOrderDetail')

The newly created statistics object will most likely be at the end of the list. Copy its name and use the following command to display the details about the statistics object (I’ve used the name of my local object, but you should replace it as appropriate). You can also use "cc" as the name of the object to get the same results. In both cases, the "cc" column should be shown in the Columns field of the density vector section.

DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', _WA_Sys_0000000C_2645B050)
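
Auto-created statistics names such as the one above follow a recognizable pattern: after the _WA_Sys_ prefix come the column ID and the object ID, each written as an eight-digit hexadecimal number. This is a widely described naming convention rather than a documented contract, so treat the hypothetical Python helper below as a sketch built on that assumption; its results can be cross-checked against column_id in sys.columns and OBJECT_ID('Sales.SalesOrderDetail').

```python
import re

# Assumed naming pattern for auto-created statistics:
#   _WA_Sys_<column_id as 8 hex digits>_<object_id as 8 hex digits>
_WA_SYS = re.compile(r"^_WA_Sys_([0-9A-Fa-f]{8})_([0-9A-Fa-f]{8})$")


def decode_auto_stats_name(name: str) -> tuple[int, int]:
    """Return (column_id, object_id) decoded from an auto-created stats name."""
    match = _WA_SYS.match(name)
    if match is None:
        raise ValueError(f"not an auto-created statistics name: {name!r}")
    column_hex, object_hex = match.groups()
    return int(column_hex, 16), int(object_hex, 16)


column_id, object_id = decode_auto_stats_name("_WA_Sys_0000000C_2645B050")
print(column_id, object_id)  # 12 642101328
```

A column ID of 12 is consistent with cc having just been added after the eleven original columns of Sales.SalesOrderDetail; the object ID will differ on your own instance.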

Unfortunately, for the automatic matching feature to work, the expression must be exactly the same as the computed column definition. So, if I change the query to use UnitPrice * OrderQty instead of OrderQty * UnitPrice, the execution plan will again show an estimated number of rows equal to 30% of the table, as this query will demonstrate:

SELECT * FROM Sales.SalesOrderDetail
WHERE UnitPrice * OrderQty > 25000

As mentioned, the computed column provides statistics so the Query Optimizer can try to get you a better execution plan. In addition, you can create an index on the existing computed column to provide a better navigational alternative. Create the following index:

CREATE INDEX IX_cc on Sales.SalesOrderDetail(cc)

If you run the original SELECT statement again, the Query Optimizer will now choose the newly created index and produce a more efficient plan, using an Index Seek/Key Lookup instead of a Clustered Index Scan, as shown next.

Finally, drop the index and computed column you’ve just created:

DROP INDEX IX_cc ON Sales.SalesOrderDetail
GO
ALTER TABLE Sales.SalesOrderDetail
DROP COLUMN cc

The Query Optimizer and Contradiction Detection

As covered in my book Inside the SQL Server Query Optimizer, contradiction detection is a query (or tree) rewrite performed during the simplification phase of the optimization process, in which query contradictions are detected and removed. Since these parts of the query are not executed at all, SQL Server saves resources like I/O, locks, memory and CPU, so the query executes faster. For example, the Query Optimizer may know that no records can satisfy a predicate even before touching any page of data. A contradiction may be related to a check constraint or to the way the query is written. I will show you examples of both cases next.

First, I need to find a table with a check constraint in AdventureWorks and, handily, the Employee table has the following check constraint definition:

([VacationHours]>=(-40) AND [VacationHours]<=(240))

This check constraint ensures that the number of vacation hours is between -40 and 240. When I run the following query

SELECT * FROM HumanResources.Employee
WHERE VacationHours > 80

… SQL Server uses a Clustered Index Scan operator, as shown next

However, if I request all of the employees with more than 300 vacation hours, then, because of this check constraint, the Query Optimizer can know immediately that no records qualify for the predicate. Run the following code:

SELECT * FROM HumanResources.Employee
WHERE VacationHours > 300

As expected, the query returns no records, but this time it shows the following execution plan:

Note that, instead of a Clustered Index Scan, SQL Server now uses a Constant Scan operator. Since there is no need to access the table at all, SQL Server saves resources like I/O, locks, memory and CPU, so the query executes faster. Now, let’s see what happens if I disable the check constraint:

ALTER TABLE HumanResources.Employee NOCHECK CONSTRAINT CK_Employee_VacationHours

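This time, running the last query once again uses a Clustered Index Scan operator, as the Query Optimizer can no longer use the check constraint to guide its decisions. Don’t forget to enable the constraint again by running the following statement; the WITH CHECK option validates the existing rows, so the constraint becomes trusted again and the Query Optimizer can use it: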

ALTER TABLE HumanResources.Employee WITH CHECK CHECK CONSTRAINT
CK_Employee_VacationHours
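
The reasoning behind the Constant Scan can be sketched as an interval test: the check constraint limits VacationHours to [-40, 240], the predicate asks for values in (300, infinity), and the intersection of the two ranges is empty. The Python below is a simplified model under that assumption, not SQL Server’s actual rewrite rule; it also mimics the NOCHECK case by ignoring a constraint that is not trusted. The Interval class and the can_skip_table function are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A numeric range; each bound may be inclusive or exclusive."""
    low: float = -math.inf
    high: float = math.inf
    low_inclusive: bool = True
    high_inclusive: bool = True

    def intersect(self, other: "Interval") -> "Interval":
        # The tighter lower bound wins; on a tie, an exclusive bound wins.
        if self.low != other.low:
            low, low_inc = max((self.low, self.low_inclusive), (other.low, other.low_inclusive))
        else:
            low, low_inc = self.low, self.low_inclusive and other.low_inclusive
        # The tighter upper bound wins; on a tie, an exclusive bound wins.
        if self.high < other.high:
            high, high_inc = self.high, self.high_inclusive
        elif other.high < self.high:
            high, high_inc = other.high, other.high_inclusive
        else:
            high, high_inc = self.high, self.high_inclusive and other.high_inclusive
        return Interval(low, high, low_inc, high_inc)

    def is_empty(self) -> bool:
        if self.low > self.high:
            return True
        # A single point is empty unless both bounds are inclusive.
        return self.low == self.high and not (self.low_inclusive and self.high_inclusive)


# CHECK ([VacationHours]>=(-40) AND [VacationHours]<=(240))
CHECK_VACATION_HOURS = Interval(-40, 240)


def can_skip_table(predicate: Interval, constraint_trusted: bool = True) -> bool:
    """True if a trusted constraint proves that no row can satisfy the predicate."""
    if not constraint_trusted:  # e.g. after ALTER TABLE ... NOCHECK CONSTRAINT
        return False
    return predicate.intersect(CHECK_VACATION_HOURS).is_empty()


print(can_skip_table(Interval(80, math.inf, low_inclusive=False)))   # False: scan the table
print(can_skip_table(Interval(300, math.inf, low_inclusive=False)))  # True: Constant Scan
```

Passing constraint_trusted=False returns False for the 300-hour predicate as well, which mirrors the Clustered Index Scan seen while the constraint was disabled.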

The second type of contradiction is when the query itself explicitly contains a contradiction. Take a look at the following query:

SELECT * FROM HumanResources.Employee
WHERE ManagerID > 10 AND ManagerID < 5

In this case there is no check constraint involved; both predicates are valid and each would individually return records, but they contradict each other when they are combined. As a result, the query returns no records and the plan again shows a Constant Scan operator, similar to the plan shown previously. This may just look like a badly written query, but remember that some predicates may already be included in, for example, view definitions, and the developer of the query may be unaware of them. For example, in our last query, a view may include the predicate ManagerID > 10 and a developer may query this view using the predicate ManagerID < 5. Since both predicates contradict each other, a Constant Scan operator will again be used instead.
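
To illustrate the view scenario, the hypothetical Python sketch below pools the predicates from a view definition and from the calling query, turns each one into inclusive bounds on an integer column such as ManagerID, and reports a contradiction when the combined range is empty. It only handles simple comparisons against constants and is a model of the idea, not SQL Server’s implementation; the view predicate is the hypothetical one from the text.

```python
import math

# Each predicate is (operator, constant), all on the same integer column.
Predicate = tuple[str, int]


def bounds(predicates: list[Predicate]) -> tuple[float, float]:
    """Combine predicates into inclusive (low, high) bounds for an integer column."""
    low, high = -math.inf, math.inf
    for op, value in predicates:
        if op == ">":
            low = max(low, value + 1)    # integers: x > 10 means x >= 11
        elif op == ">=":
            low = max(low, value)
        elif op == "<":
            high = min(high, value - 1)  # integers: x < 5 means x <= 4
        elif op == "<=":
            high = min(high, value)
        elif op == "=":
            low, high = max(low, value), min(high, value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return low, high


def is_contradiction(predicates: list[Predicate]) -> bool:
    """True when no integer value can satisfy all predicates at once."""
    low, high = bounds(predicates)
    return low > high


view_predicates = [(">", 10)]   # hypothetical view: WHERE ManagerID > 10
query_predicates = [("<", 5)]   # caller: SELECT ... FROM the view WHERE ManagerID < 5
print(is_contradiction(view_predicates + query_predicates))  # True
print(is_contradiction(view_predicates + [("<", 20)]))       # False
```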

Database Engine Tuning Advisor and the Query Optimizer – Part 2

One of the most interesting and perhaps least known features of the Database Engine Tuning Advisor (DTA) is that you can use it with a test server to tune the workload of a production server. As I mentioned in the first part of this post, the DTA relies on the Query Optimizer to make its tuning recommendations, and you can have it make these optimizer calls to a test server instance without impacting the performance of the production server.

Information Required by the Query Optimizer

To better understand how this works, let us first review what kind of information the Query Optimizer needs to tune a workload. Basically, the most important information it needs to perform an optimization is:

1) The database metadata (e.g. table and column definitions, indexes, constraints, etc.)

2) Optimizer statistics (index and column statistics)

3) Table size (number of rows and pages)

4) Available memory and number of processors

The DTA can gather the database metadata and statistics from the production server and use them to create a similar database, with no data, on a different server. This is called a shell database. The DTA can also obtain the available memory and number of processors on the production server by using the extended stored procedure xp_msver, and use this information for the optimization process. It is important to remember that no data is needed for the optimization process. This process is summarized in the following figure taken from Books Online.

This process provides the following benefits:

1) There is no need to perform an expensive optimization on the production server, which could impact its resource usage. The production server is only used to gather the initial metadata and the required statistics.

2) There is no need to copy the entire database to a test server either, which saves disk space and copy time, something especially important for big databases.

3) It does not matter if the test server is less powerful than the production server, as the DTA tuning session will consider the available memory and number of processors of the production server.

Running a Tuning Session

Now I am going to show an example of how to run a tuning session. First of all, the use of a test server is not supported by the DTA graphical user interface, so you need to use the dta utility, the command prompt version of DTA. Configuring a test server also requires an XML input file containing the dta input information. I am using the following input file for this example:

<?xml version="1.0" encoding="utf-16" ?>
<DTAXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://schemas.microsoft.com/sqlserver/2004/07/dta">
  <DTAInput>
    <Server>
      <Name>production_instance</Name>
      <Database>
        <Name>AdventureWorks</Name>
      </Database>
    </Server>
    <Workload>
      <File>workload.sql</File>
    </Workload>
    <TuningOptions>
      <TestServer>test_instance</TestServer>
      <FeatureSet>IDX</FeatureSet>
      <Partitioning>NONE</Partitioning>
      <KeepExisting>NONE</KeepExisting>
    </TuningOptions>
  </DTAInput>
</DTAXML>

The Server and Database elements of the XML file specify the production SQL Server instance and database. The Workload element specifies a script file containing the workload to tune. The TuningOptions element includes the TestServer subelement, which specifies the name of the test SQL Server instance.
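
If you prepare input files for several databases or test servers, building the XML programmatically can help avoid typos. The Python sketch below uses the standard library’s xml.etree.ElementTree to produce a file with the same elements and values as the example above. The function names are hypothetical, the unused xsi namespace declaration is not emitted, and the result should still be checked against the DTA XML schema for your version of SQL Server.

```python
import xml.etree.ElementTree as ET

DTA_NS = "http://schemas.microsoft.com/sqlserver/2004/07/dta"
ET.register_namespace("", DTA_NS)  # serialize the DTA namespace as the default xmlns


def _sub(parent, tag, text=None):
    """Append a child element in the DTA namespace, optionally with text."""
    child = ET.SubElement(parent, f"{{{DTA_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def build_dta_input(production_server, database, workload_file, test_server):
    """Return an ElementTree mirroring the example DTA input file."""
    root = ET.Element(f"{{{DTA_NS}}}DTAXML")
    dta_input = _sub(root, "DTAInput")

    server = _sub(dta_input, "Server")
    _sub(server, "Name", production_server)
    _sub(_sub(server, "Database"), "Name", database)

    _sub(_sub(dta_input, "Workload"), "File", workload_file)

    options = _sub(dta_input, "TuningOptions")
    _sub(options, "TestServer", test_server)
    # The remaining values are copied unchanged from the example input file.
    _sub(options, "FeatureSet", "IDX")
    _sub(options, "Partitioning", "NONE")
    _sub(options, "KeepExisting", "NONE")
    return ET.ElementTree(root)


if __name__ == "__main__":
    tree = build_dta_input("production_instance", "AdventureWorks",
                           "workload.sql", "test_instance")
    tree.write("input.xml", encoding="utf-16", xml_declaration=True)
```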

Create the workload.sql file containing a simple query like this:

SELECT * FROM AdventureWorks.Sales.SalesOrderDetail
WHERE ProductID = 898

Run the following command:

dta -ix input.xml -S production_instance -s session1

A successful execution will show output similar to this:

Microsoft (R) SQL Server Microsoft SQL Server Database Engine Tuning Advisor command line utility
Version 9.00.5000.00
Copyright (c) Microsoft Corporation. All rights reserved.

Tuning session successfully created. Session ID is 26.

Total time used: 00:00:03
Workload consumed: 100%, Estimated improvement: 96%

Tuning process finished.

This example creates an entire copy of AdventureWorks (with no data) on the test server and performs the requested optimization. The shell database is automatically deleted after the tuning session is completed. Optionally, you can keep the shell database, for example if you want to use it again in another tuning exercise, by adding the RetainShellDB subelement to the TuningOptions element, as in the following XML fragment:

<TuningOptions>
  <TestServer>test_instance</TestServer>
  <FeatureSet>IDX</FeatureSet>
  <Partitioning>NONE</Partitioning>
  <KeepExisting>NONE</KeepExisting>
  <RetainShellDB>1</RetainShellDB>
</TuningOptions>

If the shell database already exists when you request a tuning session, the database creation process will be skipped. However, you will have to manually delete this database when it is no longer needed.
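
If you maintain the input file by script, the same standard-library module can switch RetainShellDB on or off in an existing file. This hypothetical helper assumes the file uses the DTA namespace shown earlier, is actually saved in the encoding its declaration names, and that appending RetainShellDB as the last child of TuningOptions, as in the fragment above, is acceptable to your version of the dta utility.

```python
import xml.etree.ElementTree as ET

DTA_NS = "http://schemas.microsoft.com/sqlserver/2004/07/dta"
NS = {"dta": DTA_NS}


def set_retain_shell_db(path: str, retain: bool = True) -> None:
    """Add, update, or remove RetainShellDB under TuningOptions in a DTA input file."""
    ET.register_namespace("", DTA_NS)  # keep the default xmlns when rewriting
    tree = ET.parse(path)
    options = tree.getroot().find("dta:DTAInput/dta:TuningOptions", NS)
    if options is None:
        raise ValueError(f"{path}: no DTAInput/TuningOptions element found")

    element = options.find("dta:RetainShellDB", NS)
    if retain:
        if element is None:
            element = ET.SubElement(options, f"{{{DTA_NS}}}RetainShellDB")
        element.text = "1"
    elif element is not None:
        # Without the element, the shell database is deleted after tuning.
        options.remove(element)

    tree.write(path, encoding="utf-16", xml_declaration=True)
```

Remember that a retained shell database has to be dropped manually once you no longer need it.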

Once the tuning session is completed, you can use the DTA graphical user interface as usual to see the recommendations. To do this, open the DTA, open the session you used by double-clicking its session name (session1 in our example), and choose the Recommendations tab if it is not already selected.

Scripting Statistics

Although the DTA automatically gathers the metadata and statistics to build the shell database, I am going to show you how to script the required objects and statistics to tune a simple query. This can be helpful in cases where you don’t want to script the entire database. Scripting database objects is a fairly simple process well known to SQL Server professionals. Something that may be new for many, though, is how to script the statistics. The generated scripts use the undocumented STATS_STREAM, ROWCOUNT and PAGECOUNT options of the CREATE/UPDATE STATISTICS statement.

As an example, to optimize the simple query shown previously, try the following in Management Studio: select Databases, right-click the AdventureWorks database, select Tasks, Generate Scripts …, click Next, select “Select specific database objects”, expand Tables, select Sales.SalesOrderDetail, click Next, click Advanced, look for the “Script Statistics” choice and select “Script statistics and histograms”. Finally, choose True for “Script Indexes”. Your Advanced Scripting Options window should look similar to this:


Click OK and finish the wizard to generate the scripts. You will get a script with a few UPDATE STATISTICS statements similar to this (with the STATS_STREAM value shortened to fit on this page):

UPDATE STATISTICS [Sales].[SalesOrderDetail]([IX_SalesOrderDetail_ProductID])
WITH STATS_STREAM = 0x010000000300000000000000000000004036000 ,
ROWCOUNT = 121317, PAGECOUNT = 227

These UPDATE STATISTICS statements update the statistics of existing indexes (obviously, the related CREATE INDEX statements were scripted as well). If the table also has column statistics, the script will include CREATE STATISTICS statements for them instead.
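
Because STATS_STREAM, ROWCOUNT and PAGECOUNT are undocumented, it can be useful to review what a generated script will tell the optimizer before running it. The hypothetical Python helper below is a sketch that assumes every UPDATE STATISTICS statement has the layout shown in this post, including both ROWCOUNT and PAGECOUNT; it lists the target table, the statistics name, and the row and page counts, which describe the original table rather than the empty one the script will be run against.

```python
import re

# Assumed statement layout, as produced by the Generate Scripts wizard above:
# UPDATE STATISTICS [schema].[table]([stats]) WITH STATS_STREAM = 0x..., ROWCOUNT = n, PAGECOUNT = n
_UPDATE_STATS = re.compile(
    r"UPDATE\s+STATISTICS\s+(?P<table>\[[^\]]+\]\.\[[^\]]+\])\s*\(\s*(?P<stats>\[[^\]]+\])\s*\)"
    r".*?ROWCOUNT\s*=\s*(?P<rows>\d+)\s*,\s*PAGECOUNT\s*=\s*(?P<pages>\d+)",
    re.IGNORECASE | re.DOTALL,
)


def scripted_table_sizes(script: str) -> list[dict]:
    """Return table, statistics name, row count and page count per UPDATE STATISTICS."""
    return [
        {
            "table": m["table"],
            "stats": m["stats"],
            "rows": int(m["rows"]),
            "pages": int(m["pages"]),
        }
        for m in _UPDATE_STATS.finditer(script)
    ]
```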

Testing Scripted Statistics

Finally, I will show you an example of how to use the scripted statistics to obtain plans and cost estimates on an empty table. Running this query on the regular AdventureWorks database produces the plan shown next, with an estimated number of rows of 9 and a cost of 0.0296835:

SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 898


Let us produce the same plan on an empty database. Following the procedure described before, you can script the Sales.SalesOrderDetail table. You will end up with multiple statements, including the following (again shortened to fit on this post):

CREATE TABLE [Sales].[SalesOrderDetail](
    [SalesOrderID] [int] NOT NULL,
    -- ... remaining column definitions (including [ProductID]) omitted for brevity
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON
[Sales].[SalesOrderDetail]
(
    [ProductID] ASC
)
GO
UPDATE STATISTICS [Sales].[SalesOrderDetail]([IX_SalesOrderDetail_ProductID])
WITH STATS_STREAM = 0x010000000300000000000, ROWCOUNT = 121317, PAGECOUNT = 227
GO
UPDATE STATISTICS [Sales].[SalesOrderDetail]
([PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID])
WITH STATS_STREAM = 0x010000000200000000000000000000003C2F68F6, ROWCOUNT = 121317,
PAGECOUNT = 1237

Create a new database and run at least the four previous statements, taken from the script you generated in the previous step (or use the script attached to this post, which contains the statements needed to reproduce the example). After running the script on an empty database and running the sample query, you will again get the plan with a cost of 0.0296835 and an estimated number of rows of 9.

My PASS Summit 2011 Submissions


I’ve been attending the PASS Summit every year since 2003 and have been fortunate enough to speak there a few times too. Of course I am planning to attend again this year, and I’ve also submitted a few sessions this time.

Something new this year, though, is that session selection has been opened to the SQL Server community, and we can all vote for the sessions we would like to see at the Summit. Voting closes this Friday, May 20th, and we can choose from any of the 646 sessions submitted. So, if you like any of my sessions, which are described next, please vote for them here. I look forward to seeing you at the PASS Summit in October.

My submitted abstracts, all of which are for regular sessions of 75 minutes, are:

Inside the SQL Server Query Optimizer [400]

The SQL Server Query Optimizer is a cost-based optimizer: it analyzes a number of candidate execution plans for a given query, estimates the cost of each of these plans, and selects the plan with the lowest cost. In this session I will go into the internals of the Query Optimizer and will show you the steps that it performs in the background, covering everything from the time a query is submitted to SQL Server until an execution plan is generated. I’ll show you how the Query Optimizer generates possible alternative execution plans, how these plans are stored for the duration of the optimization process, how heuristics are used to limit the number of alternative plans considered, how each candidate plan is costed, and finally how the best alternative is chosen based on those costs. I will also cover why query optimization is an inherently complex problem and why challenges in some of its most fundamental areas are still being addressed today.

Query Optimizer Statistics for Better Performance [300]

The SQL Server Query Optimizer is a cost-based optimizer: the quality of the execution plans it generates is directly related to the accuracy of its cost estimations. In the same way, the estimated cost of a plan is based on the query’s cardinality estimation and the algorithms or operators it uses. In this session I will show you how the Query Optimizer uses statistics to estimate this cardinality and why having good quality statistics is extremely important for the performance of your queries. I will also cover how to diagnose and troubleshoot cardinality estimation problems in cases where you are not getting a good execution plan, and will provide solutions to these problems. Finally, existing challenges regarding statistics still faced by query optimizers today will also be discussed.

Partitioned Tables and Indexes: Management and Query Processing [300]

You’ve decided to implement partitioning because you want easier management of very large tables and indexes and want to improve your data loading, deletion and archival operations. But you also need to understand the query processing implications and how partitioning will impact your existing queries. On the other hand, many SQL Server users still believe that the primary purpose of partitioning is to make queries run faster by querying only a specific partition. In this session I will show you the reality and will focus both on how to implement partitioning to achieve better maintenance and data availability and on how it will impact query processing, including benefits, potential issues, and recommendations.

Parameter Sniffing: the Query Optimizer vs. the Plan Cache [300]

Parameter sniffing is a good thing: it is used by the Query Optimizer to produce an execution plan tailored to the current parameters of a query. However, due to the way that the plan cache stores these plans in memory, it can sometimes also be a performance problem. This session will show you how parameter sniffing works and in which cases it can be a problem. How to diagnose and troubleshoot parameter sniffing problems, and their solutions, will be discussed as well. The session will also include details on how the Query Optimizer uses the histogram and density components of the statistics object, and some other advanced topics.