Benjamin Nevarez Rotating Header Image

Disabling Parameter Sniffing?

As I mentioned in a previous post, parameter sniffing is a good thing: it allows you to get an execution plan tailored to the current parameters of your query. Of course, sometimes it can also be a problem but there are some solutions available. Some of these solutions are covered in my posts here and here.

However, Microsoft recently released a cumulative update which provides a trace flag to disable parameter sniffing at the instance level. This cumulative update is available for the latest versions of SQL Server as described on the knowledge base article 980653.

Basically this trace flag, 4136, has the effect of disabling the use of histograms, a behavior similar to the use of the OPTIMIZE FOR UNKNOWN hint. There are still three cases where this trace flag has no effect, as described in the previous knowledge base article, which are on queries using the OPTIMIZE FOR or RECOMPILE query hints and on stored procedures using the WITH RECOMPILE option.

In general I would not recommend using this trace flag and would ask you to try the other solutions available instead. But anyway, it is good to know that this choice exists and can be used in cases when you really need it. It should be used carefully and only when enough testing shows that in fact it improves the performance of your application.

But let us test it to see how it works. I am testing it with SQL Server 2008 R2. My original build is 10.50.1600. After the cumulative update is installed the build is 10.50.1720.

Let us use the same example described on my OPTIMIZE FOR UNKNOWN post so perhaps you want to refer to it to better understand the details. Create the following stored procedure on the AdventureWorks database.

CREATE PROCEDURE test (@pid int)

AS

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = @pid

Executing the stored procedure before the cumulative update, or after the cumulative update but without using the flag

EXEC test @pid = 709

shows the following plan

clip_image002

In this case, since the trace flag is not yet in effect, SQL Server uses the statistics histogram to estimate the number of rows which in this case is 188. After I enable the trace flag, restart my SQL Server instance, and run the same stored procedure again I got the following plan where the estimated number of rows is now 456.079. Again, how these values were obtained was explained in my previous post.

clip_image004

Let us try a test using the OPTIMIZE FOR query hint, which ignores the 4136 trace flag (note that it is not the same as the OPTIMIZE FOR UNKNOWN hint) by using the following code.

ALTER PROCEDURE test (@pid int)

AS

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = @pid

OPTION (OPTIMIZE FOR (@pid = 709))

If you try this version of the stored procedure, even with the trace flag enabled, it will use the histogram again and will create a plan using the estimated number of rows of 188.

Finally, if you followed this exercise, do not forget to remove the trace flag and restart your SQL Server service.

Back from Vacation

I can’t believe I spent more than two weeks without doing anything related to SQL Server. Yes, I have been on vacation. And now that I am back home I found that I have to catch up with dozens of blog articles to read.

Some of the activities I am planning to do now that I am back include finishing writing my book about the SQL Server Query Optimizer and working on my presentations for the PASS Summit, this time two sessions, as I mentioned in a previous post. And of course, I will try to continue blogging as much as possible. Hey, by the way, it has been a year already since I started blogging: my first article on SQLblog was posted back in July 26, 2009.

By the way, I spent this vacation traveling with my family and visiting places like the Grand Canyon in Arizona, The Arches National Park in Utah, rafting on the Colorado River in Colorado, visiting Mount Rushmore in South Dakota, the Yellowstone Park in Wyoming and finally closing the trip with a stay in Las Vegas, Nevada. I include a few pictures here.

clip_image002

Grand Canyon National Park, Arizona

clip_image004

The Delicate Arch, Arches National Park, Utah

clip_image006

Mount Rushmore National Memorial, South Dakota

clip_image008

The Old Faithful Geyser, Yellowstone National Park, Wyoming

An Introduction to Cost Estimation

Last year when I presented my session regarding the Query Optimizer at the PASS Summit, I was asked how the estimated CPU and I/O costs in an execution plan are calculated, that is, where a specific value like 1.13256 is coming from. All I was able to say at the moment was that Microsoft does not publish how these costs are calculated.

Since this time I am working on a related project, I thought that perhaps I could look into this question again and show one example. But since there are dozens of operators, I decided to take a look at a simple one: the Clustered Index Scan operator. So I captured dozens of XML plans, used XQuery to extract their cost information and after some analysis I was able to obtain a basic formula for this specific operator.

But first a quick introduction to cost estimation: the cost of each operator depends on its algorithm, each operator is associated with a CPU cost, and some of them will also have an I/O cost. The total cost of the operator is the sum of these two costs. An operator like a Clustered Index Scan has both CPU and I/O costs. Some other operators, like a Stream Aggregate, will have only CPU cost. It is interesting to note that this cost used to mean the estimated time in seconds that a query or operator would take to execute on a particular reference machine. In recent versions of SQL Server this cost should no longer be interpreted as seconds, milliseconds, or any other unit.

To show the example, let us look at the largest table in AdventureWorks, Sales.SalesOrderDetail. Run the following query and look at the estimated CPU and I/O costs for the Clustered Index Scan operator as shown on the next figure.

SELECT * FROM Sales.SalesOrderDetail

WHERE LineTotal = 35

clip_image002

For a Clustered Index Scan operator, I observed that the CPU cost is 0.0001581 for the first record, plus 0.0000011 for any additional record after that. In this specific case we have an estimated number of records of 121,317, as shown on the picture above, so we can use 0.0001581 + 0.0000011 * (121317 – 1) or 0.133606 which is the value shown as Estimated CPU Cost. In a similar way, I noticed that the minimum I/O cost is 0.003125 for the first database page and then it grows in increments of 0.00074074 for every additional page. Since the Clustered Index Scan operator scans the entire table, I can use the following query to find the number of database pages, which returns 1,234.

SELECT in_row_data_page_count, row_count

FROM sys.dm_db_partition_stats

WHERE object_id = object_id(‘Sales.SalesOrderDetail’)

AND index_id = 1

 

In this case we have 0.003125 + 0.00074074 * (1234 – 1) or 0.916458 which is the value shown as estimated I/O Cost.

Finally, we add both costs, 0.133606 + 0.916458 to get 1.05006 which is the total estimated cost of the operator. In the same way, adding the cost of all the operators will give the total cost of the plan. In this case, the cost of the Clustered Index Scan, 1.05006, plus the cost of the first Compute Scalar operator, 0.01214, the second Compute Scalar operator, 0.01213, and the cost of the Filter operator, 0.0582322, will give the total cost of the plan, 1.13256, as shown next.

clip_image004

Avoiding Backup Messages on the Error Log

One of the most useful trace flags I use on my SQL Server instances is trace flag 3226, which prevents SQL Server from writing those successful backup messages to the error log. By default, every time a database backup of any type is completed successfully, a message similar to the following is written to the SQL Server error log.

Log was backed up. Database: Test, creation date(time): 2010/06/28(14:53:06), first LSN: 21:2235:1, last LSN: 21:2235:1, number of dump devices: 1, device information: (FILE=3, TYPE=DISK: …

So when you perform many backups, especially transaction log backups, and/or have many databases on the same instance, the SQL Server error log could contain hundreds or thousands of these messages in a way that it can become difficult to find any other useful information there. When this trace flag is used, backup messages are no longer written to the error log or the system event log.

Although this trace flag had been available since SQL Server 2000, most of us learned about it until 2007 when both Andy Kelly and Kevin Farlee blogged about it. At the time it was undocumented but it is now totally documented, appearing on the Trace Flags entry of Books Online.

Trace flags can be set on and off by using the DBCC TRACEON and DBCC TRACEOFF commands or by using the -T startup option, although the latest choice is more appropriate for this specific case. One way to use –T startup option is by right-clicking on the SQL Server service using Configuration Manager, selecting Properties and the Advanced tab, and adding ;-T3226 to the Startup Parameters entry as shown in the next figure. Finally, you will be required to restart your SQL Server service for this configuration change to take effect.

clip_image002

Presenting at the PASS Summit 2010

I am honored to be selected to present at the PASS Summit for the third time. This November in Seattle I will be presenting the following two sessions:

Top 10 Query Optimizer Topics for Better Query Performance

This session will show you how a better understanding on how the Query Optimizer works can help you to improve the performance of your queries. I will show you the top 10 Query Optimizer topics that can give you the more benefit by focusing both on the concepts and practical solutions. The SQL Server Query Optimizer is a cost-based optimizer which job is to analyze the possible execution plans for a query, estimate the cost of these plans and select the one with the lowest cost. So a better knowledge on how the Query Optimizer works can help both database developers and administrators to get better performance from their databases. Several areas of the query processor will be covered, everything from troubleshooting query performance problems and identifying what information the Query Optimizer needs to do a better job to the extreme cases where, because of the its limitations, the Query Optimizer may not give you a good plan and you may need to take a different approach.

Inside the SQL Server 2008 Data Collector

The SQL Server 2008 Data Collector provides some low overhead data collection functionality to store performance and diagnostics historic information of your SQL Server instances. See how you can use this information to troubleshoot problems and to provide trend analysis for the performance of your SQL Server instance. In addition to show the basics and architecture of the new Data Collector, this session focuses on the predefined system data collection sets that are provided by SQL Server 2008 that automatically collect data from the disk usage, instance activity, and queries statistics. You will learn about the Disk Usage collection set, which gathers statistics regarding the growth of the data and transaction log database files; explore the Server Activity collection set which focus on the server activity and resources utilization; and learn about the Query Statistics collection set which collects data regarding the queries running in your instance.

See you in Seattle!

clip_image002

SQL Server Printed Documentation

We all use Books Online these days when we need to look at the SQL Server documentation. But does anybody remember using any SQL Server printed documentation?

I was introduced to SQL Server back in February 22, 1999. Yes, I know the exact date because it was my first day attending the Microsoft training “System Administration for SQL Server 6.5” and I still have the diploma. SQL Server 7.0 was already out but somehow the offered training was not yet updated. A few years later when I was already working with SQL Server 2000, I found the printed documentation set from the SQL Server 6.0 days but I never really used it.

I am not sure which SQL Server version was the last one to include the printed documentation. Was it SQL Server 7.0? Perhaps somebody reading this can confirm it or comment about it.

I am including a couple of pictures of the books from the SQL Server 6.0 documentation

clip_image002

clip_image004

The documentation even has a poster with the system catalog which shows only 13 system tables for the master database and 17 system tables for user databases.

clip_image006

Rebuilding Indexes vs. Updating Statistics

One of the questions I was asked recently while speaking at the Los Angeles SQL Server Professionals Group and the Orange County SQL Server User Group (*) was regarding the order in which jobs like rebuilding indexes or updating statistics should be executed as part of the database maintenance activities. In general you should consider the following important points, focusing in the fact that there are two kinds of statistics: index and column statistics.

1) By default, the UPDATE STATISTICS statement updates both index and column statistics. Using the COLUMNS option of this statement will update column statistics only. Using the INDEX option will update index statistics only.

2) By default, the UPDATE STATISTICS statement uses only a sample of records of the table. Using UPDATE STATISTICS WITH FULLSCAN will scan the entire table.

3) Rebuilding an index, for example by using the ALTER INDEX … REBUILD statement, will update only index statistics with the equivalent of using WITH FULLSCAN. Rebuilding indexes does not update any column statistics.

4) Reorganizing an index, for example using the ALTER INDEX … REORGANIZE statement, does not update any statistics.

So depending on your maintenance jobs and scripts several scenarios can exist. The simplest scenario is if you want to rebuild all the indexes and update all the statistics. As mentioned before, if you rebuild all your indexes then all the index statistics will also be updated by scanning all the rows on the table. Then you just need to update your column statistics by running an UPDATE STATISTICS WITH FULLSCAN, COLUMNS statement. Since the first job only updates index statistics and the second one only updates column statistics, it does not matter which one you execute first.

Some other more complicated scenarios include when you have jobs which rebuild your indexes based on their fragmentation level. Of course, the worst case scenario would be if you first rebuild your indexes, which also updates the index statistics by scanning the entire table, and later you run UPDATE STATISTICS using the default values, which again updates the index statistics but this time only with a default and smaller sample. Not only you are updating your index statistics twice but you are overwriting the better of the two choices.

Let me show you how these commands work with some examples using the AdventureWorks database. Create a new table dbo.SalesOrderDetail

SELECT * INTO dbo.SalesOrderDetail

FROM sales.SalesOrderDetail

The next query uses the sys.stats catalog view and shows that initially there are no statistics objects for the new table.

SELECT name, auto_created, stats_date(object_id, stats_id) AS update_date FROM sys.stats

WHERE object_id = object_id(‘dbo.SalesOrderDetail’)

Use this query using the sys.stats catalog again when asked to inspect the status of the statistics after each of the following commands. Now run the following query

SELECT * FROM dbo.SalesOrderDetail

WHERE SalesOrderID = 43670 AND OrderQty = 1

Use the previous sys.stats query to verify that two statistics objects were created, one for the SalesOrderID column and another one for the OrderQty column (they both have names starting with _WA_Sys as shown later). Run the following statement to create an index on the ProductID column.

CREATE INDEX ix_ProductID ON dbo.SalesOrderDetail(ProductID)

Run again the query to verify that a new statistics object for the ProductID column has been created. Notice the value of the auto_created column which tells if the statistics were automatically created by the query processor.

name                          auto_created      update_date

_WA_Sys_00000004_7F80E8EA     1                 2010-07-01 23:27:45.953

_WA_Sys_00000001_7F80E8EA     1                 2010-07-01 23:27:46.117

ix_ProductID                  0                 2010-07-01 23:29:41.110

Run the next statement to update the column statistics only. You can validate that only the column statistics were updated by looking at the update_date column which uses the STATS_DATE function to display the last date the statistics were updated.

UPDATE STATISTICS dbo.SalesOrderDetail WITH FULLSCAN, COLUMNS

name                          auto_created      update_date

_WA_Sys_00000004_7F80E8EA     1                 2010-07-01 23:34:13.753

_WA_Sys_00000001_7F80E8EA     1                 2010-07-01 23:34:13.850

ix_ProductID                  0                 2010-07-01 23:29:41.110

This statement will do the same for the index statistics

UPDATE STATISTICS dbo.SalesOrderDetail WITH FULLSCAN, INDEX

The next two statements will update both index and column statistics

UPDATE STATISTICS dbo.SalesOrderDetail WITH FULLSCAN

UPDATE STATISTICS dbo.SalesOrderDetail WITH FULLSCAN, ALL

See how an index rebuild only updates index statistics

ALTER INDEX ix_ProductID ON dbo.SalesOrderDetail REBUILD

Here you can verify that reorganizing an index does not update any statistics

ALTER INDEX ix_ProductID ON dbo.SalesOrderDetail REORGANIZE

Finally, remove the table you have just created

DROP TABLE dbo.SalesOrderDetail

* Updated from an article originally written in October, 2009

Auto-Parameterization in SQL Server

The SQL Server Query Optimizer might decide to parameterize some queries in those cases where the value of a specific parameter does not impact the choice of an execution plan, that is, in the cases where it does not matter which parameter value is used, the plan returned will be the same. This is a very conservative policy and SQL Server will only use it when it is safe to do it, so the performance of the queries are not negatively impacted. In this case the parameterized plan can be reused by similar queries which differ only on the value of their parameters. This feature, which helps to avoid optimization time and cost, is call auto-parameterization and was introduced with SQL Server 7.0.

For example, on the AdventureWorks database, the next two SQL statements will produce different execution plans and will not be parameterized, even when the queries are exactly the same and only the parameters are different. In this case the Query Optimizer decides that it is not safe to auto-parameterize them.

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = 897

SELECT * FROM Sales.SalesOrderDetail

WHERE ProductID = 870

On the other hand, the following query will be auto-parameterized.

SELECT * FROM Sales.SalesOrderHeader

WHERE SalesOrderID = 43669

In this last example the column SalesOrderID is the primary key of the SalesOrderHeader table so it is guaranteed to be unique. In addition, because the query predicate is using an equality operator and SalesOrderID is unique, there will be always a maximum of one record returned by this query. In this case SQL Server decides that it is safe to parameterize this plan by using a clustered index seek operator. You can verify if your query is using a parameterized plan by inspecting the plan cache like in the following query

SELECT text

FROM sys.dm_exec_cached_plans

CROSS APPLY sys.dm_exec_sql_text(plan_handle)

WHERE text LIKE ‘%Sales%’

This code will output the following auto-parameterized query which will show placeholders for the parameter values like @1

(@1 int)SELECT * FROM [Sales].[SalesOrderHeader] WHERE [SalesOrderID]=@1

Finally, a new feature to parameterize queries more aggressively called forced parameterization was introduced with SQL Server 2005. This feature is disabled by default and can be enabled at the database level (or inside a query using the PARAMETERIZATION FORCED query hint). By enabling forced parameterization you can reduce the frequency of query optimizations but you may also choose suboptimal plans for some instances of some queries, so you should do extensive analysis and testing of your application to verify that your performance is in fact being improved. To differentiate from forced parameterization, auto-parameterization is also referred as simple parameterization.

Optimizing Join Orders

The SQL Server Query Optimizer needs to take two important decisions regarding joins: the selection of a join order and the choice of a join algorithm. The selection of a join algorithm is the choice between a nested loops join, merge join or hash join operator, but I will leave that topic for some other post. In this post I will talk about join orders.

A join combines records from two tables based on some common information. But since a join works with only two tables at a time, a query requesting data from n tables must be executed as a sequence of n – 1 joins. Although the results of a query are the same regardless of the join order, the order in which the tables are joined greatly influences the cost and performance of a query. The Query Optimizer must find the optimal sequence of joins between the tables used in the query. Finding the optimal join order is one of the most difficult problems in query optimization and one that has been subject of extensive research since the seventies.

This basically means that the Query Optimizer defines the join order; it is not defined by your query. To see an example run the following query against the AdventureWorks database and choose to display the execution plan by clicking Include Actual Execution Plan.

SELECT FirstName, LastName

FROM Person.Contact AS C

    JOIN Sales.Individual AS I

        ON C.ContactID = I.ContactID

    JOIN Sales.Customer AS Cu

        ON I.CustomerID = Cu.CustomerID

WHERE Cu.CustomerType = ‘I’

It will show the following execution plan

clip_image002

Note that the Query Optimizer is not using the same join order specified in the query; it found a more efficient one. The query asks to join the Contact table with the Individual table, and then join the result with the Customer table, but if you inspect the three clustered index scan operators, the plan is first joining the Customer and Individual tables and then joining the result with the Contact table.

The problem of finding an efficient join order for a query is difficult because of the number of possible permutations that the Query Optimizer needs to analyze. Because of the commutative and associative properties of joins there could be many different possible join orders in a query and this number increases exponentially with the number of tables. In fact, with just a handful of tables the number of possible join orders could be in the thousands or even millions. Obviously, it is impossible for any query optimizer to look at all those combinations: it would take too long.

Queries are represented as trees in the query processor and some names like left-deep, right-deep and bushy trees are commonly used to identify the shapes of the order of joins in these trees. Left-deep and bushy trees for a join of four tables are shown in the next figure.

 clip_image004

For example, the left-deep tree shown could be JOIN( JOIN( JOIN(A, B), C), D) and the bushy tree could be JOIN(JOIN(A, B), JOIN(C, D)). Left-deep trees are also called linear trees or linear processing trees. The set of bushy trees includes the sets of both left-deep and right-deep trees. The following table lists the number of possible join orders considering left-deep and bushy trees.

Tables

Left-Deep Trees

Bushy Trees

1

1

1

2

2

2

3

6

12

4

24

120

5

120

1,680

6

720

30,240

7

5,040

665,280

8

40,320

17,297,280

9

362,880

518,918,400

10

3,628,800

17,643,225,600

11

39,916,800

670,442,572,800

12

479,001,600

28,158,588,057,600

The number of left-deep trees is calculated as n! or n factorial, where n is the number of tables in the relation. The factorial is the product of all the positive integers less than or equal to n. For example, 5! or 5 factorial is 5 x 4 x 3 x 2 x 1 or 120. Calculating the number of bushy tress is more complicated and can be calculated as (2n – 2)!/(n – 1)!.

So how does the Query Optimizer analyze all these possible join orders? The answer is: it does not. Analyzing all the possible join orders for a complex query could take a long time so the Query Optimizer must find a balance between the optimization time and the quality of the resulting plan. The Query Optimizer may not perform an exhaustive search and instead uses some heuristics to guide the search process. For example, a common heuristic is to avoid considering bushy tress during optimization, but I will leave some of these details for some other post.

Are You Running Multiserver Queries?

If you administer multiple instances of SQL Server and you have not tried multiserver queries, you are going to love this new SQL Server 2008 feature.

Multiserver queries allows you to run T-SQL statements against multiple SQL Server instances at the same time, returning the results in the same or separate result sets. In order to run multiserver queries you need to create server groups using the Registered Servers window, and register servers to become members of these groups. This can be accomplished by selecting either Local Server Groups or Central Management Servers.

Once you have created your groups and registered your servers you can right-click on any of the configured groups and select ‘New Query’. You can also create a query to be executed against all your groups and servers by selecting Local Server Groups or your Central Management Server. To verify that you are about to run a query against multiple servers make sure that the Status bar of the Query Editor is color pink and shows the name of the server group (or DatabaseEngineServerGroup if you have selected all the groups).

clip_image002

The registered servers could be any combination of SQL Server 2008 or SQL Server 2005 instances. I even successfully tested it with SQL Server 2000 instances.

Once in the Query Editor there is an endless list of interesting things you can do. A few examples follow.

Do you need a report of all the databases in all your servers? Just run a simple

SELECT * FROM sys.databases

Notice how the results sets for each instance (shown in the next picture) by default are concatenated and the final result set includes an additional ‘Server Name’ column indicating which instance these records are coming from. You can change the result set configuration by selecting Tools, Options, Query Results, SQL Server and Multiserver Results in Management Studio.

clip_image004

How about getting a report of when each of your SQL Server instances was started? Try

SELECT start_time FROM sys.dm_exec_requests WHERE session_id = 1

or

SELECT login_time FROM sysprocesses WHERE spid = 1

or if you have SQL Server 2008 instances only

SELECT sqlserver_start_time FROM sys.dm_os_sys_info

Or maybe a report of the disk space available on every drive on every server? Just run

EXEC xp_fixeddrives

Of course you can also query user databases, but the query needs to be valid on all the instances. For example, if you are like me and have a performance data collection database on each SQL Server instance, you can run a query to obtain some specific performance counters that you collect periodically, like in

USE <database_name>

GO

SELECT * FROM cpu_history

WHERE cpu_usage > 80