Benjamin Nevarez

Speaking at the Los Angeles SQL Server Professionals User Group

I haven’t updated this blog in a long time so I wanted to put in a quick post about a session that I will be presenting at the Los Angeles SQL Server Professionals User Group this Thursday, January 20th. The session, “Top 10 Query Optimizer Topics for Better Performance”, is the same topic I presented a couple of months ago at the PASS Summit in Seattle.

The meeting will be hosted at the UCLA campus and will start at 6:30 PM with Allen Berezovsky, who will talk about FILESTREAM in SQL Server. My session will follow. More details and directions can be found at the Los Angeles SQL Server Professionals Group website.

I hope to see you there,

Benjamin

clip_image002

My Book, “Inside the SQL Server Query Optimizer”, available at the PASS Summit

My book, “Inside the SQL Server Query Optimizer”, is almost finished and we will have a conference edition of it available at the PASS Summit. The final version of the book, published by Red Gate books, will be available on Amazon by Christmas.

For more details on the contents, I am including the Preface of the book next.

clip_image002

Preface

The Query Optimizer has always been one of my favorite SQL Server topics, which is why I started blogging about it and submitting related presentations to PASS. And so it would have continued, except that, after several blog posts discussing the Query Optimizer, Red Gate invited me to write a book about it. This is that book.

I started learning about the Query Optimizer by reading the very few SQL Server books which discussed the topic, and most of them covered it only very briefly. Yet I pressed on, and later, while trying to learn more about the topic, I found an extremely rich source of information in the form of the many available research papers. It was hard to fully grasp them at the beginning, as academic papers can be difficult to read and understand, but soon I got used to them, and was all the more knowledgeable for it.

Having said that, I feel that I’m in a bit of a minority, and that many people still see the Query Optimizer just as a black box where a query is submitted and an amazing execution plan is returned. It is also seen as a very complex component, and rightly so. It definitely is a very complex component, perhaps the most complex in database management software, but there is still a lot of great information about the Query Optimizer that SQL Server professionals can benefit from.  

The Query Optimizer is the SQL Server component that tries to give you an optimal execution plan for your queries and, just as importantly, tries to find that execution plan as quickly as possible. A better understanding of what the Query Optimizer does behind the scenes can help you to improve the performance of your databases and applications, and this book explains the core concepts behind how the SQL Server Query Optimizer works. With this knowledge, you’ll be able to write better queries, provide the Query Optimizer with the information it needs to produce efficient execution plans, and troubleshoot the cases when the Query Optimizer is not giving you a good plan.

With that in mind, and in case it’s not obvious, the content of this book is intended for SQL Server professionals: database developers and administrators, data architects, and basically anybody who submits more than just trivial queries to SQL Server. Here’s a quick overview of what the book covers:

The first chapter, Introduction to Query Optimization, starts with an overview of how the SQL Server Query Optimizer works and introduces the concepts that will be covered in more detail in the rest of the book. A look into some of the challenges query optimizers still face today is covered next, along with a section on how to read and understand execution plans. The chapter closes with a discussion of join ordering, traditionally one of the most complex problems in query optimization.

The second chapter talks about the Execution Engine, and describes it as a collection of physical operators that perform the functions of the query processor. It emphasizes how these operations, implemented by the Execution Engine, define the choices available to the Query Optimizer when building execution plans. This chapter includes sections on data access operations, the concepts of sorting and hashing, aggregations, and joins, and concludes with a brief introduction to parallelism.

Chapter 3, Statistics and Cost Estimation, shows how the quality of the execution plans generated by the Query Optimizer is directly related to the accuracy of its cardinality and cost estimations. The chapter describes Statistics objects in detail, and includes some sections on how statistics are created and maintained, as well as how they are used by the Query Optimizer. We’ll also take a look at how to detect cardinality estimation errors, which may cause the Query Optimizer to choose inefficient plans, together with some recommendations on how to avoid and fix these problems. Just to round off the subject, the chapter ends with an introduction to cost estimation.

Chapter 4, Index Selection, shows how SQL Server can speed up your queries and dramatically improve the performance of your applications just by using the right indexes. The chapter shows how SQL Server selects indexes, how you can provide better indexes, and how you can verify your execution plans to make sure these indexes are correctly used. We’ll talk about the Database Engine Tuning Advisor and the Missing Indexes feature, which will show how the Query Optimizer itself can provide you with index tuning recommendations.

Chapter 5, The Optimization Process, goes right into the internals of the Query Optimizer and introduces the steps that it performs without you ever knowing. This covers everything from the moment a query is submitted to SQL Server until an execution plan is generated and is ready to be executed, including steps like parsing, binding, simplification, trivial plan and full optimization. Important components which are part of the Query Optimizer architecture, such as transformation rules and the memo structure, are also introduced.

Chapter 6, Additional Topics, includes a variety of subjects, starting with the basics of update operations and how they also need to be optimized just like any other query, so that they can be performed as quickly as possible. We’ll have an introduction to Data Warehousing and how SQL Server optimizes star queries, before launching into a detailed explanation of parameter sniffing, along with some recommendations on how to avoid some problems presented by this behavior. Continuing with the topic of parameters, the chapter concludes by discussing auto-parameterization and forced parameterization.

Chapter 7 describes Hints, and warns that, although hints are a powerful tool which allows you to take explicit control over the execution plan of a query, they need to be used with caution and only as a last resort when no other option is available. The chapter covers the most-used hints, and ends with a couple of sections on plan guides and the USE PLAN query hint.

Before we get started, please bear in mind that this book contains many undocumented SQL Server statements. These statements are provided only as a way to explore and understand the Query Optimizer and, as such, should not be used in a production environment. Use them wisely, and I hope you enjoy learning about this topic as much as I do.

Benjamin Nevarez

Presenting at the SoCal Rock & Roll Code Camp

I will be presenting two sessions at the SoCal Rock & Roll Code Camp this Saturday. This is a community-driven event with over 100 sessions, hosted at the University of Southern California (USC) on both Saturday, October 23rd and Sunday, October 24th. My sessions will be “Inside the SQL Server 2008 Data Collector” at 12:15 pm, and “Top 10 SQL Server Query Optimizer Topics for Better Performance” at 1:30 pm, both in room VKC-105.

For more information regarding sessions, schedule and directions, visit the SoCal Rock & Roll Code Camp website.

clip_image002

Speaking at the Orange County SQL Server Professionals User Group

I will be speaking at the Orange County SQL Server Professionals User Group this Thursday, October 7th, 2010. The topic is “Top 10 Query Optimizer Topics for Better Performance”. So if you are in the Orange County or Los Angeles area please stop by and say hello.

The meeting starts at 6:30 PM. More details and directions can be found at the Orange County SQL Server Professionals User Group website: http://www.sqloc.com

Disabling Parameter Sniffing?

As I mentioned in a previous post, parameter sniffing is a good thing: it allows you to get an execution plan tailored to the current parameters of your query. Of course, sometimes it can also be a problem but there are some solutions available. Some of these solutions are covered in my posts here and here.

However, Microsoft recently released a cumulative update which provides a trace flag to disable parameter sniffing at the instance level. This cumulative update is available for the latest versions of SQL Server, as described in knowledge base article 980653.

Basically this trace flag, 4136, has the effect of disabling the use of histograms, a behavior similar to the use of the OPTIMIZE FOR UNKNOWN hint. There are still three cases where this trace flag has no effect, as described in the same knowledge base article: queries using the OPTIMIZE FOR query hint, queries using the RECOMPILE query hint, and stored procedures using the WITH RECOMPILE option.

In general I would not recommend using this trace flag and would ask you to try the other solutions available instead. But anyway, it is good to know that this choice exists and can be used in cases when you really need it. It should be used carefully and only when enough testing shows that in fact it improves the performance of your application.

But let us test it to see how it works. I am testing it with SQL Server 2008 R2. My original build is 10.50.1600. After the cumulative update is installed the build is 10.50.1720.
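Before and after installing the cumulative update, it can help to confirm the build and whether the trace flag is active. The following is only a minimal sketch; the column alias is illustrative and not part of the original test.

```sql
-- Report the build of the current instance. In this walkthrough the value
-- would start with 10.50.1600 before the update and 10.50.1720 after it.
SELECT SERVERPROPERTY('ProductVersion') AS product_version;

-- Report whether trace flag 4136 is enabled at the global level (-1).
DBCC TRACESTATUS (4136, -1);
```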

Let us use the same example described in my OPTIMIZE FOR UNKNOWN post, so you may want to refer to it to better understand the details. Create the following stored procedure on the AdventureWorks database.

CREATE PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid

Executing the stored procedure before the cumulative update, or after the cumulative update but without using the flag

EXEC test @pid = 709

shows the following plan

clip_image002

In this case, since the trace flag is not yet in effect, SQL Server uses the statistics histogram to estimate the number of rows, which in this case is 188. After I enabled the trace flag, restarted my SQL Server instance, and ran the same stored procedure again, I got the following plan, where the estimated number of rows is now 456.079. Again, how these values were obtained was explained in my previous post.

clip_image004
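The second estimate is consistent with a density-based calculation, where the density of ProductID is multiplied by the number of rows in the table instead of consulting the histogram. As a rough sketch, and assuming the statistics on ProductID belong to the index named IX_SalesOrderDetail_ProductID (the name is an assumption based on a default AdventureWorks install, so check sys.indexes on your copy), the density can be inspected like this:

```sql
-- Show only the density vector for the statistics on ProductID.
-- The index name below is assumed, not taken from the original post.
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
WITH DENSITY_VECTOR;
```

If the table has 121,317 rows and 266 distinct ProductID values (again an assumption to verify against the output), then 121,317 / 266 is approximately 456.079, which matches the estimate above.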

Let us try a test using the OPTIMIZE FOR query hint, which ignores the 4136 trace flag (note that it is not the same as the OPTIMIZE FOR UNKNOWN hint) by using the following code.

ALTER PROCEDURE test (@pid int)
AS
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = @pid
OPTION (OPTIMIZE FOR (@pid = 709))

If you try this version of the stored procedure, even with the trace flag enabled, it will use the histogram again and will create a plan using the estimated number of rows of 188.

Finally, if you followed this exercise, do not forget to remove the trace flag and restart your SQL Server service.

Back from Vacation

I can’t believe I spent more than two weeks without doing anything related to SQL Server. Yes, I have been on vacation. And now that I am back home I found that I have to catch up with dozens of blog articles to read.

Some of the activities I am planning to do now that I am back include finishing writing my book about the SQL Server Query Optimizer and working on my presentations for the PASS Summit, this time two sessions, as I mentioned in a previous post. And of course, I will try to continue blogging as much as possible. Hey, it has been a year already since I started blogging: my first article on SQLblog was posted back on July 26, 2009.

By the way, I spent this vacation traveling with my family and visiting places like the Grand Canyon in Arizona and Arches National Park in Utah, rafting on the Colorado River in Colorado, visiting Mount Rushmore in South Dakota and Yellowstone National Park in Wyoming, and finally closing the trip with a stay in Las Vegas, Nevada. I include a few pictures here.

clip_image002

Grand Canyon National Park, Arizona

clip_image004

The Delicate Arch, Arches National Park, Utah

clip_image006

Mount Rushmore National Memorial, South Dakota

clip_image008

The Old Faithful Geyser, Yellowstone National Park, Wyoming

An Introduction to Cost Estimation

Last year when I presented my session regarding the Query Optimizer at the PASS Summit, I was asked how the estimated CPU and I/O costs in an execution plan are calculated, that is, where a specific value like 1.13256 comes from. All I was able to say at the time was that Microsoft does not publish how these costs are calculated.

Since this time I am working on a related project, I thought that perhaps I could look into this question again and show one example. But since there are dozens of operators, I decided to take a look at a simple one: the Clustered Index Scan operator. So I captured dozens of XML plans, used XQuery to extract their cost information and after some analysis I was able to obtain a basic formula for this specific operator.
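The exact scripts used for this analysis are not shown in this post. Purely as an illustration of the idea, cost attributes can be pulled out of cached XML plans with XQuery along the following lines. The column aliases and the choice of reading plans from the plan cache are assumptions made for this sketch.

```sql
-- Extract estimated rows, CPU cost and I/O cost for every
-- Clustered Index Scan operator found in the cached plans.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    op.value('@PhysicalOp', 'nvarchar(60)') AS physical_op,
    op.value('@EstimateRows', 'float')      AS estimate_rows,
    op.value('@EstimateCPU', 'float')       AS estimate_cpu,
    op.value('@EstimateIO', 'float')        AS estimate_io
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//RelOp[@PhysicalOp="Clustered Index Scan"]') AS t(op);
```

Comparing the returned costs against the row counts of many different tables is one way to look for a pattern in how the cost grows.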

But first a quick introduction to cost estimation: the cost of each operator depends on its algorithm, each operator is associated with a CPU cost, and some of them will also have an I/O cost. The total cost of the operator is the sum of these two costs. An operator like a Clustered Index Scan has both CPU and I/O costs. Some other operators, like a Stream Aggregate, will have only CPU cost. It is interesting to note that this cost used to mean the estimated time in seconds that a query or operator would take to execute on a particular reference machine. In recent versions of SQL Server this cost should no longer be interpreted as seconds, milliseconds, or any other unit.

To show the example, let us look at the largest table in AdventureWorks, Sales.SalesOrderDetail. Run the following query and look at the estimated CPU and I/O costs for the Clustered Index Scan operator as shown in the next figure.

SELECT * FROM Sales.SalesOrderDetail
WHERE LineTotal = 35

clip_image002

For a Clustered Index Scan operator, I observed that the CPU cost is 0.0001581 for the first record, plus 0.0000011 for every additional record. In this specific case we have an estimated number of records of 121,317, as shown in the picture above, so we can use 0.0001581 + 0.0000011 * (121317 – 1) or 0.133606, which is the value shown as Estimated CPU Cost. In a similar way, I noticed that the minimum I/O cost is 0.003125 for the first database page and then it grows in increments of 0.00074074 for every additional page. Since the Clustered Index Scan operator scans the entire table, I can use the following query to find the number of database pages, which returns 1,234.

SELECT in_row_data_page_count, row_count
FROM sys.dm_db_partition_stats
WHERE object_id = object_id('Sales.SalesOrderDetail')
AND index_id = 1

In this case we have 0.003125 + 0.00074074 * (1234 – 1) or 0.916458 which is the value shown as estimated I/O Cost.

Finally, we add both costs, 0.133606 + 0.916458 to get 1.05006 which is the total estimated cost of the operator. In the same way, adding the cost of all the operators will give the total cost of the plan. In this case, the cost of the Clustered Index Scan, 1.05006, plus the cost of the first Compute Scalar operator, 0.01214, the second Compute Scalar operator, 0.01213, and the cost of the Filter operator, 0.0582322, will give the total cost of the plan, 1.13256, as shown next.

clip_image004
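The arithmetic above can be reproduced directly in T-SQL. This is a minimal sketch that simply plugs in the constants observed in this post; they are empirical observations for this operator, not documented values, and the variable and column names are made up for the example.

```sql
-- Inputs taken from the example above.
DECLARE @rows  int = 121317;  -- estimated number of rows
DECLARE @pages int = 1234;    -- in_row_data_page_count of the clustered index

SELECT
    -- CPU: cost of the first row plus a fixed cost per additional row.
    0.0001581 + 0.0000011 * (@rows - 1)   AS estimated_cpu_cost,
    -- I/O: cost of the first page plus a fixed cost per additional page.
    0.003125  + 0.00074074 * (@pages - 1) AS estimated_io_cost,
    -- Operator cost: the sum of both.
    0.0001581 + 0.0000011 * (@rows - 1)
  + 0.003125  + 0.00074074 * (@pages - 1) AS estimated_operator_cost;
```

The three columns should round to 0.133606, 0.916458 and 1.05006, the values shown in the plan.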

Avoiding Backup Messages on the Error Log

One of the most useful trace flags I use on my SQL Server instances is trace flag 3226, which prevents SQL Server from writing those successful backup messages to the error log. By default, every time a database backup of any type is completed successfully, a message similar to the following is written to the SQL Server error log.

Log was backed up. Database: Test, creation date(time): 2010/06/28(14:53:06), first LSN: 21:2235:1, last LSN: 21:2235:1, number of dump devices: 1, device information: (FILE=3, TYPE=DISK: …

So when you perform many backups, especially transaction log backups, and/or have many databases on the same instance, the SQL Server error log could contain hundreds or thousands of these messages, making it difficult to find any other useful information there. When this trace flag is used, backup messages are no longer written to the error log or the system event log.

Although this trace flag had been available since SQL Server 2000, most of us did not learn about it until 2007, when both Andy Kelly and Kevin Farlee blogged about it. At the time it was undocumented but it is now fully documented, appearing on the Trace Flags entry of Books Online.

Trace flags can be set on and off by using the DBCC TRACEON and DBCC TRACEOFF commands or by using the -T startup option, although the latter choice is more appropriate for this specific case. One way to use the -T startup option is by right-clicking on the SQL Server service using Configuration Manager, selecting Properties and the Advanced tab, and adding ;-T3226 to the Startup Parameters entry as shown in the next figure. Finally, you will be required to restart your SQL Server service for this configuration change to take effect.

clip_image002
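For completeness, the DBCC alternative mentioned above could look like the following sketch. A flag enabled this way does not survive a restart of the service, which is why the -T startup option is the better fit for this case.

```sql
-- Enable trace flag 3226 for the whole instance (-1 = global scope).
DBCC TRACEON (3226, -1);

-- Check that the flag is currently enabled.
DBCC TRACESTATUS (3226, -1);

-- Turn the flag off again if successful backup messages are wanted back.
DBCC TRACEOFF (3226, -1);
```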

Presenting at the PASS Summit 2010

I am honored to be selected to present at the PASS Summit for the third time. This November in Seattle I will be presenting the following two sessions:

Top 10 Query Optimizer Topics for Better Query Performance

This session will show you how a better understanding of how the Query Optimizer works can help you to improve the performance of your queries. I will show you the top 10 Query Optimizer topics that can give you the most benefit by focusing both on the concepts and practical solutions. The SQL Server Query Optimizer is a cost-based optimizer whose job is to analyze the possible execution plans for a query, estimate the cost of these plans and select the one with the lowest cost. So a better knowledge of how the Query Optimizer works can help both database developers and administrators to get better performance from their databases. Several areas of the query processor will be covered, everything from troubleshooting query performance problems and identifying what information the Query Optimizer needs to do a better job to the extreme cases where, because of its limitations, the Query Optimizer may not give you a good plan and you may need to take a different approach.

Inside the SQL Server 2008 Data Collector

The SQL Server 2008 Data Collector provides low-overhead data collection functionality to store historical performance and diagnostics information about your SQL Server instances. See how you can use this information to troubleshoot problems and to provide trend analysis for the performance of your SQL Server instance. In addition to showing the basics and architecture of the new Data Collector, this session focuses on the predefined system data collection sets provided by SQL Server 2008, which automatically collect data about disk usage, instance activity, and query statistics. You will learn about the Disk Usage collection set, which gathers statistics regarding the growth of the data and transaction log database files; explore the Server Activity collection set, which focuses on server activity and resource utilization; and learn about the Query Statistics collection set, which collects data regarding the queries running in your instance.

See you in Seattle!

clip_image002

SQL Server Printed Documentation

We all use Books Online these days when we need to look at the SQL Server documentation. But does anybody remember using any SQL Server printed documentation?

I was introduced to SQL Server on February 22, 1999. Yes, I know the exact date because it was my first day attending the Microsoft training “System Administration for SQL Server 6.5” and I still have the diploma. SQL Server 7.0 was already out but somehow the offered training was not yet updated. A few years later when I was already working with SQL Server 2000, I found the printed documentation set from the SQL Server 6.0 days but I never really used it.

I am not sure which SQL Server version was the last one to include the printed documentation. Was it SQL Server 7.0? Perhaps somebody reading this can confirm it or comment about it.

I am including a couple of pictures of the books from the SQL Server 6.0 documentation.

clip_image002

clip_image004

The documentation even has a poster with the system catalog which shows only 13 system tables for the master database and 17 system tables for user databases.

clip_image006