Benjamin Nevarez Rotating Header Image

Database Engine Tuning Advisor and the Query Optimizer

Since I am planning to publish a couple of DTA-related posts later this week I thought it would be a good idea to start with this article which explains how the DTA relies on the Query Optimizer to make its tuning recommendations. This article was previously published in my book Inside the SQL Server Query Optimizer.
 
Currently, all major commercial database vendors include a physical database design tool to help with the creation of indexes. However, when these tools were first developed, there were just two main architectural approaches considered for how these tools should recommend indexes. The first approach was to build a stand-alone tool with its own cost model and design rules. The second approach was to build a tool that could use the Query Optimizer cost model.

A problem with building a stand-alone tool is the requirement for duplicating the cost module. On top of that, having a tool with its own cost model, even if it’s better than the optimizer’s cost model, may not be a good idea because the optimizer still chooses its plan based on its own model.

The second approach, using the Query Optimizer to help on physical database design, was proposed in the database research community as far as back as 1988. Since it’s the optimizer which chooses the indexes for an execution plan, it makes sense to use the optimizer itself to help find which missing indexes would benefit existing queries. In this scenario, the physical design tool would use the optimizer to evaluate the cost of queries given a set of candidate indexes. An additional benefit of this approach is that, as the optimizer cost model evolves, any tool using its cost model can automatically benefit from it.

SQL Server was the first commercial database product to include a physical design tool, in the shape of the Index Tuning Wizard which shipped with SQL Server 7.0, and which was later replaced by the Database Engine Tuning Advisor (DTA) in SQL Server 2005. Both tools use the Query Optimizer cost model approach and were created as part of the AutoAdmin project at Microsoft, the goal of which was to reduce the total cost of ownership (TCO) of databases by making them self-tuning and self-managing. In addition to indexes, the DTA can help with the creation of indexed views and table partitioning.

However, creating real indexes in a DTA tuning session is not feasible; its overhead could impact operational queries and degrade the performance of your database. So how does the DTA estimate the cost of using an index that does not yet exist? Actually, even during a regular query optimization, the Query Optimizer does not use indexes to estimate the cost of a query. The decision of whether to use an index or not relies only on some metadata and the statistical information regarding the columns of the index. Index data itself is not needed during query optimization but will, of course, be required during query execution if the index is chosen for the execution plan.

So, to avoid creating indexes during a DTA session, SQL Server uses a special kind of indexes called hypothetical indexes, which were also used by the Index Tuning Wizard. As the name implies, hypothetical indexes are not real indexes; they only contain statistics and can be created with the undocumented WITH STATISTICS_ONLY option of the CREATE INDEX statement. You may not be able to see these indexes during a DTA session because they are dropped automatically when they are no longer needed. However, you could see the CREATE INDEX WITH STATISTICS_ONLY and DROP INDEX statements if you run a SQL Server Profiler session to see what the DTA is doing.

Let’s take a quick tour to some of these concepts. To get started, create a new table on the AdventureWorks database:

SELECT *
INTO dbo.SalesOrderDetail
FROM Sales.SalesOrderDetail

Copy the following query and save it to a file:

SELECT * FROM dbo.SalesOrderDetail
WHERE ProductID = 897

Open a new DTA session, and you can optionally run a SQL Server Profiler session if you want to inspect what the DTA is doing. On the Workload File option, select the file containing the SQL statement that you just created and specify AdventureWorks as both the database to tune and the database for workload analysis. Click the Start Analysis button and, when the DTA analysis finishes, run this query to inspect the contents of the msdb..DTA_reports_query table:

SELECT * FROM msdb..DTA_reports_query

Running that query shows the following output, (edited for space):

StatementString                             CurrentCost RecommendedCost
SELECT * FROM dbo.SalesOrderDetail WHERE Pr 1.2434      0.00328799

Notice that the query returns information like the query that was tuned, as well as the current and recommended cost. The current cost, 1.2434, is easy to obtain by directly requesting an estimated execution plan for the query as shown next.

clip_image002

Since the DTA analysis was completed, the created hypothetical indexes were already dropped. To now obtain the indexes recommended by the DTA, click on the Recommendations tab and look at the Index Recommendations section, where you can find the code to create any recommended index by then clicking on the Definition column. In our example, it will show the following code:

CREATE CLUSTERED INDEX [_dta_index_SalesOrderDetail_c_5_1915153868__K5]
ON [dbo].[SalesOrderDetail]
(
    [ProductID] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,  DROP_EXISTING = OFF,
ONLINE = OFF) ON [PRIMARY]

In the next statement, and for demonstration purposes only, I will go ahead and create the index recommended by the DTA but, instead of a regular index, I will create it as a hypothetical index by adding the WITH STATISTICS_ONLY clause:

CREATE CLUSTERED INDEX cix_ProductID ON dbo.SalesOrderDetail(ProductID)
WITH STATISTICS_ONLY

You can validate that a hypothetical index was created by running the next query:

SELECT * FROM sys.indexes
WHERE object_id = object_id('dbo.SalesOrderDetail')
AND name = 'cix_ProductID'

The output is shown next below; note that the is_hypothetical field shows that this is, in fact, just a hypothetical index:

object_id  name          index_id type type_desc is_hypothetical
1915153868 cix_ProductID 3        1    CLUSTERED 1

Remove the hypothetical index by running this statement:

DROP INDEX dbo.SalesOrderDetail.cix_ProductID

Now implement the DTA recommendation, this time as a regular clustered index:

CREATE CLUSTERED INDEX cix_ProductID ON dbo.SalesOrderDetail(ProductID)

After implementing the recommendation and running the query again, the clustered index is in fact now being chosen by the Query Optimizer. This time, the plan shows a Clustered Index Seek operator and an estimated cost of 0.0033652, which is very close to the recommended cost listed previously when querying the msdb..DTA_reports_query table.

Finally, drop the table you just created by running the following statement:

DROP TABLE dbo.SalesOrderDetail

Optimizer Statistics on Linked Servers

Recently I was asked to troubleshoot a performance problem with a query using linked servers. The problem was related to a well known issue where the query processor is not able to get the required optimizer statistics from the remote server due to permissions of the user used on the linked server. This behavior is documented on the Books Online entry Guidelines for Using Distributed Queries as shown next:

 “To create the best query plans when you are using a table on a linked server, the query processor must have data distribution statistics from the linked server. Users that have limited permissions on any columns of the table might not have sufficient permissions to obtain all the useful statistics, and might receive a less efficient query plan and experience poor performance. If the linked server is an instance of SQL Server, to obtain all available statistics, the user must own the table or be a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role on the linked server.”

Basically the problem is that if the user used by the linked server does not have the permissions described in the previous Books Online entry, SQL Server will not be able to execute the DBCC SHOW_STATISTICS statement on the remote server to obtain the required statistics. It is however unfortunate that having access to the data does not also give you access to its statistics as having to provide higher level permissions could be a security concern. Let me show you the problem with an example.

Plan with read-only permissions

I’ve created a linked server between two SQL Server instances each one hosting a copy of AdventureWorks database. First, I grant read-only permissions to the user used by the linked server and run the following artificial query just to demonstrate this behavior.

SELECT l.* FROM AdventureWorks.Sales.SalesOrderHeader l

JOIN remote.AdventureWorks.Sales.SalesOrderHeader r

ON l.SalesOrderID = r.SalesOrderID

WHERE r.CustomerID = 666

Running the previous query gets me the following plan:

clip_image002

In this case the Query Optimizer could benefit of knowing the cardinality estimate of the query executed on the remote server, that is, to know how many orders were placed by customer 666, but this information is not available for this plan. With an estimated guess of 2,362.49 rows the Query Optimizer is deciding to use a Merge Join plus a Clustered Index Scan when in fact the query is returning only 8 records. You can run Profiler against the remote server to learn what kind of information the local query processor is requesting from it. I’ve noticed that it executes the following five system stored procedures which obtain information about the tables, columns, indexes, check constraints and statistics involved on the remote query.

exec [AdventureWorks].[sys].sp_tables_info_90_rowset N’SalesOrderHeader’,N’Sales’,NULL

exec [AdventureWorks].[sys].sp_columns_100_rowset N’SalesOrderHeader’,N’Sales’,NULL

exec [AdventureWorks].[sys].sp_indexes_100_rowset N’SalesOrderHeader’,NULL,N’Sales’

exec [AdventureWorks].[sys].sp_check_constbytable_rowset N’SalesOrderHeader’,N’Sales’,NULL,NULL

exec [AdventureWorks].[sys].sp_table_statistics2_rowset N’SalesOrderHeader’,N’Sales’,N’AdventureWorks’,NULL,NULL,NULL

The last stored procedure, sp_table_statistics2_rowset, tries to obtain the header and density information of the statistics object using the following statement but it fails because of the lack of permissions mentioned earlier.

dbcc show_statistics(@qtbl, @statname) with stat_header join density_vector

Plan with db_owner permissions

Now grant db_owner permissions to the user used on the liked server and run the query again while forcing a new optimization (for example using DBCC FREEPROCCACHE to clear the plan cache on the local server). Profiler will show that the previous five system stored procedures were again executed but this time the sp_table_statistics2_rowset procedure is able to successfully get the requested header and density information. In addition, the following statement is also issued to obtain the histogram information from the statistics object

DBCC SHOW_STATISTICS(N’"AdventureWorks"."Sales"."SalesOrderHeader"’,

"IX_SalesOrderHeader_CustomerID") WITH HISTOGRAM_STEPS

This time the following execution plan is returned

clip_image004

By using the histogram from the remote server, the local query processor is able to get an estimated number of rows of 6.65385 and decides to use a Nested Loops Join and a Clustered Index Seek instead which is a better plan and more appropriate as the actual number of records returned is only 8.

If you manually run the previous DBCC SHOW_STATISTICS statement on the remote server you can see the histogram, an extract of which is shown next, showing the step corresponding for the value for CustomerID 666 which in this case shows an estimated of 6.653846 records on the AVG_RANGE_ROWS column, which is also shown on the previous execution plan.

clip_image006

Remote plan

It is also worth mentioning that the query executed on the remote servers is practically the same in both cases, except that ORDER BY is needed on the first example as the Merge Join on the local plan requires sorted data, but the plan is still the same as shown next (both the query and the plan can be captured using Profiler).

SELECT "Tbl1003"."SalesOrderID" "Col1011"

FROM "AdventureWorks"."Sales"."SalesOrderHeader" "Tbl1003"

WHERE "Tbl1003"."CustomerID"=(666)

ORDER BY "Col1011" ASC

Again notice that the estimated number of rows is 6.65385

clip_image008

Conclusion

So we’ve seen how using a user with limited permissions to run a query through a linked server can in some specific cases be a performance problem because of the lack of access to optimizer statistics. However, it seems inappropriate that having access to the data does not also give you access to its statistics as having to provide higher level permissions could be a security concern. In fact, there is a related connect entry by Erland Sommarskog discussing the problem and suggesting that permissions to access data should also give access to its statistics. So let us hope a better solution to the use of linked servers is provided in the near future.

SQLSaturday 73 Recap

This is yet another late review of SQLSaturday 73 which I had to postpone writing a couple of times before. Same as last year, this SQLSaturday was hosted at the Golden West College on Huntington Beach, California so I was already familiar with the place. Different than last year event, fortunately this time I was able to attend the speaker dinner on Friday night which was hosted at the Acapulco restaurant in Costa Mesa, just a few minutes from Huntington Beach. I had a great time at the dinner meeting event organizers Andrew Karcher (@akarcher) and Thomas Mueller (@tmspecial), some of the local speakers like Denny Cherry (@mrdenny), speakers which I usually only meet at the PASS Summit like Denise McInerney (@denisemc06) and Meredith Ryan-Smith (@coffegrl) and meeting new people like Carlos Bossy (@carlosbossy) who was traveling from Denver, Colorado.

The day of the event I left home very early in the morning as Huntington Beach is about 75 minutes from my place up north at Santa Clarita and the traffic is sometimes unpredictable but fortunately arrived to the event on time for registration and get some coffee. I saw many familiar faces helping as volunteers for the event and I spent some time talking to some of them before heading for the first session of the day. I was planning to attend as many sessions as possible but since I was also presenting on three of them that proved to be very difficult.

The event was scheduled to run six sessions simultaneously and for the first round I briefly attended Clint Kunz’s session “Microsoft Business Intelligence All Up” and later Randy Knight’s “Become Bilingual! Oracle for the SQL Server DBA”.

Then it was time for my first presentation of the day, “Top Query Optimizer Topics for Better Performance”, which was the same session I presented at the PASS Summit 2010 and, at the moment of this writing, I’ve presented a total of nine times (and I’ve just found that I will be doing this session again at SQLSaturday 83 for Johannesburg, South Africa too!). Actually, since I presented some of these sessions on local user groups and code camps I was afraid that the people interested may have seen the session already and decided to attend other sessions instead. Fortunately attendance was really good as you can see on the picture below. I finish my first session by giving away a copy of my book Inside the SQL Server Query Optimizer.

clip_image002

After my presentation I attended the only entire session I would be able to see that day and actually one that I was very interested in. It was Ted Tasker’s “Parallel Data Warehouse – Architecture Overview”, an excellent session describing PDW massively parallel processing (MPP) architecture designed for data warehouses of dozens or hundreds of terabytes.

Sessions during lunch was a new addition to the event this year and during this time I attended part of Bob Abernethy’s “BI: Then and Now” session and later I hang out at the speaker room for a last review of my next presentation, which I was presenting for the first time. The only other lunch session available was “Women in Tech: Encouraging the Next Generation”, hosted by Denise McInerney.

My second presentation was just after lunch and the topic was “Partitioned Tables and Indexes: Management & Query Processing”. As mentioned, this was the first time I was presenting it and I think it went really well, which I was able to confirm by looking at the speaker evaluations later. I finished the session by giving away another copy of my book.

From my session on partitioning I quickly ran to another room for the SQL Server Panel discussion where I was participating too. The Panel was coordinated by Thomas Mueller and included speakers Lynn Langit (@llangit), Ike Ellis (@EllisTeam1), Denny Cherry, Bret Stateham (@BStateham), Andrew Karcher and myself. Topics included discussing our favorite new features of SQL Server Denali; advantages and drawbacks of SQL Azure, features that you would like to see in a future version of SQL Server, favorite third party tools for SQL Server, among others. There were also some topics based on questions from the attendees.

After the SQL Server Panel discussion I went to the speaker’s room again and was able to meet a few more of the speakers including Harold Wong. I also attended part of Denny Cherry’s “Index Internals” session. SQLSaturday concluded with a raffle where attendees got books and prizes donated by sponsors. The after-party was at Michael’s Sports Pub and Grill located just a few minutes away from the Golden West College where I was able to meet a few more people including Mark Ginnebaugh (@markginnebaugh) who was traveling to the event from the San Francisco Bay Area.

In summary this was amazing SQLSaturday and I had a great time there. All I can say is I am ready for the next SQLSaturday here in Southern California.

Columnstore Indexes and other new Optimizations in Denali

One of the most exciting new features that will be available on the next version of SQL Server, code-named Denali, are the columnstore indexes and some related new optimizations for data warehouse queries. Most of us learned about this project, code-named Apollo, at the PASS Summit 2010 last November when it was demonstrated on Tuesday’s keynote and explained on some sessions by Eric Hanson later that day, and by Susan Price the following day. Eric has also written the white paper “Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0” which you can read here. With this new technology Microsoft promises to improve the performance of star join queries, used in data warehousing scenarios, by several orders of magnitude.

As mentioned, project Apollo, consists of the addition of a new type of index, called columnstore, plus the addition of new sophisticated query processing algorithms. Microsoft has released some information regarding how the columnstore indexes work but has indicated that the new optimization and execution algorithms will be kept “top secret” and will “remain a mystery”. The column-based storage approach is not new and has been used before on some other databases, although Microsoft claims SQL Server is the first major database vendor to offer a pure column-based approach. Columnstore indexes are based on Microsoft’s VertiPaq technology which is also used on Analysis Services and PowerPivot for Excel and SharePoint.

Let me briefly explain how the columnstore indexes work. The traditional approach, used by SQL Server and most other databases, is to store rows on a data page, and it is now called a row store. On the other hand, columnstore indexes will instead dedicate an entire page to a column, that is, pages are defined to contain only data for a specific column. For example in the following figure, taken from the previously mentioned white paper, several pages are assigned to contain data for column c1 only, and the same is true for c2, c3, and the other columns.

clip_image002

One of the problems that columnstore indexes are trying to address is that with a row-based storage SQL Server always reads all the columns of the record even when not all of them are needed by the query. In fact, some of the performance benefits of the columnstore index are related to the fact that most queries only use less than 15% of the columns of the table. Not having to read all the columns of a table provides significant savings in disk I/O. An additional benefit of this technology is that the data is heavily compressed again needing fewer disk I/O and at the same time being able to fit more data in memory. Having enough memory will also play an important role on the technology as well. The only thing you need to do to take benefit of this performance improvement is to define a columnstore index; there is no need to change your queries, use any specific syntax or to do anything else.

Same as with previous versions of SQL Server, the query optimizer will have the choice between using a columnstore index or any of the other available access methods and, as always, this will be a cost-based decision. And same as before, you will still have the choice to force any index using a hint if the query optimizer is not giving you a good execution plan. This can happen for example when the query optimizer is choosing a columnstore index when it shouldn’t or when you want to force a columnstore index when it is not being selected. Something which is also new is the concept of an execution mode: the new query processing algorithms mentioned earlier will run in what is called a batch execution mode, which is different from the traditional processing mode, now called row mode. This is a different approach that the query optimizer will have to choose and this information will be available on the execution plans. So I suppose we will need to get used to some new terms from now on: column store and row store from the storage point of view, and batch mode and row mode from the query processing point of view.

One of the limitations at the moment is that data using columnstore indexes will not be updatable (no INSERT, DELETE, UPDATE, MERGE or other update operations will be allowed). Not being able to update data may look like a big disadvantage but this may not be a big problem as the target for this technology are data warehouses which are usually read-only most of the day and may just require updates once a day. A few possible solutions to update data using columnstore indexes are provided and I will refer you to the mentioned white paper for the details. Two additional limitations are that at the moment only a single nonclustered columnstore index can be created per table and that no clustered columnstore indexes are available yet. Both limitations may go away in a future release.

Unfortunately the columnstore indexes and related query processing algorithms were not included on the first and currently only publicly available Denali CTP. Eric Hanson mentioned back in December on his twitter account, @ENH_SQLServer, that this feature would be included on CTP2 although the software was not made publicly available. He mentioned more recently that the columnstore indexes will finally make its public debut on Denali CTP3.

My book “Inside the SQL Server Query Optimizer” Now Available on Amazon

I just found today that my book, “Inside the SQL Server Query Optimizer”, is finally available on Amazon. You can find the description of the book on the Amazon page or a more detailed description chapter by chapter on one of my previous posts here. So I would like to use this post to thank the people who helped me or in some way influenced the content of the book.

Writing this book was a lot of fun, but also a lot of work; actually a lot more work than I originally expected. Fortunately I got help from several people. First of all, I would like to thank Chris Massey. Chris helped me as the technical editor of the book, and guided me through most of the writing process, doing absolutely outstanding work. Very special thanks also go out to Grant Fritchey who helped us as the technical reviewer, and went very much beyond just reviewing, as his comments provided invaluable feedback to improve the quality of this book. Thanks also go to Tony Davis for offering me this opportunity in the first place, and helping to get the project started.

Outside the Red Gate team, my deepest gratitude goes to Cesar Galindo-Legaria, Manager of the SQL Server Query Optimization team at Microsoft, for answering my endless list of questions about the Query Optimizer, and educating me through some of the information he has published, including numerous research papers and an excellent chapter of a SQL Server book. I had the opportunity to meet Tony, Cesar, and Grant at the PASS Summit back in 2009, when I first started this journey, and I hope to finally meet Chris at the same conference in October 2011.

Although Chris, Cesar, Grant and Tony have directly helped me to shape the content of this book, there’s also a host of other people who have indirectly influenced the book through their work, which helped me to learn about and better understand the SQL Server query processor. With that in mind, I would like to thank the authors who have discussed the query processor in some of the available SQL Server books, Microsoft white papers and blogs, including Kalen Delaney, Ken Henderson, Lubor Kollar, Craig Freedman, Conor Cunningham and Eric Hanson.

Research papers have provided me with an unlimited source of information, and helped me to understand the Query Optimizer at a significantly deeper level than ever before. So, thanks to all that amazing work by the many people in the research community including, in addition to Cesar, Goetz Graefe, Surajit Chaudhuri, Yannis Ioannidis, Vivek Narasayya, Pat Selinger, Florian Waas, and many, many more.

Finally, on the personal side, I would like to thank my parents, Guadalupe and Humberto, and my family: my wife Rocio, and my three sons, Diego, Benjamin and David. Thanks all for your unconditional support and patience.

Slide Deck and Demo Files for Recent Presentations

Attached are WinZip files containing the PowerPoint presentation, as well as the scripts used in the demos for my sessions “Top 10 Query Optimizer Topics for Better Performance” and “Inside the SQL Server 2008 Data Collector” which I presented recently at the SoCal Code Camp, the Los Angeles SQL Server Professionals Group and the PASS Summit 2010.

Inside the SQL Server 2008 Data Collector - TheDataCollector.zip
10 Query Optimizer Topics for Better Performance - QueryOptimizerTopics.zip

Presenting at the SoCal Code Camp

I am speaking again this month, this time at the SoCal Code Camp at Cal State Fullerton. SoCal Code Camp is a community driven event where developers come and learn from their peers and it is scheduled for Saturday, January 29th and Sunday, January 30th. I will be presenting two sessions on Saturday: “Inside the SQL Server 2008 Data Collector” at 8:45 am, and “Top 10 SQL Server Query Optimizer Topics for Better Performance” at 4:00 pm, both on room UH-335.

I will also be participating on the SQL Server Q&A session along with Denny Cherry, Lynn Langit, Bret Stateham, Ben Aminnia, Ike Ellis, Andrew Karcher and Thomas Mueller. This session will be hosted on room UH-335 at 2:45 pm.

For more information regarding sessions, schedule and directions visit the SoCal Code Camp website. I hope to see you there,

clip_image002

Speaking at the Los Angeles SQL Server Professionals User Group

I haven’t updated this blog in a long time so I wanted to put in a quick post about a session that I will be presenting at the Los Angeles SQL Server Professionals User Group this Thursday, January 20th. The session, “Top 10 Query Optimizer Topics for Better Performance”, is the same topic I presented a couple of months ago at the PASS Summit in Seattle.

The meeting will be hosted at the UCLA campus and will start at 6:30 PM with Allen Berezovsky who will talk about File Stream in SQL Server. My session will follow next. More details and directions can be found at the Los Angeles SQL Server Professionals Group website.

I hope to see you there,

Benjamin

clip_image002

My Book, “Inside the Query Optimizer”, available at the PASS Summit

My book, “Inside the SQL Server Query Optimizer”, is almost finished and we will have a conference edition of it available at the PASS Summit. The final version of the book, published by Red Gate books, will be available on Amazon by Christmas.

For more details on the contents, I am including the Preface of the book next.

clip_image002

Preface

The Query Optimizer has always been one of my favorite SQL Server topics, which is why I started blogging about it and submitting related presentations to PASS. And so it would have continued, except that, after several blog posts discussing the Query Optimizer, Red Gate invited me to write a book about it. This is that book.

I started learning about the Query Optimizer by reading the very few SQL Server books which discussed the topic, and most of them covered it only very briefly. Yet I pressed on, and later, while trying to learn more about the topic, I found an extremely rich source of information in the form of the many available research papers. It was hard to fully grasp them at the beginning, as academic papers can be difficult to read and understand, but soon I got used to them, and was all the more knowledgeable for it.

Having said that, I feel that I’m in a bit of a minority, and that many people still see the Query Optimizer just as a black box where a query is submitted and an amazing execution plan is returned. It is also seen as a very complex component, and rightly so. It definitely is a very complex component, perhaps the most complex in database management software, but there is still a lot of great information about the Query Optimizer that SQL Server professionals can benefit from.  

The Query Optimizer is the SQL Server component that tries to give you an optimal execution plan for your queries and, just as importantly, tries to find that execution plan as quickly as possible. A better understanding of what the Query Optimizer does behind the scenes can help you to improve the performance of your databases and applications, and this book explains the core concepts behind how the SQL Server Query Optimizer works. With this knowledge, you’ll be able to write better queries, provide the Query Optimizer with the information it needs to produce efficient execution plans, and troubleshoot the cases when the Query Optimizer is not giving you a good plan.

With that in mind, and in case it’s not obvious, the content of this book is intended for SQL Server professionals: database developers and administrators, data architects, and basically anybody who submits more than just trivial queries to SQL Server. Here’s a quick overview of what the book covers:

The first chapter, Introduction to Query Optimization, starts with an overview on how the SQL Server Query Optimizer works and introduces the concepts that will be covered in more detail in the rest of the book. A look into some of the challenges query optimizers still face today is covered next, along with a section on how to read and understand execution plans. The Chapter closes with a discussion of join ordering, traditionally one of the most complex problems in query optimization.

The second chapter talks about the Execution Engine, and describes it as a collection of physical operators that perform the functions of the query processor. It emphasizes how these operations, implemented by the Execution Engine, define the choices available to the Query Optimizer when building execution plans. This Chapter includes sections on data access operations, the concepts of sorting and hashing, aggregations, and joins, to conclude with a brief introduction to parallelism.

Chapter 3, Statistics and Cost Estimation, shows how the quality of the execution plans generated by the Query Optimizer is directly related to the accuracy of its cardinality and cost estimations. The Chapter describes Statistics objects in detail, and includes some sections on how statistics are created and maintained, as well as how they are used by the Query Optimizer. We’ll also take a look at how to detect cardinality estimation errors, which may cause the Query Optimizer to choose inefficient plans, together with some recommendations on how to avoid and fix these problems. Just to round off the subject, the chapter ends with and introduction to cost estimation.

Chapter 4, Index selection, shows how SQL Server can speed up your queries and dramatically improve the performance of your applications just by using the right indexes. The Chapter shows how SQL Server selects indexes, how you can provide better indexes, and how you can verify your execution plans to make sure these indexes are correctly used. We’ll talk about the Database Engine Tuning Advisor and the Missing Indexes feature, which will show how the Query Optimizer itself can provide you with index tuning recommendations.

Chapter 5, The Optimization Process, is the Chapter that goes right into the internals of the Query Optimizer and introduces the steps that it performs without you ever knowing. This covers everything from the moment a query is submitted to SQL Server until an execution plan is generated and is ready to be executed, including steps like parsing, binding, simplification, trivial plan and full optimization. Important components which are part of the Query Optimizer architecture, such as transformation rules and the memo structure, are also introduced.

Chapter 6, Additional Topics, includes a variety of subjects, starting with the basics of update operations and how they also need to be optimized just like any other query, so that they can be performed as quickly as possible. We’ll have an introduction to Data Warehousing and how SQL Server optimizes star queries, before launching into a detailed explanation of Parameter sniffing, along with some recommendations on how to avoid some problems presented by this behavior. Continuing with the topic of parameters, the Chapter concludes by discussing auto-parameterization and forced parameterization.

Chapter 7 describes Hints, and warns that, although hints are a powerful tool which allows you to take explicit control over the execution plan of a query, they need to be used with caution and only as a last resort when no other option is available. The chapter covers the most-used hints, and ends with a couple of sections on plan guides and the USE PLAN query hint.

Before we get started, please bear in mind that this book contains many undocumented SQL Server statements. These statements are provided only as a way to explore and understand the Query Optimizer and, as such, should not be used on a production environment. Use them wisely, and I hope you enjoy learning about this topic as much as I do.

Benjamin Nevarez

Presenting at the SoCal Rock & Roll Code Camp

I will be presenting two sessions at the SoCal Rock & Roll Code Camp this Saturday. This is a community driven event with over 100 sessions, hosted at the University of Southern California (USC) on both Saturday October 23rd and Sunday 24th. My sessions will be “Inside the SQL Server 2008 Data Collector” at 12:15 pm, and “Top 10 SQL Server Query Optimizer Topics for Better Performance” at 1:30 pm, both on room VKC-105.

For more information regarding sessions, schedule and directions visit the SoCal Rock & Roll Code Camp website.

clip_image002