
Optimizer Statistics on Linked Servers

Recently I was asked to troubleshoot a performance problem with a query using linked servers. The problem was related to a well-known issue where the query processor is not able to get the required optimizer statistics from the remote server because of the permissions of the user used by the linked server. This behavior is documented in the Books Online entry Guidelines for Using Distributed Queries, as shown next:

 “To create the best query plans when you are using a table on a linked server, the query processor must have data distribution statistics from the linked server. Users that have limited permissions on any columns of the table might not have sufficient permissions to obtain all the useful statistics, and might receive a less efficient query plan and experience poor performance. If the linked server is an instance of SQL Server, to obtain all available statistics, the user must own the table or be a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role on the linked server.”

Basically, the problem is that if the user used by the linked server does not have the permissions described in the previous Books Online entry, SQL Server will not be able to execute the DBCC SHOW_STATISTICS statement on the remote server to obtain the required statistics. It is unfortunate that having access to the data does not also give you access to its statistics, as having to provide higher-level permissions could be a security concern. Let me show you the problem with an example.

Plan with read-only permissions

I’ve created a linked server between two SQL Server instances, each one hosting a copy of the AdventureWorks database. First, I grant read-only permissions to the user used by the linked server and run the following artificial query just to demonstrate this behavior.

SELECT l.* FROM AdventureWorks.Sales.SalesOrderHeader l
JOIN remote.AdventureWorks.Sales.SalesOrderHeader r
ON l.SalesOrderID = r.SalesOrderID
WHERE r.CustomerID = 666
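
In case you want to reproduce this setup, a minimal sketch of how such a linked server and a read-only login mapping could be created is shown next. The server name, data source, login and password are hypothetical, so adjust them to your own environment.

-- On the local server: create the linked server and map a remote login
EXEC sp_addlinkedserver @server = N'remote', @srvproduct = N'', @provider = N'SQLNCLI', @datasrc = N'REMOTESERVER';
EXEC sp_addlinkedsrvlogin @rmtsrvname = N'remote', @useself = N'FALSE', @locallogin = NULL, @rmtuser = N'linked_user', @rmtpassword = N'StrongPassword1';

-- On the remote server: give the hypothetical linked_user login read-only access
-- USE AdventureWorks;
-- EXEC sp_addrolemember 'db_datareader', 'linked_user';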

Running the previous query gets me the following plan:

[Figure: execution plan using a Merge Join and a Clustered Index Scan]

In this case the Query Optimizer could benefit from knowing the cardinality estimate of the query executed on the remote server, that is, from knowing how many orders were placed by customer 666, but this information is not available for this plan. With a guessed estimate of 2,362.49 rows, the Query Optimizer decides to use a Merge Join plus a Clustered Index Scan when in fact the query returns only 8 records. You can run Profiler against the remote server to learn what kind of information the local query processor is requesting from it. I’ve noticed that it executes the following five system stored procedures, which obtain information about the tables, columns, indexes, check constraints and statistics involved in the remote query.

exec [AdventureWorks].[sys].sp_tables_info_90_rowset N'SalesOrderHeader',N'Sales',NULL
exec [AdventureWorks].[sys].sp_columns_100_rowset N'SalesOrderHeader',N'Sales',NULL
exec [AdventureWorks].[sys].sp_indexes_100_rowset N'SalesOrderHeader',NULL,N'Sales'
exec [AdventureWorks].[sys].sp_check_constbytable_rowset N'SalesOrderHeader',N'Sales',NULL,NULL
exec [AdventureWorks].[sys].sp_table_statistics2_rowset N'SalesOrderHeader',N'Sales',N'AdventureWorks',NULL,NULL,NULL

The last stored procedure, sp_table_statistics2_rowset, tries to obtain the header and density information of the statistics object using the following statement, but it fails because of the lack of permissions mentioned earlier.

dbcc show_statistics(@qtbl, @statname) with stat_header join density_vector

Plan with db_owner permissions

Now grant db_owner permissions to the user used by the linked server and run the query again while forcing a new optimization (for example, using DBCC FREEPROCCACHE to clear the plan cache on the local server). Profiler will show that the previous five system stored procedures were executed again, but this time the sp_table_statistics2_rowset procedure is able to successfully get the requested header and density information. In addition, the following statement is also issued to obtain the histogram information from the statistics object.

DBCC SHOW_STATISTICS(N'"AdventureWorks"."Sales"."SalesOrderHeader"',
"IX_SalesOrderHeader_CustomerID") WITH HISTOGRAM_STEPS
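
In case you want to try this yourself, the permission change on the remote server and the plan cache flush on the local server could be scripted as shown next; linked_user is the same hypothetical login used in the earlier setup sketch.

-- On the remote server: make the linked server login a member of db_owner
-- USE AdventureWorks;
-- EXEC sp_addrolemember 'db_owner', 'linked_user';

-- On the local server: clear the plan cache to force a new optimization
DBCC FREEPROCCACHE;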

This time the following execution plan is returned:

[Figure: execution plan using a Nested Loops Join and a Clustered Index Seek]

By using the histogram from the remote server, the local query processor is able to get an estimated number of rows of 6.65385 and decides to use a Nested Loops Join and a Clustered Index Seek instead, which is a better and more appropriate plan, as the actual number of records returned is only 8.

If you manually run the previous DBCC SHOW_STATISTICS statement on the remote server you can see the histogram, an extract of which is shown next. It shows the step corresponding to the value CustomerID 666, which in this case has an estimate of 6.653846 records in the AVG_RANGE_ROWS column, the same value shown in the previous execution plan.

[Figure: histogram extract showing 6.653846 in the AVG_RANGE_ROWS column for the step containing CustomerID 666]

Remote plan

It is also worth mentioning that the query executed on the remote server is practically the same in both cases, except that an ORDER BY is needed in the first example because the Merge Join on the local plan requires sorted data; the plan is still the same, as shown next (both the query and the plan can be captured using Profiler).

SELECT "Tbl1003"."SalesOrderID" "Col1011"

FROM "AdventureWorks"."Sales"."SalesOrderHeader" "Tbl1003"

WHERE "Tbl1003"."CustomerID"=(666)

ORDER BY "Col1011" ASC

Again, notice that the estimated number of rows is 6.65385.

[Figure: remote execution plan with an estimated number of rows of 6.65385]

Conclusion

So we’ve seen how using a user with limited permissions to run a query through a linked server can, in some specific cases, cause a performance problem because of the lack of access to optimizer statistics. However, it seems inappropriate that having access to the data does not also give you access to its statistics, as having to provide higher-level permissions could be a security concern. In fact, there is a related Connect entry by Erland Sommarskog discussing the problem and suggesting that permissions to access the data should also give access to its statistics. So let us hope a better solution for the use of linked servers is provided in the near future.

SQLSaturday 73 Recap

This is yet another late review of SQLSaturday 73, which I had to postpone writing a couple of times. Same as last year, this SQLSaturday was hosted at Golden West College in Huntington Beach, California, so I was already familiar with the place. Unlike last year’s event, fortunately this time I was able to attend the speaker dinner on Friday night, which was hosted at the Acapulco restaurant in Costa Mesa, just a few minutes from Huntington Beach. I had a great time at the dinner meeting event organizers Andrew Karcher (@akarcher) and Thomas Mueller (@tmspecial), some of the local speakers like Denny Cherry (@mrdenny), speakers whom I usually only see at the PASS Summit like Denise McInerney (@denisemc06) and Meredith Ryan-Smith (@coffegrl), and new people like Carlos Bossy (@carlosbossy), who was traveling from Denver, Colorado.

The day of the event I left home very early in the morning, as Huntington Beach is about 75 minutes from my place up north in Santa Clarita and the traffic is sometimes unpredictable, but fortunately I arrived at the event in time to register and get some coffee. I saw many familiar faces helping as volunteers for the event and I spent some time talking to some of them before heading to the first session of the day. I was planning to attend as many sessions as possible, but since I was also presenting in three of them that proved to be very difficult.

The event was scheduled to run six sessions simultaneously and for the first round I briefly attended Clint Kunz’s session “Microsoft Business Intelligence All Up” and later Randy Knight’s “Become Bilingual! Oracle for the SQL Server DBA”.

Then it was time for my first presentation of the day, “Top 10 Query Optimizer Topics for Better Performance”, which was the same session I presented at the PASS Summit 2010 and which, at the time of this writing, I’ve presented a total of nine times (and I’ve just found out that I will be doing this session again at SQLSaturday 83 in Johannesburg, South Africa too!). Actually, since I had presented some of these sessions at local user groups and code camps, I was afraid that the people interested might have already seen it and would decide to attend other sessions instead. Fortunately attendance was really good, as you can see in the picture below. I finished my first session by giving away a copy of my book Inside the SQL Server Query Optimizer.

[Photo: session attendance]

After my presentation I attended the only complete session I would be able to see that day, and actually one that I was very interested in. It was Ted Tasker’s “Parallel Data Warehouse – Architecture Overview”, an excellent session describing PDW’s massively parallel processing (MPP) architecture, designed for data warehouses of dozens or hundreds of terabytes.

Sessions during lunch were a new addition to the event this year, and during this time I attended part of Bob Abernethy’s “BI: Then and Now” session and later hung out in the speaker room for a last review of my next presentation, which I was delivering for the first time. The only other lunch session available was “Women in Tech: Encouraging the Next Generation”, hosted by Denise McInerney.

My second presentation was just after lunch and the topic was “Partitioned Tables and Indexes: Management & Query Processing”. As mentioned, this was the first time I was presenting it and I think it went really well, which I was able to confirm by looking at the speaker evaluations later. I finished the session by giving away another copy of my book.

From my session on partitioning I quickly ran to another room for the SQL Server Panel discussion, in which I was also participating. The panel was coordinated by Thomas Mueller and included speakers Lynn Langit (@llangit), Ike Ellis (@EllisTeam1), Denny Cherry, Bret Stateham (@BStateham), Andrew Karcher and myself. Topics included our favorite new features of SQL Server Denali, advantages and drawbacks of SQL Azure, features we would like to see in a future version of SQL Server, and favorite third-party tools for SQL Server, among others. There were also some topics based on questions from the attendees.

After the SQL Server Panel discussion I went to the speaker room again and was able to meet a few more of the speakers, including Harold Wong. I also attended part of Denny Cherry’s “Index Internals” session. The SQLSaturday concluded with a raffle where attendees got books and prizes donated by sponsors. The after-party was at Michael’s Sports Pub and Grill, located just a few minutes away from Golden West College, where I was able to meet a few more people, including Mark Ginnebaugh (@markginnebaugh), who was traveling to the event from the San Francisco Bay Area.

In summary, this was an amazing SQLSaturday and I had a great time there. All I can say is that I am ready for the next SQLSaturday here in Southern California.

Columnstore Indexes and other new Optimizations in Denali

One of the most exciting new features that will be available in the next version of SQL Server, code-named Denali, is columnstore indexes, along with some related new optimizations for data warehouse queries. Most of us learned about this project, code-named Apollo, at the PASS Summit 2010 last November, when it was demonstrated in Tuesday’s keynote and explained in some sessions by Eric Hanson later that day, and by Susan Price the following day. Eric has also written the white paper “Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0”, which you can read here. With this new technology Microsoft promises to improve the performance of star join queries, used in data warehousing scenarios, by several orders of magnitude.

As mentioned, project Apollo consists of the addition of a new type of index, called columnstore, plus the addition of new, sophisticated query processing algorithms. Microsoft has released some information regarding how columnstore indexes work, but has indicated that the new optimization and execution algorithms will be kept “top secret” and will “remain a mystery”. The column-based storage approach is not new and has been used before in some other databases, although Microsoft claims SQL Server is the first major database vendor to offer a pure column-based approach. Columnstore indexes are based on Microsoft’s VertiPaq technology, which is also used in Analysis Services and in PowerPivot for Excel and SharePoint.

Let me briefly explain how columnstore indexes work. The traditional approach, used by SQL Server and most other databases, is to store rows on a data page, and it is now called a row store. Columnstore indexes, on the other hand, dedicate entire pages to a single column, that is, pages are defined to contain only data for a specific column. For example, in the following figure, taken from the previously mentioned white paper, several pages are assigned to contain data for column c1 only, and the same is true for c2, c3, and the other columns.

[Figure: column-based page layout, taken from the columnstore white paper]

One of the problems that columnstore indexes are trying to address is that, with row-based storage, SQL Server always reads all the columns of a record even when not all of them are needed by the query. In fact, some of the performance benefits of the columnstore index are related to the fact that most queries use less than 15% of the columns of a table. Not having to read all the columns of a table provides significant savings in disk I/O. An additional benefit of this technology is that the data is heavily compressed, again requiring less disk I/O and at the same time making it possible to fit more data in memory. Having enough memory will also play an important role in this technology. The only thing you need to do to take advantage of this performance improvement is to define a columnstore index; there is no need to change your queries, use any specific syntax or do anything else.
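
As a quick sketch of what this could look like, the following statement creates a nonclustered columnstore index on a hypothetical fact table; the table and column names are just an example and the final Denali syntax may still change.

-- Hypothetical fact table; include the columns used by your star join queries
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactSales
ON dbo.FactSales (OrderDateKey, ProductKey, StoreKey, OrderQuantity, SalesAmount);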

Same as with previous versions of SQL Server, the query optimizer will have the choice between using a columnstore index or any of the other available access methods and, as always, this will be a cost-based decision. And same as before, you will still have the choice to force any index using a hint if the query optimizer is not giving you a good execution plan. This can happen, for example, when the query optimizer is choosing a columnstore index when it shouldn’t, or when you want to force a columnstore index when it is not being selected (a sketch of such a hint is shown after this paragraph). Something which is also new is the concept of an execution mode: the new query processing algorithms mentioned earlier will run in what is called batch execution mode, which is different from the traditional processing mode, now called row mode. This is a different choice that the query optimizer will have to make, and this information will be available in the execution plans. So I suppose we will need to get used to some new terms from now on: column store and row store from the storage point of view, and batch mode and row mode from the query processing point of view.
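
As mentioned, forcing the hypothetical columnstore index defined earlier could be done with a regular table hint, as sketched next.

-- Force the use of the hypothetical columnstore index for this query
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales WITH (INDEX(csi_FactSales))
GROUP BY ProductKey;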

One of the limitations at the moment is that data using columnstore indexes will not be updatable (no INSERT, DELETE, UPDATE, MERGE or other update operations will be allowed). Not being able to update the data may look like a big disadvantage, but it may not be a big problem, as the target for this technology is data warehouses, which are usually read-only most of the day and may only require updates once a day. A few possible solutions to update data using columnstore indexes are provided and I will refer you to the mentioned white paper for the details; a sketch of one simple approach is shown below, after this paragraph. Two additional limitations are that at the moment only a single nonclustered columnstore index can be created per table and that no clustered columnstore indexes are available yet. Both limitations may go away in a future release.
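
The simple approach mentioned above is just to drop the columnstore index, load the new data and then re-create the index, as sketched next using the same hypothetical tables; see the white paper for the other alternatives.

-- Drop the columnstore index, load the daily data, then re-create the index
DROP INDEX csi_FactSales ON dbo.FactSales;

INSERT INTO dbo.FactSales (OrderDateKey, ProductKey, StoreKey, OrderQuantity, SalesAmount)
SELECT OrderDateKey, ProductKey, StoreKey, OrderQuantity, SalesAmount
FROM staging.FactSales; -- hypothetical staging table with the daily load

CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactSales
ON dbo.FactSales (OrderDateKey, ProductKey, StoreKey, OrderQuantity, SalesAmount);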

Unfortunately, columnstore indexes and the related query processing algorithms were not included in the first and currently only publicly available Denali CTP. Eric Hanson mentioned back in December on his Twitter account, @ENH_SQLServer, that this feature would be included in CTP2, although that CTP was not made publicly available. He mentioned more recently that columnstore indexes will finally make their public debut in Denali CTP3.

My book “Inside the SQL Server Query Optimizer” Now Available on Amazon

I just found out today that my book, “Inside the SQL Server Query Optimizer”, is finally available on Amazon. You can find a description of the book on the Amazon page, or a more detailed chapter-by-chapter description in one of my previous posts here. So I would like to use this post to thank the people who helped me or in some way influenced the content of the book.

Writing this book was a lot of fun, but also a lot of work; actually a lot more work than I originally expected. Fortunately I got help from several people. First of all, I would like to thank Chris Massey. Chris helped me as the technical editor of the book, and guided me through most of the writing process, doing absolutely outstanding work. Very special thanks also go out to Grant Fritchey who helped us as the technical reviewer, and went very much beyond just reviewing, as his comments provided invaluable feedback to improve the quality of this book. Thanks also go to Tony Davis for offering me this opportunity in the first place, and helping to get the project started.

Outside the Red Gate team, my deepest gratitude goes to Cesar Galindo-Legaria, Manager of the SQL Server Query Optimization team at Microsoft, for answering my endless list of questions about the Query Optimizer, and educating me through some of the information he has published, including numerous research papers and an excellent chapter of a SQL Server book. I had the opportunity to meet Tony, Cesar, and Grant at the PASS Summit back in 2009, when I first started this journey, and I hope to finally meet Chris at the same conference in October 2011.

Although Chris, Cesar, Grant and Tony have directly helped me to shape the content of this book, there’s also a host of other people who have indirectly influenced the book through their work, which helped me to learn about and better understand the SQL Server query processor. With that in mind, I would like to thank the authors who have discussed the query processor in some of the available SQL Server books, Microsoft white papers and blogs, including Kalen Delaney, Ken Henderson, Lubor Kollar, Craig Freedman, Conor Cunningham and Eric Hanson.

Research papers have provided me with an unlimited source of information, and helped me to understand the Query Optimizer at a significantly deeper level than ever before. So, thanks to all that amazing work by the many people in the research community including, in addition to Cesar, Goetz Graefe, Surajit Chaudhuri, Yannis Ioannidis, Vivek Narasayya, Pat Selinger, Florian Waas, and many, many more.

Finally, on the personal side, I would like to thank my parents, Guadalupe and Humberto, and my family: my wife Rocio, and my three sons, Diego, Benjamin and David. Thanks all for your unconditional support and patience.

Slide Deck and Demo Files for Recent Presentations

Attached are WinZip files containing the PowerPoint presentations, as well as the scripts used in the demos, for my sessions “Top 10 Query Optimizer Topics for Better Performance” and “Inside the SQL Server 2008 Data Collector”, which I presented recently at the SoCal Code Camp, the Los Angeles SQL Server Professionals Group and the PASS Summit 2010.

Inside the SQL Server 2008 Data Collector – TheDataCollector.zip
Top 10 Query Optimizer Topics for Better Performance – QueryOptimizerTopics.zip

Presenting at the SoCal Code Camp

I am speaking again this month, this time at the SoCal Code Camp at Cal State Fullerton. SoCal Code Camp is a community-driven event where developers come and learn from their peers, and it is scheduled for Saturday, January 29th and Sunday, January 30th. I will be presenting two sessions on Saturday: “Inside the SQL Server 2008 Data Collector” at 8:45 am, and “Top 10 SQL Server Query Optimizer Topics for Better Performance” at 4:00 pm, both in room UH-335.

I will also be participating in the SQL Server Q&A session along with Denny Cherry, Lynn Langit, Bret Stateham, Ben Aminnia, Ike Ellis, Andrew Karcher and Thomas Mueller. This session will be hosted in room UH-335 at 2:45 pm.

For more information regarding sessions, schedule and directions, visit the SoCal Code Camp website. I hope to see you there.


Speaking at the Los Angeles SQL Server Professionals User Group

I haven’t updated this blog in a long time, so I wanted to put in a quick post about a session that I will be presenting at the Los Angeles SQL Server Professionals User Group this Thursday, January 20th. The session, “Top 10 Query Optimizer Topics for Better Performance”, is the same one I presented a couple of months ago at the PASS Summit in Seattle.

The meeting will be hosted at the UCLA campus and will start at 6:30 PM with Allen Berezovsky, who will talk about FILESTREAM in SQL Server. My session will follow. More details and directions can be found at the Los Angeles SQL Server Professionals Group website.

I hope to see you there,

Benjamin


My Book, “Inside the Query Optimizer”, available at the PASS Summit

My book, “Inside the SQL Server Query Optimizer”, is almost finished and we will have a conference edition of it available at the PASS Summit. The final version of the book, published by Red Gate books, will be available on Amazon by Christmas.

For more details on the contents, I am including the Preface of the book next.


Preface

The Query Optimizer has always been one of my favorite SQL Server topics, which is why I started blogging about it and submitting related presentations to PASS. And so it would have continued, except that, after several blog posts discussing the Query Optimizer, Red Gate invited me to write a book about it. This is that book.

I started learning about the Query Optimizer by reading the very few SQL Server books which discussed the topic, and most of them covered it only very briefly. Yet I pressed on, and later, while trying to learn more about the topic, I found an extremely rich source of information in the form of the many available research papers. It was hard to fully grasp them at the beginning, as academic papers can be difficult to read and understand, but soon I got used to them, and was all the more knowledgeable for it.

Having said that, I feel that I’m in a bit of a minority, and that many people still see the Query Optimizer just as a black box where a query is submitted and an amazing execution plan is returned. It is also seen as a very complex component, and rightly so. It definitely is a very complex component, perhaps the most complex in database management software, but there is still a lot of great information about the Query Optimizer that SQL Server professionals can benefit from.  

The Query Optimizer is the SQL Server component that tries to give you an optimal execution plan for your queries and, just as importantly, tries to find that execution plan as quickly as possible. A better understanding of what the Query Optimizer does behind the scenes can help you to improve the performance of your databases and applications, and this book explains the core concepts behind how the SQL Server Query Optimizer works. With this knowledge, you’ll be able to write better queries, provide the Query Optimizer with the information it needs to produce efficient execution plans, and troubleshoot the cases when the Query Optimizer is not giving you a good plan.

With that in mind, and in case it’s not obvious, the content of this book is intended for SQL Server professionals: database developers and administrators, data architects, and basically anybody who submits more than just trivial queries to SQL Server. Here’s a quick overview of what the book covers:

The first chapter, Introduction to Query Optimization, starts with an overview on how the SQL Server Query Optimizer works and introduces the concepts that will be covered in more detail in the rest of the book. A look into some of the challenges query optimizers still face today is covered next, along with a section on how to read and understand execution plans. The Chapter closes with a discussion of join ordering, traditionally one of the most complex problems in query optimization.

The second chapter talks about the Execution Engine, and describes it as a collection of physical operators that perform the functions of the query processor. It emphasizes how these operations, implemented by the Execution Engine, define the choices available to the Query Optimizer when building execution plans. This Chapter includes sections on data access operations, the concepts of sorting and hashing, aggregations, and joins, to conclude with a brief introduction to parallelism.

Chapter 3, Statistics and Cost Estimation, shows how the quality of the execution plans generated by the Query Optimizer is directly related to the accuracy of its cardinality and cost estimations. The Chapter describes Statistics objects in detail, and includes some sections on how statistics are created and maintained, as well as how they are used by the Query Optimizer. We’ll also take a look at how to detect cardinality estimation errors, which may cause the Query Optimizer to choose inefficient plans, together with some recommendations on how to avoid and fix these problems. Just to round off the subject, the chapter ends with an introduction to cost estimation.

Chapter 4, Index Selection, shows how SQL Server can speed up your queries and dramatically improve the performance of your applications just by using the right indexes. The Chapter shows how SQL Server selects indexes, how you can provide better indexes, and how you can verify your execution plans to make sure these indexes are correctly used. We’ll talk about the Database Engine Tuning Advisor and the Missing Indexes feature, which will show how the Query Optimizer itself can provide you with index tuning recommendations.

Chapter 5, The Optimization Process, is the Chapter that goes right into the internals of the Query Optimizer and introduces the steps that it performs without you ever knowing. This covers everything from the moment a query is submitted to SQL Server until an execution plan is generated and is ready to be executed, including steps like parsing, binding, simplification, trivial plan and full optimization. Important components which are part of the Query Optimizer architecture, such as transformation rules and the memo structure, are also introduced.

Chapter 6, Additional Topics, includes a variety of subjects, starting with the basics of update operations and how they also need to be optimized just like any other query, so that they can be performed as quickly as possible. We’ll have an introduction to Data Warehousing and how SQL Server optimizes star queries, before launching into a detailed explanation of Parameter sniffing, along with some recommendations on how to avoid some problems presented by this behavior. Continuing with the topic of parameters, the Chapter concludes by discussing auto-parameterization and forced parameterization.

Chapter 7 describes Hints, and warns that, although hints are a powerful tool which allows you to take explicit control over the execution plan of a query, they need to be used with caution and only as a last resort when no other option is available. The chapter covers the most-used hints, and ends with a couple of sections on plan guides and the USE PLAN query hint.

Before we get started, please bear in mind that this book contains many undocumented SQL Server statements. These statements are provided only as a way to explore and understand the Query Optimizer and, as such, should not be used in a production environment. Use them wisely, and I hope you enjoy learning about this topic as much as I do.

Benjamin Nevarez

Presenting at the SoCal Rock & Roll Code Camp

I will be presenting two sessions at the SoCal Rock & Roll Code Camp this Saturday. This is a community-driven event with over 100 sessions, hosted at the University of Southern California (USC) on both Saturday, October 23rd and Sunday, October 24th. My sessions will be “Inside the SQL Server 2008 Data Collector” at 12:15 pm, and “Top 10 SQL Server Query Optimizer Topics for Better Performance” at 1:30 pm, both in room VKC-105.

For more information regarding sessions, schedule and directions visit the SoCal Rock & Roll Code Camp website.


Speaking at the Orange County SQL Server Professionals User Group

I will be speaking at the Orange County SQL Server Professionals User Group this Thursday, October 7th, 2010. The topic is “Top 10 Query Optimizer Topics for Better Performance”. So if you are in the Orange County or Los Angeles area, please stop by and say hello.

The meeting starts at 6:30 PM. More details and directions can be found here:

Orange County SQL Server Professionals User Group

http://www.sqloc.com