Benjamin Nevarez Rotating Header Image

Rebuilding Indexes vs. Updating Statistics

One of the questions I was asked recently while speaking at the Los Angeles SQL Server Professionals Group and the Orange County SQL Server User Group (*) was regarding the order in which jobs like rebuilding indexes or updating statistics should be executed as part of the database maintenance activities. In general you should consider the following important points, focusing in the fact that there are two kinds of statistics: index and column statistics.

1) By default, the UPDATE STATISTICS statement updates both index and column statistics. Using the COLUMNS option of this statement will update column statistics only. Using the INDEX option will update index statistics only.

2) By default, the UPDATE STATISTICS statement uses only a sample of records of the table. Using UPDATE STATISTICS WITH FULLSCAN will scan the entire table.

3) Rebuilding an index, for example by using the ALTER INDEX … REBUILD statement, will update only index statistics with the equivalent of using WITH FULLSCAN. Rebuilding indexes does not update any column statistics.

4) Reorganizing an index, for example using the ALTER INDEX … REORGANIZE statement, does not update any statistics.

So depending on your maintenance jobs and scripts several scenarios can exist. The simplest scenario is if you want to rebuild all the indexes and update all the statistics. As mentioned before, if you rebuild all your indexes then all the index statistics will also be updated by scanning all the rows on the table. Then you just need to update your column statistics by running an UPDATE STATISTICS WITH FULLSCAN, COLUMNS statement. Since the first job only updates index statistics and the second one only updates column statistics, it does not matter which one you execute first.

Some other more complicated scenarios include when you have jobs which rebuild your indexes based on their fragmentation level. Of course, the worst case scenario would be if you first rebuild your indexes, which also updates the index statistics by scanning the entire table, and later you run UPDATE STATISTICS using the default values, which again updates the index statistics but this time only with a default and smaller sample. Not only you are updating your index statistics twice but you are overwriting the better of the two choices.

Let me show you how these commands work with some examples using the AdventureWorks database. Create a new table dbo.SalesOrderDetail

SELECT * INTO dbo.SalesOrderDetail

FROM sales.SalesOrderDetail

The next query uses the sys.stats catalog view and shows that initially there are no statistics objects for the new table.

SELECT name, auto_created, stats_date(object_id, stats_id) AS update_date FROM sys.stats

WHERE object_id = object_id(‘dbo.SalesOrderDetail’)

Use this query using the sys.stats catalog again when asked to inspect the status of the statistics after each of the following commands. Now run the following query

SELECT * FROM dbo.SalesOrderDetail

WHERE SalesOrderID = 43670 AND OrderQty = 1

Use the previous sys.stats query to verify that two statistics objects were created, one for the SalesOrderID column and another one for the OrderQty column (they both have names starting with _WA_Sys as shown later). Run the following statement to create an index on the ProductID column.

CREATE INDEX ix_ProductID ON dbo.SalesOrderDetail(ProductID)

Run again the query to verify that a new statistics object for the ProductID column has been created. Notice the value of the auto_created column which tells if the statistics were automatically created by the query processor.

name                          auto_created      update_date

_WA_Sys_00000004_7F80E8EA     1                 2010-07-01 23:27:45.953

_WA_Sys_00000001_7F80E8EA     1                 2010-07-01 23:27:46.117

ix_ProductID                  0                 2010-07-01 23:29:41.110

Run the next statement to update the column statistics only. You can validate that only the column statistics were updated by looking at the update_date column which uses the STATS_DATE function to display the last date the statistics were updated.

UPDATE STATISTICS dbo.SalesOrderDetail WITH FULLSCAN, COLUMNS

name                          auto_created      update_date

_WA_Sys_00000004_7F80E8EA     1                 2010-07-01 23:34:13.753

_WA_Sys_00000001_7F80E8EA     1                 2010-07-01 23:34:13.850

ix_ProductID                  0                 2010-07-01 23:29:41.110

This statement will do the same for the index statistics

UPDATE STATISTICS dbo.SalesOrderDetail WITH FULLSCAN, INDEX

The next two statements will update both index and column statistics

UPDATE STATISTICS dbo.SalesOrderDetail WITH FULLSCAN

UPDATE STATISTICS dbo.SalesOrderDetail WITH FULLSCAN, ALL

See how an index rebuild only updates index statistics

ALTER INDEX ix_ProductID ON dbo.SalesOrderDetail REBUILD

Here you can verify that reorganizing an index does not update any statistics

ALTER INDEX ix_ProductID ON dbo.SalesOrderDetail REORGANIZE

Finally, remove the table you have just created

DROP TABLE dbo.SalesOrderDetail

* Updated from an article originally written in October, 2009

About the author

Benjamin Nevarez Benjamin Nevarez is a SQL Server MVP and independent consultant based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of "SQL Server 2014 Query Tuning & Optimization" and "Inside the SQL Server Query Optimizer" and co-author of "SQL Server 2012 Internals". With more than 20 years of experience in relational databases, Benjamin has also been a speaker at many SQL Server conferences, including the PASS Summit, SQL Server Connections and SQLBits. Benjamin’s blog can be found at http://www.benjaminnevarez.com and he can also be reached by e-mail at admin at benjaminnevarez dot com and on twitter at @BenjaminNevarez.

15 Comments

  1. Zach (WIDBA) says:

    Something I have always wondered about, but was not sure how to verify. If you have a multi column index – ProdID, ProdCreateDate,ProdType – stats are created on the first (index stat). If the server has generated a _WA index for ProdCreateDate (done for some other query) – will that be used by the optimizer or have any difference from creating a stat via CREATE STATISTICS….(ProdCreateDate). Particularly if there is a query that utilizes ProdCreateDate and ProdType (but not ProdID)

    Any thoughts?

  2. Benjamin Nevarez says:

    Hi Zach,

    Not sure if I understand your question, but feel free to ask again if this is not what you are asking.

    If you have an index on ProdID, ProdCreateDate and ProdType, you will also have a statistics object with the same columns. But this statistics object can not be used on a predicate using the ProdCreateDate column, so SQL Server has to create a new one. The reason is that the index statistics object contain a histogram only on ProdID, and a density vector only on the following combinations

    ProdID,
    ProdID, ProdCreateDate
    ProdID, ProdCreateDate, ProdType

    You can prove it this way. Create the table and index

    create table test (ProdID int, ProdCreateDate int, ProdType int, a int, b int, c int)
    insert into test values (1, 2, 3, 4, 5, 6)
    create nonclustered index ix_test on test(ProdID, ProdCreateDate, ProdType)

    This query shows that only the statistics object for the index ix_test exist

    select * from sys.stats where object_id = object_id(‘test’)

    Run a query using ProdCreateDate

    select * from test
    where ProdCreateDate = 4

    At this moment a new statistics object was created. Run this again

    select * from sys.stats where object_id = object_id(‘test’)

    The output will show a new statistics object created automatically and starting with _WA_Sys.

    Ben

  3. Zach (WIDBA) says:

    Thanks for the response. Two things.

    Does a statistics created by the optimizer (_WA) vary from one created by a user via CREATE STATISTICS …? I have heard (probably more in the form of an anecdote that a user stat is the better way – but the plans don’t suggest that (at least initially).

    Also, in the case above, if I create a multi column statistic – is that more efficient than two separate statistics (in the case where the 2nd column is still highly selective and the 2nd and 3rd columns are used frequently as a filter.

    CREATE STATISTICS ST_IndexTest_multi ON dbo.IndexTest(prodcreatedate,prodtype)
    SELECT ProdID FROM dbo.IndexTest WHERE ProdCreateDate = ‘2001-01-27′ and PRODTYPE = ‘X’

    Is there a way to see what stats the optimizer is using to determine the “best plan – fast” as Kim Tripp says.

    I see no differences in the plan either way with a 150k of rows loaded (unique ProdCreateDate for each row)

    Probably digging deeper than I need to.

  4. Benjamin Nevarez says:

    Hi Zach,

    Regarding your first question

    “Does a statistics created by the optimizer (_WA) vary from one created by a user via CREATE STATISTICS …? I have heard (probably more in the form of an anecdote that a user stat is the better way – but the plans don’t suggest that (at least initially).”

    Perhaps those comments refer to the size of the sample used to create the statistics object. Since the query processor needs to create or update the statistics while running your query, it uses only a default sample. Using the CREATE STATISTICS statement you can specify any sample size or scan the entire table. However, if you need to use a bigger sample for your statistics, I recommend you to do the following:

    1) Leave the query processor to automatically create any needed statistics (this will use a default sample)

    2) Create a maintenance job, to be executed during your maintenance window, to update all the existing statistics using WITH FULLSCAN.

    See if this works for you.

    Ben

  5. Benjamin Nevarez says:

    Regarding your question

    “If I create a multi column statistic – is that more efficient than two separate statistics?”

    Statistics are used in many different ways. In this specific example, the predicates are using equality operators so SQL Server needs histograms for each column. It will use the histogram of the statistics object you have manually created to cover the ProdCreateDate column and will create a new statistics object for the PRODTYPE column.

    By the way, what is the “best plan – fast” that you mention in your comment?

    Thanks,

    Ben

  6. Zach (WIDBA) says:

    We do maintenance nightly/weekly and stats are part of that.

    “By the way, what is the “best plan – fast” that you mention in your comment?” – I was curious if there was a way to see what statistics were used in the exec plan that was used by the optimizer. (Specifically, why would it not use the compound statistic I created versus two separate statistics – but you answered that in your second post.) In the case above, filtering on the Date and Type fields yielded an index scan for the single nonclustered index that was created with ProdId in the lead position.

    Thanks for the insight – will keep plowing through The packed 2008 Internals book.

  7. Anil Singh says:

    Hi Ben,

    So if have to run a Update stats defualt on each tables as part of my Maintenance

    something like this.
    EXEC sp_MSForEachTable “update statistics ? WITH SAMPLE 40 PERCENT”

    There is no need to Alter Index Rebuild. Am I correct?

  8. Shiv Shankar Khanna says:

    Hello Sir,

    As per suggestions of SQL query optimiser some of our developers have added many index on 1 major table where the only difference is columns sequence i,e A,B,C / C,B,A / B,A,C

    What is the right way to deal this ?

  9. mahesh sharma says:

    Hi,

    Is Statistics update de-fragment indexes in a table.

    thanks in advance..:)

  10. Benjamin Nevarez says:

    Hi Shiv,

    I would do an analysis of the queries I am trying to optimize. You can use tools like the Database Engine Tuning Advisor to help on this. Also, if the indexes were already added to the database you can see how useful they really are by looking at the query plans and the sys.dm_db_index_usage_stats DMV. I cover some of these topics on Chapter 4 of my book available for free at http://www.simple-talk.com/books/sql-books/inside-the-sql-server-query-optimizer.

    Regards,

    Ben

  11. Benjamin Nevarez says:

    Hi Mahesh,

    No, to remove fragmentation you would need to use ALTER INDEX … REBUILD or ALTER INDEX … REORGANIZE. You can find more details at http://technet.microsoft.com/en-us/library/ms188388.aspx.

    Regards,

    Ben

  12. Pedro Oliveira says:

    Hi,

    Doesn’t rebuilding the clustered index update the statistics just like WITH FULLSCAN?!

    Thanks,
    Pedro

  13. [...] for those of you that are not aware of such things here is a blog post from Benjamin Nevarez (blog | @BenjaminNevarez) on the topic. He does a great job of explaining the [...]

  14. Mike Holbrook says:

    Since the distribution does not change after a index report why would I update statistics full scan?

  15. Mike Holbrook says:

    Meant index reorg not report.

Add Comment Register



Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>