In a previous post I talked about the new columnstore indexes and their related processing algorithms which are available in SQL Server code-named Denali. In this post I will cover the query processing part of the technology in more detail and will show you some examples that you can test on the recently released CTP3 (Community Technology Preview) of the product.
Same as with previous versions of SQL Server, in Denali the query optimizer can choose between the available access methods, which now also include columnstore indexes, and as always, this will be a cost-based decision. A new choice the query optimizer will have to make is the selection of an execution mode. The new query processing algorithms mentioned in my previous post will run in what is called a batch execution mode, which is different from the traditional processing mode, now called row mode.
In the row execution mode operators process data one row at a time. The new batch execution mode process data in batches which is more efficient for large amounts of data, like the workloads present on data warehouse queries. Each operator in an execution plan can use the row execution mode and, when columnstore indexes are available, some operators can also use the batch mode. There is both an estimated and an actual execution mode and this information is displayed on the query execution plan as I will show later. It is also worth mentioning that, although columnstore indexes can speed up the performance of data warehouse queries, they are not a good choice for very selective queries returning only a few records. In this case the query optimizer may have to rely on row stores, like clustered or regular nonclustered indexes, to find those records quickly. There are no seeks on columnstore indexes.
Same as with previous versions of SQL Server, you still have the choice to use a hint to force any index in the cases where the query optimizer is not giving you a good execution plan. This can happen for example when the query optimizer is choosing a columnstore index when it shouldn’t or when you want to force a columnstore index when it is not being selected. You can also use the new IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint to ask the query optimizer to avoid using any columnstore index.
Let me show you an example which you can test on SQL Server Denali CTP3, currently available for download here. To follow this example you will also need the AdventureWorksDWDenali database, available at CodePlex and I will use the same example on BOL to skip the basics and go directly to analyze the batch processing mode (By the way the BOL example didn’t work directly with the AdventureWorksDWDenali database so I had to add a few more columns at the end of the CREATE TABLE statement.)
First, use the following BOL code to create a partition function, a partition scheme and a new partitioned table with a columnstore index
USE AdventureWorksDWDenali; GO CREATE PARTITION FUNCTION [ByOrderDateMonthPF](int) AS RANGE RIGHT FOR VALUES ( 20050701, 20050801, 20050901, 20051001, 20051101, 20051201, 20060101, 20060201, 20060301, 20060401, 20060501, 20060601, 20060701, 20060801, 20060901, 20061001, 20061101, 20061201, 20070101, 20070201, 20070301, 20070401, 20070501, 20070601, 20070701, 20070801, 20070901, 20071001, 20071101, 20071201, 20080101, 20080201, 20080301, 20080401, 20080501, 20080601, 20080701, 20080801, 20080901, 20081001, 20081101, 20081201 ) GO CREATE PARTITION SCHEME [ByOrderDateMonthRange] AS PARTITION [ByOrderDateMonthPF] ALL TO ([PRIMARY]) GO -- Create a partitioned version of the FactResellerSales table CREATE TABLE [dbo].[FactResellerSalesPtnd]( [ProductKey] [int] NOT NULL, [OrderDateKey] [int] NOT NULL, [DueDateKey] [int] NOT NULL, [ShipDateKey] [int] NOT NULL, [ResellerKey] [int] NOT NULL, [EmployeeKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [SalesTerritoryKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [tinyint] NOT NULL, [RevisionNumber] [tinyint] NULL, [OrderQuantity] [smallint] NULL, [UnitPrice] [money] NULL, [ExtendedAmount] [money] NULL, [UnitPriceDiscountPct] [float] NULL, [DiscountAmount] [float] NULL, [ProductStandardCost] [money] NULL, [TotalProductCost] [money] NULL, [SalesAmount] [money] NULL, [TaxAmt] [money] NULL, [Freight] [money] NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [CustomerPONumber] [nvarchar](25) NULL, [OrderDate] datetime NULL, [DueDate] datetime NULL, [ShipDate] datetime NULL ) ON ByOrderDateMonthRange(OrderDateKey); GO -- Copy the data from the FactResellerSales into the new table INSERT INTO dbo.FactResellerSalesPtnd WITH(TABLOCK) SELECT * FROM dbo.FactResellerSales; GO -- Create the columnstore index CREATE NONCLUSTERED COLUMNSTORE INDEX [csindx_FactResellerSalesPtnd] ON [FactResellerSalesPtnd] ( [ProductKey], [OrderDateKey], [DueDateKey], [ShipDateKey], [ResellerKey], [EmployeeKey], [PromotionKey], [CurrencyKey], [SalesTerritoryKey], [SalesOrderNumber], [SalesOrderLineNumber], [RevisionNumber], [OrderQuantity], [UnitPrice], [ExtendedAmount], [UnitPriceDiscountPct], [DiscountAmount], [ProductStandardCost], [TotalProductCost], [SalesAmount], [TaxAmt], [Freight], [CarrierTrackingNumber], [CustomerPONumber] );
Now run the following query
SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory FROM FactResellerSalesPtnd GROUP BY SalesTerritoryKey;
This will create the following plan where you can see the new Columnstore Index Scan operator
The properties of the Columnstore Index Scan operator are shown next
You may notice that the actual and estimated execution mode is Row (lines 3 and 4 on the list of properties). Row execution mode was selected because the table is not large enough to require the batch execution mode. We can use the undocumented ROWCOUNT and PAGECOUNT options of the UPDATE STATISTICS statement to simulate a larger table as shown next (for more information about how this works see my post about the DTA here)
UPDATE STATISTICS FactResellerSalesPtnd WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000
Removing the existing plan (using for example DBCC FREEPROCCACHE) and running the same query again will now show the following plan (only part is shown), which this time is using parallelism.
In addition, by looking at the properties of the Columnstore Index Scan you can notice that this time it is using the batch execution mode
You can also use the new IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint to disallow the use of a columnstore index. Run the following code
SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory FROM FactResellerSalesPtnd GROUP BY SalesTerritoryKey OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
This will show you the following plan which as you can see it is now directly using the FactResellerSalesPtnd table, without using the columnstore index.
Finally, since the number of records and pages of the FactResellerSalesPtnd table was altered for this test, perhaps you want drop it and create a new copy if you need to do some additional testing
DROP TABLE FactResellerSalesPtnd