
SQL Server 2014 Incremental Statistics

A major problem with updating statistics on large tables in SQL Server is that the entire table always has to be scanned, for example when using the WITH FULLSCAN option, even if only recent data has changed. This is also true when using partitioning: even if only the newest partition has changed since the last time statistics were updated, updating statistics again requires scanning the entire table, including all the partitions that didn’t change. Incremental statistics, a new SQL Server 2014 feature, can help with this problem.

Using incremental statistics you can update only the partition or partitions that you need, and the information from those partitions is merged with the existing information to create the final statistics object. Another advantage of incremental statistics is that the percentage of data changes required to trigger an automatic update of statistics now works at the partition level, which basically means that changes to 20% of the rows (on the leading statistics column) of a single partition are now enough to trigger the update. Unfortunately the histogram is still limited to 200 steps for the entire statistics object in this version of SQL Server.

Let us look at an example of how you can update statistics at the partition level to explore this behavior as of SQL Server 2014. First we need to create a partitioned table using the AdventureWorks2012 database:

CREATE PARTITION FUNCTION TransactionRangePF1 (DATETIME)
AS RANGE RIGHT FOR VALUES 
(
   '20071001', '20071101', '20071201', '20080101', 
   '20080201', '20080301', '20080401', '20080501', 
   '20080601', '20080701', '20080801'
);
GO

CREATE PARTITION SCHEME TransactionsPS1 AS PARTITION TransactionRangePF1 TO 
(
  [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], 
  [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], 
  [PRIMARY], [PRIMARY], [PRIMARY]
);
GO

CREATE TABLE dbo.TransactionHistory 
(
  TransactionID        INT      NOT NULL, -- not bothering with IDENTITY here
  ProductID            INT      NOT NULL,
  ReferenceOrderID     INT      NOT NULL,
  ReferenceOrderLineID INT      NOT NULL DEFAULT (0),
  TransactionDate      DATETIME NOT NULL DEFAULT (GETDATE()),
  TransactionType      NCHAR(1) NOT NULL,
  Quantity             INT      NOT NULL,
  ActualCost           MONEY    NOT NULL,
  ModifiedDate         DATETIME NOT NULL DEFAULT (GETDATE()),
  CONSTRAINT CK_TransactionType 
    CHECK (UPPER(TransactionType) IN (N'W', N'S', N'P'))
) 
ON TransactionsPS1 (TransactionDate);
GO

Note: For details about partitioning and the CREATE PARTITION FUNCTION / SCHEME statements please refer to Partitioned Tables and Indexes in Books Online.

We currently have data to populate 12 partitions. Let us start by populating only the first 11:

INSERT INTO dbo.TransactionHistory
SELECT * FROM Production.TransactionHistory
WHERE TransactionDate < '2008-08-01';

If required, you can use the following statement to inspect the contents of the partitions:

SELECT * FROM sys.partitions
  WHERE object_id = OBJECT_ID('dbo.TransactionHistory');

Let us create an incremental statistics object using the CREATE STATISTICS statement with the new INCREMENTAL clause set to ON (OFF is the default):

CREATE STATISTICS incrstats ON dbo.TransactionHistory(TransactionDate) 
  WITH FULLSCAN, INCREMENTAL = ON;

You can also create incremental statistics while creating an index using the new STATISTICS_INCREMENTAL clause of the CREATE INDEX statement.
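
For example, something like the following should create an index whose associated statistics object is incremental (the index name here is just illustrative):

-- Illustrative index name; STATISTICS_INCREMENTAL creates the index statistics as incremental
CREATE INDEX IX_TransactionHistory_TransactionDate
ON dbo.TransactionHistory(TransactionDate)
WITH (STATISTICS_INCREMENTAL = ON);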

You can inspect the created statistics object using DBCC:

DBCC SHOW_STATISTICS('dbo.TransactionHistory', incrstats);

Among other things, you will notice that the histogram has 200 steps (only the last 3 shown here):

Step RANGE_HI_KEY            RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS
198  2008-07-25 00:00:00.000 187        100     2
199  2008-07-27 00:00:00.000 103        101     1
200  2008-07-31 00:00:00.000 281        131     3

Initial DBCC results

So we already have the maximum number of steps in a statistics object. What would happen if we add data to a new partition? Let us add data to partition 12:

INSERT INTO dbo.TransactionHistory 
SELECT * FROM Production.TransactionHistory 
WHERE TransactionDate >= '2008-08-01';

Now, we update the statistics object using the following statement:

UPDATE STATISTICS dbo.TransactionHistory(incrstats) 
  WITH RESAMPLE ON PARTITIONS(12);

Note the new syntax for specifying the partition; you can specify multiple partitions, separated by commas. The UPDATE STATISTICS statement reads the specified partitions and then merges their results with the existing statistics object to build the global statistics. Note the RESAMPLE clause; this is required because partition statistics need to have the same sample rates in order to be merged into the global statistics. Although only the specified partition was scanned, you can see that SQL Server has rearranged the histogram. The last three steps now show data for the added partition. You can also compare the original with the new histogram for other minor differences:

Step RANGE_HI_KEY            RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS
197  2008-07-31 00:00:00.000 150        131     2
198  2008-08-12 00:00:00.000 300        36      9
199  2008-08-22 00:00:00.000 229        43      7
200  2008-09-03 00:00:00.000 363        37      11

DBCC results after the incremental update
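
As mentioned before, the ON PARTITIONS clause also accepts a list of partitions, so something like the following should update and merge the statistics of partitions 11 and 12 in a single statement:

UPDATE STATISTICS dbo.TransactionHistory(incrstats) 
  WITH RESAMPLE ON PARTITIONS(11, 12);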

If for any reason you want to disable incremental statistics, you can use the following statement to go back to the original behavior (or, optionally, just drop the statistics object and create a new one):

UPDATE STATISTICS dbo.TransactionHistory(incrstats) 
  WITH FULLSCAN, INCREMENTAL = OFF;

After disabling incremental statistics, trying to update a partition as shown previously will return the following error message:

Msg 9111, Level 16, State 1

UPDATE STATISTICS ON PARTITIONS syntax is not supported for non-incremental statistics.
 

Finally, you can also enable incremental statistics for your automatic statistics at the database level, if needed. This requires the INCREMENTAL = ON clause in the ALTER DATABASE statement and obviously also requires AUTO_CREATE_STATISTICS set to ON.
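
For example, for our sample database the statement would be something like this:

ALTER DATABASE AdventureWorks2012 SET AUTO_CREATE_STATISTICS ON (INCREMENTAL = ON);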

Note: Article originally posted on http://sqlperformance.com

The SQL Server Query Store

Have you ever found a plan regression after a SQL Server upgrade and wanted to know what the previous execution plan was? Have you ever had a query performance problem due to the fact that a query unexpectedly got a new execution plan? At the last PASS Summit, Conor Cunningham uncovered a new SQL Server feature, which can be helpful in solving performance problems related to these and other changes in execution plans.

This feature, called the Query Store, can help you with performance problems related to plan changes and will be available soon on SQL Azure and later on the next version of SQL Server. Although it is expected to be available on the Enterprise Edition of SQL Server, it is not yet known if it will be available on Standard or any other editions. To understand the benefits of the Query Store, let me talk briefly about the query troubleshooting process.

Why is a Query Slow?

Once you have detected that a performance problem is caused by a slow query, the next step is to find out why. Obviously not every problem is related to plan changes. There could be multiple reasons why a query that has been performing well is suddenly slow. Sometimes this could be related to blocking or a problem with other system resources. Something else may have changed, but the challenge may be finding out what. Many times we don’t have a baseline for system resource usage, query execution statistics, or performance history. And usually we have no idea what the old plan was. It may be the case that some change, for example in data, schema, or query parameters, made the query processor produce a new plan.

Plan Changes

At the session, Conor used the Picasso Database Query Optimizer Visualizer tool, although he didn’t mention it by name, to show how the plans for the same query change, and explained the fact that different plans can be selected for the same query based on the selectivity of their predicates. He even mentioned that the query optimizer team uses this tool, which was developed by the Indian Institute of Science. An example of the visualization:


Picasso Database Query Optimizer Visualizer

Each color in the diagram is a different plan, and each plan is selected based on the selectivity of the predicates. An important fact is that when a boundary is crossed in the graph and a different plan is selected, most of the time the cost and performance of both plans should be similar, as the selectivity or estimated number of rows only changed slightly. This could happen, for example, when a new row is added to a table and it qualifies for the predicate used. However, in some cases, mostly due to limitations in the query optimizer cost model in which it is not able to model something correctly, the new plan can have a large performance difference compared to the previous one, creating a problem for your application. By the way, the plans shown in the diagram are the final plans selected by the query optimizer; don’t confuse this with the many alternatives the optimizer has to consider in order to select only one.

An important fact, in my opinion, which Conor didn’t cover directly, is the change of plans due to regressions after applying cumulative updates (CUs), service packs, or version upgrades. A major concern that comes to mind with changes inside the query optimizer is plan regressions. The fear of plan regressions has been considered the biggest obstacle to query optimizer improvements. Regressions are problems introduced after a fix has been applied to the query optimizer, and are sometimes referred to as the classic “two or more wrongs make a right.” This can happen when, for example, two bad estimations, one overestimating a value and the second one underestimating it, cancel each other out, luckily giving a good estimate. Correcting only one of these values may now lead to a bad estimation, which may negatively impact the choice of plan, causing a regression.

What Does the Query Store Do?

Conor mentioned that the Query Store can help with the following:

  1. Store the history of query plans in the system;
  2. Capture the performance of each query plan over time;
  3. Identify queries that have “gotten slower recently”;
  4. Allow you to force plans quickly; and,
  5. Make sure this works across server restarts, upgrades, and query recompiles.

So this feature not only stores the plans and related query performance information, but can also help you to easily force an old query plan, which in many cases can solve a performance problem.

How to Use the Query Store

You need to enable the Query Store by using the ALTER DATABASE CURRENT SET QUERY_STORE = ON; statement. I tried it in my current SQL Azure subscription, but the statement returned an error as it seems that the feature is not available yet. I contacted Conor and he told me that the feature will be available soon.

Once the Query Store is enabled, it will start collecting the plans and query performance data and you can analyze that data by looking at the Query Store tables. I can currently see those tables on SQL Azure but, since I was not able to enable the Query Store, the catalogs returned no data.

You can analyze the information collected either proactively to understand the query performance changes in your application, or retroactively in case you have a performance problem. Once you identify the problem you can use traditional query tuning techniques to try to fix the problem, or you can use the sp_query_store_force_plan stored procedure to force a previous plan. The plan has to be captured in the Query Store to be forced, which obviously means it is a valid plan (at least when it was collected; more on that later) and it was generated by the query optimizer before. To force a plan you need the plan_id, available in the sys.query_store_plan catalog. Once you look at the different metrics stored, which are very similar to what is stored for example in sys.dm_exec_query_stats, you can make the decision to optimize for a specific metric, like CPU, I/O, etc. Then you can simply use a statement like this:

EXEC sys.sp_query_store_force_plan @query_id = 1, @plan_id = 1;

This is telling SQL Server to force plan 1 on query 1. Technically you could do the same thing using a plan guide, but it would be more complicated and you would have to manually collect and find the required plan in the first place.
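
The feature is not available to me yet, so take this as a sketch only, but finding the query_id and plan_id values to pass to the procedure should be a matter of searching the Query Store catalogs for the text of the problem query, something like:

-- A sketch: locate query_id / plan_id candidates for a given query text
SELECT q.query_id, p.plan_id, qt.query_sql_text
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
WHERE qt.query_sql_text LIKE N'%some text from the problem query%'; -- illustrative filter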

How Does the Query Store Work?

Actually, forcing a plan uses plan guides in the background. Conor mentioned that “when you compile a query, we implicitly add a USE PLAN hint with the fragment of the XML plan associated with that statement.” So you no longer need to use a plan guide yourself. Also keep in mind that, the same as when using a plan guide, it is not guaranteed that you will get exactly the forced plan, but at least something similar to it. For a reminder of how plan guides work take a look at this article. In addition, you should be aware that there are some cases where forcing a plan does not work, a typical example being when the schema has changed, i.e. if a stored plan uses an index but the index no longer exists. In this case SQL Server cannot force the plan; it will perform a normal optimization and record the fact that the plan-forcing operation failed in the sys.query_store_plan catalog.
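
Assuming the columns that sys.query_store_plan is expected to expose for plan forcing (I have not been able to verify this against live bits yet), checking for forced plans and any forcing failures might look like this:

-- A sketch: inspect forced plans and any plan-forcing failures
SELECT plan_id, query_id, is_forced_plan,
       force_failure_count, last_force_failure_reason_desc
FROM sys.query_store_plan
WHERE is_forced_plan = 1;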

Architecture

Every time SQL Server compiles or executes a query, a message is sent to the Query Store. This is shown next.


Query Store Workflow Overview

The compile and execution information is kept in memory first and then saved to disk, depending on the Query Store configuration (the data is aggregated according to the INTERVAL_LENGTH_MINUTES parameter, which defaults to one hour, and flushed to disk according to the DATA_FLUSH_INTERVAL_SECONDS parameter). The data can also be flushed to disk if there is memory pressure on the system. In any case you will be able to access all of the data, both in memory and on disk, when you query the sys.query_store_runtime_stats catalog.
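
Since the feature is not publicly available yet the final syntax may differ, but configuring those two parameters should look something like the following sketch:

-- A sketch only; option names as mentioned above, values are just examples
ALTER DATABASE CURRENT SET QUERY_STORE = ON
(
    INTERVAL_LENGTH_MINUTES = 60,      -- how runtime statistics are aggregated
    DATA_FLUSH_INTERVAL_SECONDS = 900  -- how often in-memory data is persisted to disk
);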

Catalogs

The collected data is persisted on disk and stored in the user database where the Query Store is enabled (and its settings are stored in sys.database_query_store_options). The Query Store catalogs are:

sys.query_store_query_text: Query text information
sys.query_store_query: Query text plus the plan-affecting SET options used
sys.query_store_plan: Execution plans, including history
sys.query_store_runtime_stats: Query runtime statistics
sys.query_store_runtime_stats_interval: Start and end times for intervals
sys.query_context_settings: Query context settings information

Query Store views

Runtime statistics capture a whole slew of metrics, including the average, last, min, max, and standard deviation. Here is the full set of columns for sys.query_store_runtime_stats:

runtime_stats_id plan_id runtime_stats_interval_id
execution_type execution_type_desc first_execution_time last_execution_time count_executions
avg_duration last_duration min_duration max_duration stdev_duration
avg_cpu_time last_cpu_time min_cpu_time max_cpu_time stdev_cpu_time
avg_logical_io_reads last_logical_io_reads min_logical_io_reads max_logical_io_reads stdev_logical_io_reads
avg_logical_io_writes last_logical_io_writes min_logical_io_writes max_logical_io_writes stdev_logical_io_writes
avg_physical_io_reads last_physical_io_reads min_physical_io_reads max_physical_io_reads stdev_physical_io_reads
avg_clr_time last_clr_time min_clr_time max_clr_time stdev_clr_time
avg_dop last_dop min_dop max_dop stdev_dop
avg_query_max_used_memory last_query_max_used_memory min_query_max_used_memory max_query_max_used_memory stdev_query_max_used_memory
avg_rowcount last_rowcount min_rowcount max_rowcount stdev_rowcount

Columns in sys.query_store_runtime_stats
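
As an illustration, a sketch of how you might query these columns to find the plans with the highest average duration (assuming the catalog behaves as described above):

-- A sketch: top 10 plans by average duration across collected intervals
SELECT TOP (10)
       rs.plan_id,
       rs.runtime_stats_interval_id,
       rs.count_executions,
       rs.avg_duration,
       rs.avg_cpu_time,
       rs.avg_logical_io_reads
FROM sys.query_store_runtime_stats AS rs
ORDER BY rs.avg_duration DESC;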

This data is only captured when query execution ends. The Query Store also considers the query’s SET options, which can impact the choice of an execution plan, as they affect things like the results of evaluating constant expressions during the optimization process. I cover this topic in a previous post.

Conclusion

This will definitely be a great feature and something I’d like to try as soon as possible (by the way, Conor’s demo shows “SQL Server 15 CTP1” but those bits are not publicly available). The Query Store can be useful for upgrades which could be a CU, service pack, or SQL Server version, as you can analyze the information collected by the Query Store before and after to see if any query has regressed. (And if the feature is available in lower editions, you could even do this in a SKU upgrade scenario.) Knowing this can help you to take some specific action depending on the problem, and one of those solutions could be to force the previous plan as explained before.

SQL Server 2014 Query Tuning & Optimization Book Code

Today I was finally able to finish extracting the code from my new book, Microsoft SQL Server 2014 Query Tuning & Optimization; a few readers requested it recently, but I had been busy traveling in the last few weeks. The book contains a large number of SQL queries, all of which are based on the AdventureWorks2012 database, except for Chapter 9, which uses the AdventureWorksDW2012 database. All the code has been tested on SQL Server 2014. Note that these sample databases are not included by default in your SQL Server installation, but can be downloaded from the CodePlex website.

To download the code, click here: Query Tuning and Optimization Code.

PASS Summit and Other Speaking Engagements

It is almost that time of the year again when we all head to the largest SQL Server conference in the world, the PASS Summit, hosted again in Seattle. In my particular case I will also be attending some other SQL Server events where I will be giving probably more than a dozen presentations. But like a rock band getting ready for a world tour :-) I’ll start warming up with a local gig at our SQL Server user group. I’ll be presenting Query Processing in In-Memory OLTP (Hekaton) this Thursday, September 18th at the Los Angeles SQL Server Professionals Group. Their meetings are hosted at the UCLA campus and you can visit their page for more details.

Two days later I’ll be attending our SQLSaturday in San Diego, where I am scheduled to present three sessions: the Hekaton session previously mentioned, Understanding Parameter Sniffing, and Dive into the Query Optimizer – Undocumented Insight. As usual this event includes great speakers from all over the country and this time we will have Itzik Ben-Gan, Grant Fritchey, Randy Knight, TJay Belt, Mickey Stuewe, Brandon Leach, and Andrew Karcher, among others. You can see the schedule and register at the SQLSaturday website.

So far I only have one event scheduled for October which is a local user group meeting at the San Dimas Software & Database Professionals group where again I will be presenting my Hekaton session. The session is scheduled for October 8th and you can find the meetup page here.

Then the first week of November I’ll be attending both the MVP Summit and the PASS Summit in Seattle, WA. But my flight will make a short stop first in Portland, OR to attend the Oregon SQLSaturday, which is scheduled for November 1st. My only session for the event, Query Processing in In-Memory OLTP (Hekaton), is scheduled at 4:30 PM. After the Oregon SQLSaturday I will be attending my first MVP Summit, from November 2nd to November 6th; I am excited to be attending this event for the first time. The PASS Summit starts on November 4th and I’ll be presenting Query Processing in In-Memory OLTP on Friday, November 7th at 8:00 AM, rooms 611-612. This will be my 12th PASS Summit and my 7th year as a presenter.

Before closing out 2014, I’ve been invited to Istanbul, Turkey to participate in their SQLSaturday, which is scheduled for the first weekend of December. I am planning to deliver a pre-con on Friday the 5th covering “SQL Server Query Tuning & Optimization” topics and will be presenting my Hekaton session at the SQLSaturday the following day. I’ll provide or tweet more details as they become available.

I look forward to seeing lots of SQL Server professionals and meeting new friends at these SQL Server events.

Finally, after writing a new book this year it looks like I took a good break from writing. I just started working on some articles again and will be posting them soon both on this blog and on SQLPerformance.com.

Speaking at SQLBits and the PASS Summit


I am really excited that this weekend I will be flying to the UK to participate in my first European SQL Server conference, SQLBits XII, where I am scheduled to present two sessions. SQLBits XII, the largest SQL Server conference in Europe, will be held at The International Centre in Telford, UK from July 17th to July 19th. My first session, Dive into the Query Optimizer – Undocumented Insight, is scheduled for Friday the 18th at 12:15 PM in room Ironbridge 3, and the second one, Understanding Parameter Sniffing, for Saturday the 19th at 8:15 AM in room Wenlock 1. These are sessions I have presented previously at the PASS Summit.

In addition, I got selected to present a session at the PASS Summit 2014, which will be held again in Seattle, WA this November. The PASS Summit is the largest SQL Server and BI conference in the world and this would be my 12th year attending, and my 7th year as a presenter. My session, Query Processing in In-Memory OLTP, will cover query processing and optimization with the new Hekaton database engine, a topic which I also cover in a chapter of my new book.

Just a few days after the sessions for the PASS Summit 2014 were announced, I also got the nice surprise that I was awarded as a SQL Server MVP. It is a great honor for me to be part of this awesome team.

Finally, my new book SQL Server 2014 Query Tuning & Optimization has been completed and will be soon published by McGraw-Hill Education.

I look forward to meeting lots of SQL Server professionals at these conferences. See you all in Telford and Seattle.

New Book: SQL Server 2014 Query Tuning & Optimization


It looks like my blog has been very quiet for a while once again. Yes, I’ve been busy working on some other SQL Server projects since last year. One of those projects was working on Kalen Delaney’s new book SQL Server 2012 Internals where I co-authored two chapters, “Special Databases” and “Query Execution”, which were originally written for previous versions of the book by Kalen and Craig Freedman, respectively. The book was finally released last November.

As soon as I finished my part on Kalen’s book, I immediately started working on my new book, SQL Server 2014 Query Tuning & Optimization, which has been available for pre-order at Amazon for a while now. This book covers everything you need to know about query tuning and optimization for SQL Server to help you design high-performing database applications. As a SQL Server 2014 book it obviously covers new features like Hekaton, the new cardinality estimator, and incremental statistics, among others, but most of the content can be used in previous versions of SQL Server as well. I am currently working on the last items of the book with the great team at McGraw-Hill Education and the help of the amazing Dave Ballantyne (@davebally) as technical reviewer and the book should be available soon.

Also, at the last PASS Summit in Charlotte I was invited by SQL Sentry to blog at their SQLPerformance.com site, which has great content by some of the most amazing bloggers in the SQL Server community. Although I haven’t had as much time to blog there as I would have wanted, I have started with a few SQL Server 2014 articles that you can read here.

Finally, I have submitted sessions for SQLBits (the first time I have ever submitted) and for the PASS Summit (which I have attended for 11 years so far), and just as I am writing this I got the great news that two of my sessions were selected for SQLBits, so I am really looking forward to finally attending this conference in the UK this July. I am also planning to attend some other SQL Server events in the US, including our always amazing local SQLSaturdays in Orange County this April 26th, Sacramento on July 12th, and San Diego, which is usually scheduled for September. Hope to see you at any of these SQL Server events.

Speaking at the PASS Summit 2013


In just a few days I will be headed to Charlotte, NC to attend my eleventh PASS Summit. Yes, I’ve been attending this, the largest SQL Server and BI conference in the world, every year since 2003. I am also honored to be speaking at the conference again, making this my sixth year as a presenter. My session, Defeating the Limitations of the Query Optimizer, which I just presented for the first time at the San Diego SQLSaturday #249, is scheduled at the end of the conference, Friday at 4:15 PM in Ballroom B.

Of course I am also planning to attend many sessions. I haven’t finished my schedule yet but of what I have seen so far I definitely want to attend Mark Russinovich’s Windows Azure Deep Dive, Bob Ward’s Inside SQL Server 2012 Memory: The Sequel, Conor Cunningham’s OLTP Sharding Techniques for Massive Scale, Kevin Liu’s SQL Server 2014 In-Memory OLTP: DBA Deep Dive and Kevin Farlee’s SQL Server 2014 In-Memory OLTP: DB Developer Deep Dive.

In addition to the PASS Summit I will also be attending SQL in the City on Monday and SQLSaturday Charlotte BI Edition on Saturday. I will also be speaking at this last event where I will present my session Dive into the Query Optimizer – Undocumented Insight. You can still register for these two free events.

I look forward to meeting lots of SQL Server professionals, including those whom I only know via twitter. See you all in Charlotte!

A Tour of the Hekaton AMR Tool

SQL Server 2014 CTP1 includes a tool to help you decide which tables and stored procedures you can move to Hekaton, or In-Memory OLTP. In this post I will give you a quick tour of this tool, AMR (Analysis, Migration and Reporting). The AMR tool is integrated with the Data Collector, and to enable it you have to enable the new Transaction Performance Collection Sets in the Configure Data Collection Wizard, as shown next.


This will create two new collection sets, Stored Procedure Usage Analysis and Table Usage Analysis, in addition to the three system data collections sets previously available with the Data Collector.

Once you configure the Transaction Performance Collection Sets on the Data Collector you are ready to test the AMR tool. First, you need to create some database activity. In my case I am testing with the following stored procedures on a copy of AdventureWorks2012.

CREATE PROCEDURE test1
AS
SELECT * FROM Sales.SalesOrderHeader soh
    JOIN Sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
WHERE ProductID = 870
GO
CREATE PROCEDURE test2
AS
SELECT ProductID, SalesOrderID, COUNT(*)
FROM Sales.SalesOrderDetail
GROUP BY ProductID, SalesOrderID

Execute the procedures multiple times, for example, 20 or 30 times; in SQL Server Management Studio you can use GO followed by a count to repeat each batch:

-- GO n repeats the preceding batch n times
EXEC test1
GO 20
EXEC test2
GO 20

After you create some database activity you may have to wait for the next Data Collector upload job to execute. For example, the Stored Procedure Usage Analysis upload job runs every 30 minutes and the Table Usage Analysis job runs every 15 minutes. You could also run these jobs (collection_set_5_upload and collection_set_6_upload, respectively) manually, as shown next.
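
For example, assuming the default job names listed above, the upload jobs can be started with sp_start_job:

-- Start the Data Collector upload jobs manually
EXEC msdb.dbo.sp_start_job @job_name = N'collection_set_5_upload'; -- Stored Procedure Usage Analysis
EXEC msdb.dbo.sp_start_job @job_name = N'collection_set_6_upload'; -- Table Usage Analysis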

To access the AMR reports right-click your MDW database, select Reports, Management Data Warehouse and Transaction Performance Analysis Overview. This is the main AMR tool report.


Clicking on Usage Analysis in the Tables Analysis section will show the Recommended Tables Based on Usage report, showing the top candidate tables for memory optimization based on the access patterns of the workload. The report graph shows the performance benefits of memory optimization along with the migration effort required to move tables to In-Memory OLTP, based on how many unsupported features the table uses.


Selecting Tables Analysis, Contention Analysis on the main report will show the Recommended Tables Based on Contention report as shown next.


In both cases, the reports recommend prioritizing the tables in the top right corner of the graph. Clicking on a table name in any of the previous reports will show the table performance statistics details, including both table usage and contention statistics. An example is shown next.


On the other hand, selecting Stored Procedure Analysis, Usage Analysis on the main report will take us to the Recommended Stored Procedures Based on Usage report, shown next. This report shows the top stored procedures based on total worker time in milliseconds.


Similarly, clicking on a specific stored procedure will show the stored procedure execution statistics detail and the tables it references. An example for the test2 stored procedure is shown next.


You can also click on the listed tables to get the table’s performance statistics report shown previously.

In summary, the AMR tool will be very helpful in providing recommendations as to which tables and stored procedures you might want to consider migrating to Hekaton. You could even upgrade your database to SQL Server 2014, run this tool to analyze your real performance data, and iteratively move tables and stored procedures to In-Memory OLTP as required.

Hekaton: Transforming Query Plans into C Code

In-Memory OLTP, code-named Hekaton, is without a doubt the most exciting new feature of SQL Server 2014. We all first learned about this technology when it was announced back at the PASS Summit 2012 conference, where it was also mentioned that it would be included in the next major release of SQL Server. Recently, at TechEd North America, Microsoft finally announced that this new version would be named SQL Server 2014 and that its first CTP would be available to download by the end of June. So after playing with this CTP for a few days I decided to write a few posts mostly focused on the query processing area of the technology.

But perhaps one of the most important concepts we are learning with Hekaton is the fact that, even with the new hardware available today, obtaining ten times or better performance also requires dramatically changing the way data management systems are designed. For example, database management systems were originally designed under the assumption that memory is limited and data has to reside on disk pages. This assumption is no longer true, and currently it is possible to fit entire databases in memory, or at least the most critical tables in an OLTP environment. So taking advantage of this available memory is not just a matter of reading more of the existing disk pages into memory but of re-designing data management systems using a different approach, to get the most benefit out of this new hardware.

The best way to start learning about Hekaton is reading Kalen Delaney’s white paper SQL Server In-Memory OLTP Internals Overview for CTP1 and of course, downloading and playing with the CTP1 itself. So assuming you are familiar with the basic concepts in this post I will focus on two of the main objects that Hekaton provides:

1) Memory-optimized tables. Hekaton tables are stored in memory, all the time. Obviously they are also kept on disk, but only for durability purposes. These tables provide two new kinds of indexes, hash and range indexes, although only the former kind is available in the first CTP. And you may be surprised to learn that memory-optimized tables are compiled into native DLLs when they are created.

2) Natively compiled stored procedures. This new kind of stored procedure is first optimized by the query optimizer, like any regular stored procedure, but is also later translated into C code and compiled into machine code. This process again creates DLLs.

Limitations

Perhaps the main limitation of Hekaton is that memory-optimized tables cannot be altered; they need to be dropped and created again with the required changes. This is the case, for example, if you want to add a new column or a new index. Obviously, dropping the table means first copying its data to another location, creating a new empty table with the required changes, and copying the data back, which would require some downtime for live applications.

You cannot alter natively compiled stored procedures either, or even recompile them (except in a few limited cases). In addition, as the data in Hekaton tables changes, statistics are not automatically updated; you will need to manually update them using the UPDATE STATISTICS statement. Also, even after getting updated statistics, your existing natively compiled stored procedures cannot take advantage of them. You will have to manually drop and re-create the stored procedure.
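
For example, for the table created in the example below, a manual update might look like this (a sketch; in this release memory-optimized tables appear to require an explicit FULLSCAN or RESAMPLE together with NORECOMPUTE):

-- Manually refresh statistics on a memory-optimized table
UPDATE STATISTICS dbo.TransactionHistoryArchive
WITH FULLSCAN, NORECOMPUTE;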

Example

Let me show you an example of how the technology works, using a copy of AdventureWorks2012 database. First let us create a new database.

CREATE DATABASE Test
ON PRIMARY (NAME = [Test_data],
FILENAME = 'C:\DATA\Test_data.mdf', SIZE=500MB),
FILEGROUP [Test_fg] CONTAINS MEMORY_OPTIMIZED_DATA 
(NAME = [Test_fg], FILENAME = 'C:\DATA\Test_fg')
LOG ON (NAME = [Test_log], Filename='C:\DATA\Test_log.ldf', SIZE=500MB)
COLLATE Latin1_General_100_BIN2;

Now we can create a memory optimized table

CREATE TABLE [TransactionHistoryArchive] (
    [TransactionID] [int] NOT NULL PRIMARY KEY 
        NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    [ProductID] [int] NOT NULL,
    [ReferenceOrderID] [int] NOT NULL,
    [ReferenceOrderLineID] [int] NOT NULL,
    [TransactionDate] [datetime] NOT NULL,
    [TransactionType] [nchar](1) NOT NULL,
    [Quantity] [int] NOT NULL,
    [ActualCost] [money] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)

You cannot copy data directly from a second database to a memory optimized table. So we need to copy the data to a regular table first.

SELECT * INTO temp
FROM AdventureWorks2012.Production.TransactionHistoryArchive

Now we can copy the data to the memory optimized table

INSERT INTO TransactionHistoryArchive
SELECT * FROM temp

Let us look at our first execution plans by running T-SQL against our memory-optimized table (which, by the way, is accessed from regular T-SQL using the query interop capabilities).

SELECT * FROM TransactionHistoryArchive
WHERE TransactionID = 8209

Since we defined a hash index on the TransactionID column, we get the following plan:


Now a query using a different column

SELECT * FROM TransactionHistoryArchive
WHERE ProductID = 780


The execution plans show the familiar Index Scan and Index Seek operations, but obviously the algorithms for their execution are not the same; this time the table and indexes are in memory and use totally different structures. You can identify the differences in these operators by looking at NonClusteredHash after the name of the operator, and at Storage shown as MemoryOptimized in the properties of the operator, as shown next.


Let us now create a stored procedure, and yes, all the new clauses (like NATIVE_COMPILATION or SCHEMABINDING) are required; leaving any of these out will produce an error:

CREATE PROCEDURE test
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
SELECT TransactionID, ProductID, ReferenceOrderID FROM dbo.TransactionHistoryArchive
WHERE ProductID = 780
END

Then you can run the procedure by using

EXEC test

Something you may notice is that executing the native stored procedure does not show any execution plan. At this moment I don’t know if the plan will be visible in a later CTP or the RTM, but assuming it won’t, this may be a problem, as we would not have an execution plan to troubleshoot. You may say that we can still see the plan using the T-SQL code, but keep in mind that data and statistics may have changed since the procedure was created, so it does not mean that the plan you currently get is the same one that was compiled to native code. Anyway, we will just have to wait for the next CTPs and see how this is going to work.

UPDATE: I got an update from Jos de Bruijn from Microsoft saying “You are correct that the plan for the native procedure may not be the same as in interpreted T-SQL. Not only can there be differences in statistics, but there are also certain operators we do not support in native procedures (e.g. hash join, hash aggregate). Starting with CTP2, we will be supporting SHOWPLAN_XML (also called the “estimated execution plan” in Management Studio), so you will be able to see the plan that was used to compile the procedure.”
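
Once that SHOWPLAN_XML support arrives, retrieving the estimated plan for the natively compiled procedure should be a matter of something like the following sketch:

-- Request the estimated (compile-time) plan instead of executing the procedure
SET SHOWPLAN_XML ON;
GO
EXEC test;
GO
SET SHOWPLAN_XML OFF;
GO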

Finally, it is interesting to note that the generated DLLs are not kept in the database but in the file system, and you can find them, along with some other intermediate files, by looking at the location returned by the following query:

SELECT name, description FROM sys.dm_os_loaded_modules
WHERE description = 'XTP Native DLL';

Generated C code is available at that location. A small fragment of the C code created for the previous examples is shown next:

#include "hkenggen.h"
#include "hkrtgen.h"
#include "hkgenlib.h"

#define ENABLE_INTSAFE_SIGNED_FUNCTIONS
#include "intsafe.h"

int _fltused = 0;

int memcmp(const void*, const void*, size_t);
void *memcpy(void*, const void*, size_t);
void *memset(void*, int, size_t);

#define offsetof(s,f)   ((size_t)&(((s*)0)->f))

struct hkt_277576027
{
    __int64 hkc_5;
    __int64 hkc_8;
    __int64 hkc_9;
    long hkc_1;
    long hkc_2;
    long hkc_3;
    long hkc_4;
    long hkc_7;
    unsigned short hkvdo[2];
};

Query Optimization Research Papers

Two of the questions I am sometimes asked are which sources I researched to write my Query Optimizer book and which research papers I can recommend to learn more about query optimization. Since I got asked about this again at the Tampa SQLSaturday last week, I wrote this short article on my flight back to Los Angeles to discuss the topic.

But first a warning: reading these academic papers may require a strong computer science background and they are usually more complicated than the SQL Server documentation, books or blogs we read every day. In addition, there are dozens or even hundreds of these articles, covering more than 40 years of query optimization research. Although I cannot list all the ones I have read I can definitely give you a way to get started so you can continue with the topics that may interest you.

Research papers reference other papers in the text and you can find the referenced paper details at the end of each article, so if you are interested in one particular area you can go and read that paper directly. By following other listed sources, which will also have additional references, you could find an almost unlimited source of information.

Although research papers usually focus on a specific area or problem, you can get started by reading a few articles that give a more general overview before trying to read about more specific topics. Some papers to get started with are:

An Overview of Query Optimization in Relational Systems by Surajit Chaudhuri

Query Optimization by Yannis E. Ioannidis

An Overview of Data Warehousing and OLAP Technology by Surajit Chaudhuri, Umeshwar Dayal

By following the references in those and other similar papers you can find dozens of articles, which would be impossible to list here, but just to give you three examples:

Optimizing Join Orders by Michael Steinbrunn, Guido Moerkotte, Alfons Kemper

An Overview of Cost-based Optimization of Queries with Aggregates by Surajit Chaudhuri

Counting, Enumerating, and Sampling of Execution Plans in a Cost-Based Query Optimizer by Florian Waas, Cesar Galindo-Legaria

Some of these papers may be SQL Server related:

Query Processing for SQL Updates by Cesar Galindo-Legaria, Stefano Stefani, Florian Waas

Self-Tuning Database Systems: A Decade of Progress by Surajit Chaudhuri

An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server by Surajit Chaudhuri, Vivek Narasayya

SQL Server implemented its own cost-based query optimizer, based on the Cascades Framework, when its database engine was re-architected for the release of SQL Server 7.0. Cascades is itself based on earlier research work: Volcano and Exodus. You can read about these research projects here:

The Cascades Framework for Query Optimization by Goetz Graefe

The Volcano optimizer generator: Extensibility and efficient search by Goetz Graefe

The EXODUS Optimizer Generator by Goetz Graefe, David J. DeWitt

Finally, in this post I covered query optimization papers but obviously you can also find information on other areas of database research as well.