Benjamin Nevarez Rotating Header Image

Hekaton: Transforming Query Plans into C Code

In-Memory OLTP, code-named Hekaton, is without a doubt the most exciting new feature of SQL Server 2014. We all first learned about this technology when it was announced back at the PASS Summit 2012 conference where it was also mentioned that it will be included in the next major release of SQL Server. Recently at TechEd North America Microsoft finally announced that this new version will be named SQL Server 2014 and that its first CTP will be available to download by the end of June. So after playing with this CTP for a few days I decided to write a few posts mostly focused on the query processing area of the technology.

But perhaps one of the most important concepts we are learning with Hekaton is the fact that even with the new hardware available today, obtaining ten times or more better performance will also require to dramatically change the way data management systems are designed. For example, database management systems were originally designed under the assumption that memory is limited and data should have to reside on disk pages. This assumption is no longer true and currently it is possible to fit entire databases in memory, or at least the most critical tables in an OLTP environment. So taking benefit of this available memory is not just a matter of reading more of the existing disk pages to memory but re-designing data management systems using a different approach to take the most possible benefit of this new hardware.

The best way to start learning about Hekaton is reading Kalen Delaney’s white paper SQL Server In-Memory OLTP Internals Overview for CTP1 and of course, downloading and playing with the CTP1 itself. So assuming you are familiar with the basic concepts in this post I will focus on two of the main objects that Hekaton provides:

1) Memory optimized tables. Hekaton tables are stored in memory, all the time. Obviously they are also kept in disk, but only for durability purposes. These tables provide two new kinds of indexes, hash and range indexes, although only the former kind is available on the first CTP. And you may be surprised that memory optimized tables are compiled into native DLLs when they are created.

2) Natively compiled stored procedures. This new kind of stored procedures are first optimized by the query optimizer, like any regular stored procedure, but are also later translated into C code and compiled into machine code. This process again will create DLLs.

Limitations

Perhaps the main limitation of Hekaton is that memory optimized tables could not be altered, they would need to be dropped and created again with the required changes. This is the case, for example, if you want to add a new column or a new index. Obviously, dropping the table would mean first copying its data to another location, creating a new empty table with the required changes and copying the data back, which would require some downtime for live applications.

You cannot alter natively compiled stored procedures either. Or even recompile them (except in a few limited cases). In addition, as the data on Hekaton tables change, statistics are not automatically updated, you will need to manually update them using the UPDATE STATISTICS statement. Also, after getting updated statistics your existing natively compiled stored procedures cannot take benefit of it. You will have to manually drop and re-create the store procedure.

Example

Let me show you an example of how the technology works, using a copy of AdventureWorks2012 database. First let us create a new database.

CREATE DATABASE Test
ON PRIMARY (NAME = [Test_data],
FILENAME = 'C:\DATA\Test_data.mdf', SIZE=500MB),
FILEGROUP [Test_fg] CONTAINS MEMORY_OPTIMIZED_DATA 
(NAME = [Test_fg], FILENAME = 'C:\DATA\Test_fg')
LOG ON (NAME = [Test_log], Filename='C:\DATA\Test_log.ldf', SIZE=500MB)
COLLATE Latin1_General_100_BIN2;

Now we can create a memory optimized table

CREATE TABLE [TransactionHistoryArchive] (
    [TransactionID] [int] NOT NULL PRIMARY KEY 
        NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    [ProductID] [int] NOT NULL,
    [ReferenceOrderID] [int] NOT NULL,
    [ReferenceOrderLineID] [int] NOT NULL,
    [TransactionDate] [datetime] NOT NULL,
    [TransactionType] [nchar](1) NOT NULL,
    [Quantity] [int] NOT NULL,
    [ActualCost] [money] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)

You cannot copy data directly from a second database to a memory optimized table. So we need to copy the data to a regular table first.

SELECT * INTO temp
FROM AdventureWorks2012.Production.TransactionHistoryArchive

Now we can copy the data to the memory optimized table

INSERT INTO TransactionHistoryArchive
SELECT * FROM temp

Let us look at our first execution plans using T-SQL against our memory optimized table (which by the way is defined as T-SQL using the query interop capabilities).

SELECT * FROM TransactionHistoryArchive
WHERE TransactionID = 8209

Since we defined a hash index on the TransactionId column we get the following plan

image

Now a query using a different column

SELECT * FROM TransactionHistoryArchive
WHERE ProductID = 780

image

The execution plans shows the familiar Index Scan and Index Seek operations but obviously the algorithms for this execution are not the same, this time the table and indexes are in memory and use totally different structures. You can identify differences on these operators by looking at NonClusteredHash after the name of the operator and Storage shown as MemoryOptimized in the properties of the operator as shown next.

image

Let us now create a stored procedure and yes, all the new clauses (like NATIVE_COMPILATION or SCHEMABINDING) are required, leaving any of these out will produce an error

CREATE PROCEDURE test
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
SELECT TransactionID, ProductID, ReferenceOrderID FROM dbo.TransactionHistoryArchive
WHERE ProductID = 780
END

Then you can run the procedure by using

EXEC test

Something that you may notice is that executing the native stored procedure does not show any execution plan. At this moment I don’t know if the plan will be visible on a later CTP or the RTM but assuming it won’t, this may be a problem as we would not have an execution plan to troubleshoot. You may say that we can still see the plan using the T-SQL code but keep in mind that data and statistics may have changed since the procedure was created and so it does not mean that the plan you currently get was the same that was compiled to native code. Anyway, we would just have to wait for the next CTPs and see how is this going to work.

UPDATE: I got an update from Jos de Bruijn from Microsoft saying “You are correct that the plan for the native procedure may not be the same as in interpreted T-SQL. Not only can there be differences in statistics, but there are also certain operators we do not support in native procedures (e.g. hash join, hash aggregate). Starting with CTP2, we will be supporting SHOWPLAN_XML (also called the “estimated execution plan” in Management Studio), so you will be able to see the plan that was used to compile the procedure.”

Finally, it is interesting to note that the generated DLLs are not kept on the database but in the file system and you can find them, along with some other intermediate files by looking at the location returned by the following query

SELECT name, description FROM sys.dm_os_loaded_modules
where description = 'XTP Native DLL'

Generated C code is available at that location. A small fragment of the C code created on the previous examples is shown next

#include "hkenggen.h"
#include "hkrtgen.h"
#include "hkgenlib.h"

#define ENABLE_INTSAFE_SIGNED_FUNCTIONS
#include "intsafe.h"

int _fltused = 0;

int memcmp(const void*, const void*, size_t);
void *memcpy(void*, const void*, size_t);
void *memset(void*, int, size_t);

#define offsetof(s,f)   ((size_t)&(((s*)0)->f))

struct hkt_277576027
{
    __int64 hkc_5;
    __int64 hkc_8;
    __int64 hkc_9;
    long hkc_1;
    long hkc_2;
    long hkc_3;
    long hkc_4;
    long hkc_7;
    unsigned short hkvdo[2];
};

About the author

Benjamin Nevarez Benjamin Nevarez is a database professional based in Los Angeles, CA, and author of "Inside the SQL Server Query Optimizer". He has also contributed to other SQL Server books including "SQL Server 2012 Internals". Benjamin has 20 years of experience with relational databases and has been working with SQL Server since version 6.5. He holds a Master’s Degree in Computer Science and has been a speaker at many SQL Server conferences, including the PASS Summit and SQL Server Connections. Benjamin’s blog is at http://www.benjaminnevarez.com, can be reached by e-mail at admin at benjaminnevarez dot com, on twitter at @BenjaminNevarez and on .

8 Comments

  1. tobi says:

    Unfortunately, all posts about Hekaton seem to read like “You cannot do x, y and z” where x, y and z are the most obvious things one would expect to work. I hope that v2 receives lots of further investments. v1 seems to be for customers who are facing serious pain right now.

  2. “At this moment I don’t know if the plan will be visible on a later CTP or the RTM but assuming it won’t”.. IMO we have to forget about the concept of a plan for native compiled procedures. We will have to read the C code to see exactly what is happening and how to phrase our sql queries to be the most optimal.

  3. Dave,

    I think that looking at the C code would be way too complicated especially when looking at complex stored procedures. I don’t see anything on the plan cache either.

    Ben

  4. Dave,

    I got an update from Microsoft: execution plans for natively compiled stored procedures will be available in CTP2.

    Ben

  5. Hi Tobi,

    Yes, I agree, many limitations for the first release. I had to remove many table properties while testing with AdventureWorks :-)

    Ben

  6. [...] Hekaton: Transforming Query Plans into C Code - Benjamin Nevarez (Blog|Twitter) [...]

Add Comment Register



Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>