In-Memory OLTP, code-named Hekaton, is without a doubt the most exciting new feature of SQL Server 2014. We all first learned about this technology when it was announced back at the PASS Summit 2012 conference, where it was also mentioned that it would be included in the next major release of SQL Server. Recently, at TechEd North America, Microsoft finally announced that this new version will be named SQL Server 2014 and that its first CTP will be available to download by the end of June. So after playing with this CTP for a few days, I decided to write a few posts mostly focused on the query processing area of the technology.
But perhaps one of the most important lessons we are learning with Hekaton is that even with the new hardware available today, obtaining ten times or better performance also requires dramatically changing the way data management systems are designed. For example, database management systems were originally designed under the assumption that memory is limited and data has to reside on disk pages. This assumption is no longer true, and currently it is possible to fit entire databases in memory, or at least the most critical tables of an OLTP environment. Taking advantage of this available memory is not just a matter of reading more of the existing disk pages into memory, but of redesigning data management systems with a different approach that gets the most benefit from this new hardware.
The best way to start learning about Hekaton is to read Kalen Delaney's white paper, SQL Server In-Memory OLTP Internals Overview for CTP1, and of course, to download and play with CTP1 itself. So, assuming you are familiar with the basic concepts, in this post I will focus on two of the main objects that Hekaton provides:
1) Memory-optimized tables. Hekaton tables are stored in memory, all the time. They are also kept on disk, but only for durability purposes. These tables provide two new kinds of indexes, hash and range indexes, although only the former is available in the first CTP. And you may be surprised to learn that memory-optimized tables are compiled into native DLLs when they are created.
2) Natively compiled stored procedures. This new kind of stored procedure is first optimized by the query optimizer, like any regular stored procedure, but is then translated into C code and compiled into machine code. This process again creates DLLs.
Perhaps the main limitation of Hekaton is that memory-optimized tables cannot be altered; they need to be dropped and created again with the required changes. This is the case, for example, if you want to add a new column or a new index. Obviously, dropping the table means first copying its data to another location, creating a new empty table with the required changes, and copying the data back, which will require some downtime for live applications.
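As a rough sketch, the workaround for adding a column might look like the following (the staging table name and the new column are hypothetical, and the new table definition is omitted for brevity):

```sql
-- Sketch only: stage the data, drop and re-create the table with the
-- required change, then reload. Names here are hypothetical.
SELECT * INTO TransactionHistoryArchive_staging
FROM dbo.TransactionHistoryArchive

DROP TABLE dbo.TransactionHistoryArchive

-- Re-create the memory-optimized table here, now including the new
-- column or index (full CREATE TABLE statement omitted)

INSERT INTO dbo.TransactionHistoryArchive
SELECT *, NULL   -- supply a value for the hypothetical new column
FROM TransactionHistoryArchive_staging
```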
You cannot alter natively compiled stored procedures either, or even recompile them (except in a few limited cases). In addition, as the data in Hekaton tables changes, statistics are not updated automatically; you will need to update them manually using the UPDATE STATISTICS statement. Also, even after getting updated statistics, your existing natively compiled stored procedures cannot take advantage of them; you will have to manually drop and re-create the stored procedure.
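For example, manually updating statistics on a memory-optimized table might look like the following (memory-optimized tables appear to require a full scan together with the NORECOMPUTE option; verify the exact requirements against your CTP build):

```sql
-- Manually refresh statistics on the memory-optimized table;
-- FULLSCAN and NORECOMPUTE are required for memory-optimized tables
UPDATE STATISTICS dbo.TransactionHistoryArchive
WITH FULLSCAN, NORECOMPUTE
```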
Let me show you an example of how the technology works, using a copy of the AdventureWorks2012 database. First, let us create a new database:
CREATE DATABASE Test
ON PRIMARY (NAME = [Test_data],
FILENAME = 'C:\DATA\Test_data.mdf', SIZE = 500MB),
FILEGROUP [Test_fg] CONTAINS MEMORY_OPTIMIZED_DATA
(NAME = [Test_fg], FILENAME = 'C:\DATA\Test_fg')
LOG ON (NAME = [Test_log], FILENAME = 'C:\DATA\Test_log.ldf', SIZE = 500MB)
Now we can create a memory-optimized table:
CREATE TABLE [TransactionHistoryArchive] (
[TransactionID] [int] NOT NULL PRIMARY KEY
NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
[ProductID] [int] NOT NULL,
[ReferenceOrderID] [int] NOT NULL,
[ReferenceOrderLineID] [int] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionType] [nchar](1) NOT NULL,
[Quantity] [int] NOT NULL,
[ActualCost] [money] NOT NULL,
[ModifiedDate] [datetime] NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
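Note that the BUCKET_COUNT value is fixed at creation time, so it is worth checking how well it matches your data. Assuming the sys.dm_db_xtp_hash_index_stats DMV is available in your CTP build (I have not verified this against CTP1), something like the following can show bucket usage and chain lengths for the hash index:

```sql
-- Inspect hash index bucket usage for the new table
-- (DMV availability in CTP1 not verified)
SELECT s.total_bucket_count, s.empty_bucket_count,
       s.avg_chain_length, s.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats s
WHERE s.object_id = OBJECT_ID('dbo.TransactionHistoryArchive')
```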
You cannot copy data directly from a table in a second database into a memory-optimized table, so we need to copy the data into a regular table first:
SELECT * INTO temp
FROM AdventureWorks2012.Production.TransactionHistoryArchive
Now we can copy the data to the memory-optimized table:
INSERT INTO TransactionHistoryArchive
SELECT * FROM temp
Let us look at our first execution plans, using regular T-SQL against our memory-optimized table (accessing memory-optimized tables from standard T-SQL is known as query interop):
SELECT * FROM TransactionHistoryArchive
WHERE TransactionID = 8209
Since we defined a hash index on the TransactionID column, we get the following plan.
Now let us try a query using a different column:
SELECT * FROM TransactionHistoryArchive
WHERE ProductID = 780
The execution plan shows the familiar Index Scan and Index Seek operations, but obviously the algorithms behind this execution are not the same; this time the table and indexes are in memory and use totally different structures. You can identify the differences in these operators by the NonClusteredHash label after the operator name, and by the Storage property shown as MemoryOptimized in the properties of the operator, as shown next.
Let us now create a stored procedure. And yes, all the new clauses (like NATIVE_COMPILATION or SCHEMABINDING) are required; leaving any of them out will produce an error:
CREATE PROCEDURE test
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
SELECT TransactionID, ProductID, ReferenceOrderID FROM dbo.TransactionHistoryArchive
WHERE ProductID = 780
END
Then you can run the procedure by using:

EXEC test
Something you may notice is that executing the native stored procedure does not show any execution plan. At this moment I don't know if the plan will be visible in a later CTP or the RTM, but assuming it won't, this may be a problem, as we would not have an execution plan to troubleshoot. You may say that we can still see the plan using the T-SQL code, but keep in mind that data and statistics may have changed since the procedure was created, so there is no guarantee that the plan you currently get is the same one that was compiled to native code. Anyway, we will just have to wait for the next CTPs and see how this is going to work.
UPDATE: I got an update from Jos de Bruijn from Microsoft saying “You are correct that the plan for the native procedure may not be the same as in interpreted T-SQL. Not only can there be differences in statistics, but there are also certain operators we do not support in native procedures (e.g. hash join, hash aggregate). Starting with CTP2, we will be supporting SHOWPLAN_XML (also called the “estimated execution plan” in Management Studio), so you will be able to see the plan that was used to compile the procedure.”
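Once CTP2 arrives, requesting that compile-time plan should presumably look like the usual estimated-plan request in T-SQL, something along these lines (untested against CTP2, of course):

```sql
-- Request the estimated (compile-time) plan as XML,
-- without actually executing the procedure
SET SHOWPLAN_XML ON
GO
EXEC test
GO
SET SHOWPLAN_XML OFF
GO
```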
Finally, it is interesting to note that the generated DLLs are not kept in the database but in the file system, and you can find them, along with some other intermediate files, by looking at the location returned by the following query:
SELECT name, description FROM sys.dm_os_loaded_modules
WHERE description = 'XTP Native DLL'
The generated C code is available at that location. A small fragment of the C code created for the previous examples is shown next:
int _fltused = 0;
int memcmp(const void*, const void*, size_t);
void *memcpy(void*, const void*, size_t);
void *memset(void*, int, size_t);
#define offsetof(s,f) ((size_t)&(((s*)0)->f))
unsigned short hkvdo;