

A Tour of the Hekaton AMR Tool

SQL Server 2014 CTP1 includes a tool to help you decide which tables and stored procedures you can move to Hekaton, or In-Memory OLTP. In this post I will give you a quick tour of this tool, the AMR (Analysis, Migration and Reporting) tool. The AMR tool is integrated with the Data Collector and, to enable it, you have to enable the new Transaction Performance Collection Sets in the Configure Data Collection Wizard, as shown next.

[Figure: Configure Data Collection Wizard showing the Transaction Performance Collection Sets]

This will create two new collection sets, Stored Procedure Usage Analysis and Table Usage Analysis, in addition to the three system data collection sets previously available with the Data Collector.

Once you configure the Transaction Performance Collection Sets on the Data Collector you are ready to test the AMR tool. First, you need to create some database activity. In my case I am testing with the following stored procedures on a copy of AdventureWorks2012.

CREATE PROCEDURE test1
AS
SELECT * FROM Sales.SalesOrderHeader soh
    JOIN Sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
WHERE ProductID = 870
GO
CREATE PROCEDURE test2
AS
SELECT ProductID, SalesOrderID, COUNT(*)
FROM Sales.SalesOrderDetail
GROUP BY ProductID, SalesOrderID

Execute the procedures multiple times, for example, 20 or 30 times each.

EXEC test1
GO
EXEC test2
GO
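
If you are running the batches from Management Studio or sqlcmd, you can use the count argument of the GO batch separator to repeat each execution; a minimal sketch:

-- Execute each batch 20 times
EXEC test1
GO 20
EXEC test2
GO 20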

After you create some database activity you may have to wait for the next Data Collector upload job to execute. For example, the Stored Procedure Usage Analysis upload job runs every 30 minutes and the Table Usage Analysis upload job runs every 15 minutes. You could also run these jobs (collection_set_5_upload and collection_set_6_upload, respectively) manually.
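
To avoid waiting for the schedule, you can also start the upload jobs yourself through SQL Server Agent; a minimal sketch (the job names shown are the defaults and may differ on your instance):

-- Run the Data Collector upload jobs on demand
EXEC msdb.dbo.sp_start_job @job_name = 'collection_set_5_upload'
GO
EXEC msdb.dbo.sp_start_job @job_name = 'collection_set_6_upload'
GO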

To access the AMR reports, right-click your MDW database and select Reports, Management Data Warehouse, Transaction Performance Analysis Overview. This is the main AMR tool report.

[Figure: Transaction Performance Analysis Overview report]

Clicking Usage Analysis in the Tables Analysis section will show the Recommended Tables Based on Usage report, which lists the top candidate tables for memory optimization based on the access patterns of the workload. The report graph shows the performance benefit of memory optimization along with the migration effort required to move each table to In-Memory OLTP, based on how many unsupported features the table uses.

[Figure: Recommended Tables Based on Usage report]

Selecting Contention Analysis in the Tables Analysis section of the main report will show the Recommended Tables Based on Contention report, as shown next.

[Figure: Recommended Tables Based on Contention report]

In both cases, the reports recommend prioritizing the tables in the top right corner of the graph. Clicking a table name on any of the previous reports will show the table performance statistics details, including both table usage and contention statistics. An example is shown next.

[Figure: Table performance statistics details report]

On the other hand, selecting Usage Analysis in the Stored Procedure Analysis section of the main report will take us to the Recommended Stored Procedures Based on Usage report, shown next. This report shows the top stored procedures based on total worker time in milliseconds.

[Figure: Recommended Stored Procedures Based on Usage report]

Similarly, clicking a specific stored procedure will show the stored procedure execution statistics details and the tables it references. An example for the test2 stored procedure is shown next.

[Figure: Stored procedure execution statistics details for the test2 procedure]

You can also click on the listed tables to get the table’s performance statistics report shown previously.

In summary, the AMR tool will be very helpful in providing recommendations as to which tables and stored procedures you might want to consider migrating into Hekaton. You could even upgrade your database to SQL Server 2014, run this tool to analyze your real performance data, and iteratively move tables and stored procedures to In-Memory OLTP as required.

Hekaton: Transforming Query Plans into C Code

In-Memory OLTP, code-named Hekaton, is without a doubt the most exciting new feature of SQL Server 2014. We all first learned about this technology when it was announced back at the PASS Summit 2012 conference, where it was also mentioned that it would be included in the next major release of SQL Server. Recently, at TechEd North America, Microsoft finally announced that this new version will be named SQL Server 2014 and that its first CTP would be available to download by the end of June. So after playing with this CTP for a few days I decided to write a few posts, mostly focused on the query processing area of the technology.

But perhaps one of the most important lessons we are learning with Hekaton is that, even with the new hardware available today, obtaining ten times or better performance also requires dramatically changing the way data management systems are designed. For example, database management systems were originally designed under the assumption that memory is limited and that data has to reside on disk pages. This assumption is no longer true, and currently it is possible to fit entire databases in memory, or at least the most critical tables in an OLTP environment. So taking advantage of this available memory is not just a matter of reading more of the existing disk pages into memory, but of re-designing data management systems with a different approach to get the most benefit from this new hardware.

The best way to start learning about Hekaton is reading Kalen Delaney’s white paper SQL Server In-Memory OLTP Internals Overview for CTP1 and, of course, downloading and playing with CTP1 itself. So, assuming you are familiar with the basic concepts, in this post I will focus on two of the main objects that Hekaton provides:

1) Memory-optimized tables. Hekaton tables are stored in memory, all the time. Obviously they are also kept on disk, but only for durability purposes. These tables provide two new kinds of indexes, hash and range indexes, although only the former is available in the first CTP. And you may be surprised to learn that memory-optimized tables are compiled into native DLLs when they are created.

2) Natively compiled stored procedures. This new kind of stored procedure is first optimized by the query optimizer, like any regular stored procedure, but is then translated into C code and compiled into machine code. Again, this process creates DLLs.

Limitations

Perhaps the main limitation of Hekaton is that memory-optimized tables cannot be altered; they need to be dropped and created again with the required changes. This is the case, for example, if you want to add a new column or a new index. Obviously, dropping the table means first copying its data to another location, creating a new empty table with the required changes, and copying the data back, which would require some downtime for live applications.

You cannot alter natively compiled stored procedures either, or even recompile them (except in a few limited cases). In addition, as the data in Hekaton tables changes, statistics are not automatically updated; you will need to manually update them using the UPDATE STATISTICS statement. Also, after getting updated statistics, your existing natively compiled stored procedures cannot take advantage of them. You will have to manually drop and re-create the stored procedure.
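
As a minimal sketch, and using the TransactionHistoryArchive table from the example later in this post, a manual statistics update would look like this (the FULLSCAN and NORECOMPUTE options appear to be required for memory-optimized tables):

-- Manually refresh statistics on a memory-optimized table
UPDATE STATISTICS dbo.TransactionHistoryArchive
WITH FULLSCAN, NORECOMPUTE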

Example

Let me show you an example of how the technology works, using a copy of the AdventureWorks2012 database. First, let us create a new database.

CREATE DATABASE Test
ON PRIMARY (NAME = [Test_data],
FILENAME = 'C:\DATA\Test_data.mdf', SIZE=500MB),
FILEGROUP [Test_fg] CONTAINS MEMORY_OPTIMIZED_DATA 
(NAME = [Test_fg], FILENAME = 'C:\DATA\Test_fg')
LOG ON (NAME = [Test_log], FILENAME = 'C:\DATA\Test_log.ldf', SIZE = 500MB)
COLLATE Latin1_General_100_BIN2;
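
Alternatively, if you would rather test against an existing database, a memory-optimized filegroup can also be added after the fact; a minimal sketch I have not tested against CTP1, assuming an AdventureWorks2012 database and a C:\DATA folder:

-- Add a memory-optimized filegroup and container to an existing database
ALTER DATABASE AdventureWorks2012
    ADD FILEGROUP [AdventureWorks2012_mod] CONTAINS MEMORY_OPTIMIZED_DATA
GO
ALTER DATABASE AdventureWorks2012
    ADD FILE (NAME = [AdventureWorks2012_mod], FILENAME = 'C:\DATA\AdventureWorks2012_mod')
    TO FILEGROUP [AdventureWorks2012_mod]
GO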

Now we can create a memory-optimized table.

CREATE TABLE [TransactionHistoryArchive] (
    [TransactionID] [int] NOT NULL PRIMARY KEY 
        NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    [ProductID] [int] NOT NULL,
    [ReferenceOrderID] [int] NOT NULL,
    [ReferenceOrderLineID] [int] NOT NULL,
    [TransactionDate] [datetime] NOT NULL,
    [TransactionType] [nchar](1) NOT NULL,
    [Quantity] [int] NOT NULL,
    [ActualCost] [money] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)

You cannot copy data directly from a table in another database into a memory-optimized table, so we need to copy the data to a regular table first.

SELECT * INTO temp
FROM AdventureWorks2012.Production.TransactionHistoryArchive

Now we can copy the data to the memory optimized table

INSERT INTO TransactionHistoryArchive
SELECT * FROM temp

Let us look at our first execution plans, using regular T-SQL against our memory-optimized table (which, by the way, is accessed using the query interop capabilities).

SELECT * FROM TransactionHistoryArchive
WHERE TransactionID = 8209

Since we defined a hash index on the TransactionID column, we get the following plan.

[Figure: Execution plan showing an Index Seek on the hash index]

Now a query using a different column

SELECT * FROM TransactionHistoryArchive
WHERE ProductID = 780

[Figure: Execution plan showing an Index Scan]

The execution plans show the familiar Index Scan and Index Seek operations, but obviously the algorithms behind them are not the same; this time the table and indexes are in memory and use totally different structures. You can identify the differences in these operators by looking at NonClusteredHash after the name of the operator, and at the Storage property shown as MemoryOptimized in the properties of the operator, as shown next.

[Figure: Operator properties showing Storage as MemoryOptimized]
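
By the way, if you want to look at the hash index itself rather than the plan, the new hash index catalog view and DMV can help; a minimal sketch, assuming the table created earlier (these views may of course change in later CTPs):

-- Inspect the hash index and how its buckets are being used
SELECT hi.name, hi.bucket_count, s.empty_bucket_count, s.avg_chain_length
FROM sys.hash_indexes hi
    JOIN sys.dm_db_xtp_hash_index_stats s
        ON hi.object_id = s.object_id AND hi.index_id = s.index_id
WHERE hi.object_id = OBJECT_ID('dbo.TransactionHistoryArchive')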

Let us now create a natively compiled stored procedure. And yes, all the new clauses (like NATIVE_COMPILATION or SCHEMABINDING) are required; leaving any of them out will produce an error.

CREATE PROCEDURE test
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
SELECT TransactionID, ProductID, ReferenceOrderID FROM dbo.TransactionHistoryArchive
WHERE ProductID = 780
END

Then you can run the procedure by using

EXEC test
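
As a quick sanity check, the uses_native_compilation column of sys.sql_modules should confirm that the procedure was in fact natively compiled; a minimal sketch:

-- Verify that the procedure was compiled to native code
SELECT OBJECT_NAME(object_id) AS procedure_name, uses_native_compilation
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.test')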

Something that you may notice is that executing the native stored procedure does not show any execution plan. At this moment I don’t know if the plan will be visible in a later CTP or the RTM, but assuming it won’t, this may be a problem as we would not have an execution plan to troubleshoot. You may say that we can still see the plan using the T-SQL code, but keep in mind that data and statistics may have changed since the procedure was created, so the plan you currently get is not necessarily the same one that was compiled to native code. Anyway, we will just have to wait for the next CTPs and see how this is going to work.

UPDATE: I got an update from Jos de Bruijn from Microsoft saying “You are correct that the plan for the native procedure may not be the same as in interpreted T-SQL. Not only can there be differences in statistics, but there are also certain operators we do not support in native procedures (e.g. hash join, hash aggregate). Starting with CTP2, we will be supporting SHOWPLAN_XML (also called the “estimated execution plan” in Management Studio), so you will be able to see the plan that was used to compile the procedure.”
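
Based on that update, once CTP2 is available, requesting the estimated plan for the native procedure should look something like the following sketch (I have not been able to verify this on CTP1):

-- Request the estimated execution plan for the natively compiled procedure
SET SHOWPLAN_XML ON
GO
EXEC test
GO
SET SHOWPLAN_XML OFF
GO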

Finally, it is interesting to note that the generated DLLs are not kept in the database but in the file system, and you can find them, along with some other intermediate files, by looking at the location returned by the following query.

SELECT name, description FROM sys.dm_os_loaded_modules
WHERE description = 'XTP Native DLL'

Generated C code is available at that location. A small fragment of the C code created for the previous examples is shown next.

#include "hkenggen.h"
#include "hkrtgen.h"
#include "hkgenlib.h"

#define ENABLE_INTSAFE_SIGNED_FUNCTIONS
#include "intsafe.h"

int _fltused = 0;

int memcmp(const void*, const void*, size_t);
void *memcpy(void*, const void*, size_t);
void *memset(void*, int, size_t);

#define offsetof(s,f)   ((size_t)&(((s*)0)->f))

struct hkt_277576027
{
    __int64 hkc_5;
    __int64 hkc_8;
    __int64 hkc_9;
    long hkc_1;
    long hkc_2;
    long hkc_3;
    long hkc_4;
    long hkc_7;
    unsigned short hkvdo[2];
};