Benjamin Nevarez Rotating Header Image

The Query Optimizer and Contradiction Detection

As covered in my book Inside the SQL Server Query Optimizer, contradiction detection is a query rewrite (or tree rewrite?) performed at the simplification phase of the optimization process in which query contradictions are detected and removed. Since these parts of the query are not executed at all, SQL Server saves resources like I/O, locks, memory and CPU, making the query to be executed faster. For example, the Query Optimizer may know that no records can satisfy a predicate even before touching any page of data. A contradiction may be related to a check constraint, or may be related to the way the query is written. I will show you examples of both cases next.

First, I need to find a table with a check constraint in AdventureWorks and, handily, the Employee table has the following check constraint definition:

([VacationHours]>=(-40) AND [VacationHours]<=(240))

This check constraint makes sure that the number of vacation hours is a number between –40 and 240, so when I request

SELECT * FROM HumanResources.Employee
WHERE VacationHours > 80

… SQL Server uses a Clustered Index Scan operator, as shown next

clip_image002However, if I request all of the employees with more than 300 vacation hours then, because of this check constraint, the Query Optimizer must immediately know that no records qualify for predicate. Run the following code

SELECT * FROM HumanResources.Employee
WHERE VacationHours > 300

As expected, the query will return no records, but this time it will show the following execution planclip_image004Note that, this time, instead of a Clustered Index Scan, SQL Server is using a Constant Scan operator. Since there is no need to access the table at all, SQL Server saves resources like I/O, locks, memory and CPU, making the query to be executed faster. Now, let’s see what happens if I disable the check constraint

ALTER TABLE HumanResources.Employee NOCHECK CONSTRAINT CK_Employee_VacationHours

This time, running the last query once again uses a Clustered Index Scan operator, as the Query Optimizer can no longer use the check constraint to guide its decisions. Don’t forget to enable the constraint again by running the following statement:

ALTER TABLE HumanResources.Employee WITH CHECK CHECK CONSTRAINT
CK_Employee_VacationHours

The second type of contradiction case is when the query itself explicitly contains a contradiction. Take a look at the next query

SELECT * FROM HumanResources.Employee
WHERE ManagerID > 10 AND ManagerID < 5

In this case there is no check constraint involved; both predicates are valid and each will individually return records, but they contradict each other when they are run together. As a result, the query returns no records and the plan shows again a Constant Scan operator similar to the plan shown previously. This may just look like a badly written query, but remember that some predicates may already be included in, for example, view definitions, and the developer of the query may be unaware of those. For example, in our last query, a view may include the predicate ManagerID > 10 and a developer may call this view using the predicate ManagerID < 5. Since both predicates contradict each other a Constant Scan operator will be used again instead.

About the author

Benjamin Nevarez Benjamin Nevarez is a database professional based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of three books, “High Performance SQL Server”, “SQL Server 2014 Query Tuning & Optimization” and “Inside the SQL Server Query Optimizer” and has also coauthored other books including “SQL Server 2012 Internals”. Benjamin has also been a speaker at many SQL Server conferences and events around the world including the PASS Summit, SQL Server Connections and SQLBits. His blog can be found at http://www.benjaminnevarez.com and he can also be reached on twitter at @BenjaminNevarez.

6 Comments

  1. tobi says:

    This is also useful to find out by the query plan that your query contains a logical mistake.

    Having a contradiction in production code seems rare but I found it to be the case relatively often because tools generate suboptimal code.

    I wish the optimizer would also remove redundant joins (joins of the same table having the same conditions and filters). This happens very often with linq 2 sql and entity framework. They really generate a lot of redundant joins.

  2. Benjamin Nevarez says:

    Hi Tobi,

    I am not familiar with LINQ, how those redundant joins look like?

    The Query Optimizer also removes some redundant joins on this simplification phase. A typical example is the Foreign Key Join elimination which occurs when SQL Server can detect that some joins may not be needed, as foreign key constraints are available and only columns of the referencing table are requested. Maybe a topic for a future post.

    Thanks,

    Ben

  3. tobi says:

    This is a connect item that describes the issue and contains a working repo: http://connect.microsoft.com/SQLServer/feedback/details/641366/query-optimizer-should-remove-redundant-joins-from-queries

    I see this _all the time_ in practice. In linq you usually don’t write joins explicity but you use navigation properties like “Order.Customer”. If you use the same property multiple times, it mostly works out to a single join but sometimes the translation engine gets confused and does not notice that a particular join was already added to the query.

    I believe this problem is common to most OR mappers, but I can confirm for L2S and EF4.

  4. Benjamin Nevarez says:

    Wow, interesting problem, thanks for sharing. Hope any of the involved teams provide a solution to it.

    Regards,

    Ben

  5. […] detection is one of several possible simplifications. Following on an example on my book and this blog post, run the following […]

  6. GulliMeel says:

    This is good but there are certain other things which are not handled by the optimizer properly.Like you have an idenetity column and a datetime column then there is a correlation between these. E.g.
    –Create a table called mysoh based on the table sales.salesorderheader

    if OBJECT_ID(‘mysoh’) is not null
    drop table mysoh
    go

    select * into mysoh from Sales.SalesOrderHeader
    go

    –Create a clustered index on SalesOrderId

    alter table mysoh add constraint pk_mysoh_salesorderid primary key (SalesOrderId)
    go

    –Create a non clustered index on OrderDate column

    create nonclustered index idx_myssoh_orderdate on mysoh(OrderDate)
    go

    update statistics mysoh with fullscan
    go

    Now run the following query.
    select * from mysoh
    where
    1=1
    and OrderDate >= cast (‘20010701’ as datetime)
    and Orderdate = cast (‘20010701’ as datetime)
    and Orderdate <= cast ('20010731' as datetime)
    –select @MinSalesOrderId,@MaxSalesOrderId
    select * from mysoh where SalesOrderID between @MinSalesOrderId and @MaxSalesOrderId

    Because we know that the data is continous. This is just 10 IO's.

    Optimizer could be little bit more smart to know this so that first it does index seek on orderdate.Spool the data and then range scan the clustered index starting from the min value to max value to avoid unncessary key lookups.

Leave a Reply

Your email address will not be published. Required fields are marked *