As covered in my book Inside the SQL Server Query Optimizer, contradiction detection is a query rewrite (or tree rewrite?) performed at the simplification phase of the optimization process in which query contradictions are detected and removed. Since these parts of the query are not executed at all, SQL Server saves resources like I/O, locks, memory and CPU, making the query to be executed faster. For example, the Query Optimizer may know that no records can satisfy a predicate even before touching any page of data. A contradiction may be related to a check constraint, or may be related to the way the query is written. I will show you examples of both cases next.
First, I need to find a table with a check constraint in AdventureWorks and, handily, the Employee table has the following check constraint definition:
([VacationHours]>=(-40) AND [VacationHours]<=(240))
This check constraint makes sure that the number of vacation hours is a number between –40 and 240, so when I request
SELECT * FROM HumanResources.Employee WHERE VacationHours > 80
… SQL Server uses a Clustered Index Scan operator, as shown next
However, if I request all of the employees with more than 300 vacation hours then, because of this check constraint, the Query Optimizer must immediately know that no records qualify for predicate. Run the following code
SELECT * FROM HumanResources.Employee WHERE VacationHours > 300
As expected, the query will return no records, but this time it will show the following execution planNote that, this time, instead of a Clustered Index Scan, SQL Server is using a Constant Scan operator. Since there is no need to access the table at all, SQL Server saves resources like I/O, locks, memory and CPU, making the query to be executed faster. Now, let’s see what happens if I disable the check constraint
ALTER TABLE HumanResources.Employee NOCHECK CONSTRAINT CK_Employee_VacationHours
This time, running the last query once again uses a Clustered Index Scan operator, as the Query Optimizer can no longer use the check constraint to guide its decisions. Don’t forget to enable the constraint again by running the following statement:
ALTER TABLE HumanResources.Employee WITH CHECK CHECK CONSTRAINT CK_Employee_VacationHours
The second type of contradiction case is when the query itself explicitly contains a contradiction. Take a look at the next query
SELECT * FROM HumanResources.Employee WHERE ManagerID > 10 AND ManagerID < 5
In this case there is no check constraint involved; both predicates are valid and each will individually return records, but they contradict each other when they are run together. As a result, the query returns no records and the plan shows again a Constant Scan operator similar to the plan shown previously. This may just look like a badly written query, but remember that some predicates may already be included in, for example, view definitions, and the developer of the query may be unaware of those. For example, in our last query, a view may include the predicate ManagerID > 10 and a developer may call this view using the predicate ManagerID < 5. Since both predicates contradict each other a Constant Scan operator will be used again instead.
This is also useful to find out by the query plan that your query contains a logical mistake.
Having a contradiction in production code seems rare but I found it to be the case relatively often because tools generate suboptimal code.
I wish the optimizer would also remove redundant joins (joins of the same table having the same conditions and filters). This happens very often with linq 2 sql and entity framework. They really generate a lot of redundant joins.
Hi Tobi,
I am not familiar with LINQ, how those redundant joins look like?
The Query Optimizer also removes some redundant joins on this simplification phase. A typical example is the Foreign Key Join elimination which occurs when SQL Server can detect that some joins may not be needed, as foreign key constraints are available and only columns of the referencing table are requested. Maybe a topic for a future post.
Thanks,
Ben
This is a connect item that describes the issue and contains a working repo: http://connect.microsoft.com/SQLServer/feedback/details/641366/query-optimizer-should-remove-redundant-joins-from-queries
I see this _all the time_ in practice. In linq you usually don’t write joins explicity but you use navigation properties like “Order.Customer”. If you use the same property multiple times, it mostly works out to a single join but sometimes the translation engine gets confused and does not notice that a particular join was already added to the query.
I believe this problem is common to most OR mappers, but I can confirm for L2S and EF4.
Wow, interesting problem, thanks for sharing. Hope any of the involved teams provide a solution to it.
Regards,
Ben
[…] detection is one of several possible simplifications. Following on an example on my book and this blog post, run the following […]
This is good but there are certain other things which are not handled by the optimizer properly.Like you have an idenetity column and a datetime column then there is a correlation between these. E.g.
–Create a table called mysoh based on the table sales.salesorderheader
if OBJECT_ID(‘mysoh’) is not null
drop table mysoh
go
select * into mysoh from Sales.SalesOrderHeader
go
–Create a clustered index on SalesOrderId
alter table mysoh add constraint pk_mysoh_salesorderid primary key (SalesOrderId)
go
–Create a non clustered index on OrderDate column
create nonclustered index idx_myssoh_orderdate on mysoh(OrderDate)
go
update statistics mysoh with fullscan
go
Now run the following query.
select * from mysoh
where
1=1
and OrderDate >= cast (‘20010701’ as datetime)
and Orderdate = cast (‘20010701’ as datetime)
and Orderdate <= cast ('20010731' as datetime)
–select @MinSalesOrderId,@MaxSalesOrderId
select * from mysoh where SalesOrderID between @MinSalesOrderId and @MaxSalesOrderId
Because we know that the data is continous. This is just 10 IO's.
Optimizer could be little bit more smart to know this so that first it does index seek on orderdate.Spool the data and then range scan the clustered index starting from the min value to max value to avoid unncessary key lookups.
[…] The view explicitly defines the id ranges which exist on each shard. This leverages SQL Server’s contradiction detection to drop remote servers calls from the query, when those ids are not required for the query […]