Benjamin Nevarez Rotating Header Image

Optimizing Join Orders

The SQL Server Query Optimizer needs to take two important decisions regarding joins: the selection of a join order and the choice of a join algorithm. The selection of a join algorithm is the choice between a nested loops join, merge join or hash join operator, but I will leave that topic for some other post. In this post I will talk about join orders.

A join combines records from two tables based on some common information. But since a join works with only two tables at a time, a query requesting data from n tables must be executed as a sequence of n – 1 joins. Although the results of a query are the same regardless of the join order, the order in which the tables are joined greatly influences the cost and performance of a query. The Query Optimizer must find the optimal sequence of joins between the tables used in the query. Finding the optimal join order is one of the most difficult problems in query optimization and one that has been subject of extensive research since the seventies.

This basically means that the Query Optimizer defines the join order; it is not defined by your query. To see an example run the following query against the AdventureWorks database and choose to display the execution plan by clicking Include Actual Execution Plan.

SELECT FirstName, LastName

FROM Person.Contact AS C

    JOIN Sales.Individual AS I

        ON C.ContactID = I.ContactID

    JOIN Sales.Customer AS Cu

        ON I.CustomerID = Cu.CustomerID

WHERE Cu.CustomerType = ‘I’

It will show the following execution plan

clip_image002

Note that the Query Optimizer is not using the same join order specified in the query; it found a more efficient one. The query asks to join the Contact table with the Individual table, and then join the result with the Customer table, but if you inspect the three clustered index scan operators, the plan is first joining the Customer and Individual tables and then joining the result with the Contact table.

The problem of finding an efficient join order for a query is difficult because of the number of possible permutations that the Query Optimizer needs to analyze. Because of the commutative and associative properties of joins there could be many different possible join orders in a query and this number increases exponentially with the number of tables. In fact, with just a handful of tables the number of possible join orders could be in the thousands or even millions. Obviously, it is impossible for any query optimizer to look at all those combinations: it would take too long.

Queries are represented as trees in the query processor and some names like left-deep, right-deep and bushy trees are commonly used to identify the shapes of the order of joins in these trees. Left-deep and bushy trees for a join of four tables are shown in the next figure.

 clip_image004

For example, the left-deep tree shown could be JOIN( JOIN( JOIN(A, B), C), D) and the bushy tree could be JOIN(JOIN(A, B), JOIN(C, D)). Left-deep trees are also called linear trees or linear processing trees. The set of bushy trees includes the sets of both left-deep and right-deep trees. The following table lists the number of possible join orders considering left-deep and bushy trees.

Tables

Left-Deep Trees

Bushy Trees

1

1

1

2

2

2

3

6

12

4

24

120

5

120

1,680

6

720

30,240

7

5,040

665,280

8

40,320

17,297,280

9

362,880

518,918,400

10

3,628,800

17,643,225,600

11

39,916,800

670,442,572,800

12

479,001,600

28,158,588,057,600

The number of left-deep trees is calculated as n! or n factorial, where n is the number of tables in the relation. The factorial is the product of all the positive integers less than or equal to n. For example, 5! or 5 factorial is 5 x 4 x 3 x 2 x 1 or 120. Calculating the number of bushy tress is more complicated and can be calculated as (2n – 2)!/(n – 1)!.

So how does the Query Optimizer analyze all these possible join orders? The answer is: it does not. Analyzing all the possible join orders for a complex query could take a long time so the Query Optimizer must find a balance between the optimization time and the quality of the resulting plan. The Query Optimizer may not perform an exhaustive search and instead uses some heuristics to guide the search process. For example, a common heuristic is to avoid considering bushy tress during optimization, but I will leave some of these details for some other post.

About the author

Benjamin Nevarez Benjamin Nevarez is a database professional based in Los Angeles, CA, and author of "Inside the SQL Server Query Optimizer". He has also contributed to other SQL Server books including "SQL Server 2012 Internals". Benjamin has 20 years of experience with relational databases and has been working with SQL Server since version 6.5. He holds a Master’s Degree in Computer Science and has been a speaker at many SQL Server conferences, including the PASS Summit and SQL Server Connections. Benjamin’s blog is at http://www.benjaminnevarez.com, can be reached by e-mail at admin at benjaminnevarez dot com, on twitter at @BenjaminNevarez and on .

4 Comments

  1. [...] This post was mentioned on Twitter by Benjamin Nevarez, Ben Nevarez. Ben Nevarez said: New blog post: Optimizing Join Orders http://www.benjaminnevarez.com/2010/06/optimizing-join-orders [...]

  2. Geri Reshef says:

    Does the Clustered Index Scan of customer table includes the filter (WHERE Cu.CustomerType = ‘I’)?

  3. Benjamin Nevarez says:

    Yes, the clustered index scan evaluates the predicate on CustomerType. You can see it on the predicate section of the clustered index scan properties that shows

    Predicate [AdventureWorks].[Sales].[Customer].[CustomerType] as [Cu].[Customertype]=N’I’

    Thanks,

    Ben

  4. [...] Benjamin Nevarez: Optimizing Join Orders [...]

Add Comment Register



Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>