Two questions I am asked from time to time are which sources I researched to write my Query Optimizer book and which research papers I can recommend for learning more about query optimization. Since I was asked again at the Tampa SQLSaturday last week, I wrote this short article on my flight back to Los Angeles to discuss the topic.
But first a warning: these papers are usually more complicated than the SQL Server documentation, books, or blogs we read every day and, in some cases, may require a strong computer science background to understand. In addition, there are dozens or even hundreds of these articles, covering more than 40 years of query optimization research. Although I cannot list all the ones I have read, I can definitely give you a way to get started so you can continue with the topics that interest you.
Research papers cite other papers in the text, and you can find the details of each cited paper in the reference list at the end of the article, so if you are interested in one particular area you can go and read that paper directly. By following those sources, which in turn have their own references, you can find an almost unlimited supply of information.
Although these papers usually focus on a specific area or research problem, you can start with a few articles that give a more general overview before moving on to more specific topics. Some good papers to start with are:
An Overview of Query Optimization in Relational Systems by Surajit Chaudhuri
Query Optimization by Yannis E. Ioannidis
An Overview of Data Warehousing and OLAP Technology by Surajit Chaudhuri, Umeshwar Dayal
The query optimizer research paper that started it all:
Access Path Selection in a Relational Database Management System by Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price
By following the references in those and other similar papers you can find dozens of articles, far too many to list here. Just to give you three examples:
Optimizing Join Orders by Michael Steinbrunn, Guido Moerkotte, Alfons Kemper
An Overview of Cost-based Optimization of Queries with Aggregates by Surajit Chaudhuri, Kyuseok Shim
Counting, Enumerating, and Sampling of Execution Plans in a Cost-Based Query Optimizer by Florian Waas, Cesar Galindo-Legaria
Some papers are specific to SQL Server:
Query Processing for SQL Updates by Cesar Galindo-Legaria, Stefano Stefani, Florian Waas
Self-Tuning Database Systems: A Decade of Progress by Surajit Chaudhuri, Vivek Narasayya
An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server by Surajit Chaudhuri, Vivek Narasayya
SQL Server implemented its own cost-based query optimizer, based on the Cascades Framework, when its database engine was re-architected for the release of SQL Server 7.0. Cascades in turn builds on earlier research work: Volcano and EXODUS. You can read about these research projects here:
The Cascades Framework for Query Optimization by Goetz Graefe
Volcano – An Extensible and Parallel Query Evaluation System by Goetz Graefe
The EXODUS Optimizer Generator by Goetz Graefe, David J. DeWitt
Finally, this post covers only query optimization papers, but you can find research papers on other areas of database research in the same way.