Before reading this article, you should have a good understanding of single joins between two tables and be clear on the differences between inner and outer joins. Check out my previous post A Primer on Joins to help you accomplish this.
Have you ever looked at a query like the one below and wondered how to read it, how the different joins work together, and what aliens on what planet wrote such a thing?
Queries with multiple joins like this one often cause confusion. Here is a question I have often heard from students: “There seem to be three tables joined to the Employee table in this query: two are inner joins and the other is an outer join. How can the same table have its non-matching rows eliminated and preserved at the same time in the same query?”
In this article, I will show you that confusions like this one arise from a syntax that encourages us to misunderstand joins, and I’ll offer another way of looking at multiple-join queries that makes questions like the one above melt away.
So, what leads to the confusion? Initially, it might seem that every table that is joined to this query is joined to a previous table; the ON clause suggests this with its references to columns in previous tables.
But this is not what actually happens in a multi-join query, and so looking at things in this way will lead to head-scratching.
So, what does a multi-join query actually do? Logically, something very simple: it performs a series of incremental, single joins between two tables at a time. (This article refers only to tables for simplicity's sake, but joins can also involve views, table-valued functions, CTEs, and derived table subqueries.) Each single join produces a single derived table (DT) that is then joined to the next table, and so on. Like this:
JOIN 1: Inner join between Employee and Contact resulting in a derived table, DT1. Because this is an inner join, rows in Employee are excluded if they don’t match any rows in Contact, and vice-versa.
JOIN 2: Outer join between DT1 and JobCandidate resulting in a derived table, DT2. Because this is a left outer join, all rows in DT1 are preserved.
JOIN 3: Inner join between DT2 and SalesPerson resulting in a derived table, DT3. Because this is an inner join, rows in DT2 are excluded if they don’t match any rows in SalesPerson, and vice-versa.
JOIN 4: Outer join between DT3 and SalesOrderHeader resulting in a derived table, DT4. Because this is a left outer join, all rows in DT3 are preserved.
JOIN 5: Outer join between DT4 and SalesTerritory resulting in a derived table, DT5. Because this is a left outer join, all rows in DT4 are preserved. DT5 is the final result of the query.
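To make the step-by-step reading concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server. It covers only the first three joins, and the column names and sample rows are assumptions invented for illustration, not the real AdventureWorks schema. It runs the joins once as a single multi-join query and once as explicit temporary tables DT1, DT2, and DT3, then compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy schema and data (hypothetical, for illustration only).
conn.executescript("""
    CREATE TABLE Employee     (EmployeeID INTEGER PRIMARY KEY, ContactID INTEGER);
    CREATE TABLE Contact      (ContactID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE JobCandidate (JobCandidateID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    CREATE TABLE SalesPerson  (SalesPersonID INTEGER PRIMARY KEY, SalesQuota INTEGER);

    INSERT INTO Employee     VALUES (1, 10), (2, 20), (3, 30), (4, 99);
    INSERT INTO Contact      VALUES (10, 'Ann'), (20, 'Bob'), (30, 'Cy');
    INSERT INTO JobCandidate VALUES (100, 1);
    INSERT INTO SalesPerson  VALUES (1, 5000), (2, 7000);
""")

# The whole thing written as one multi-join query.
single_sql = """
    SELECT e.EmployeeID, c.FirstName, jc.JobCandidateID, sp.SalesQuota
    FROM Employee AS e
    INNER JOIN Contact AS c ON c.ContactID = e.ContactID
    LEFT OUTER JOIN JobCandidate AS jc ON jc.EmployeeID = e.EmployeeID
    INNER JOIN SalesPerson AS sp ON sp.SalesPersonID = e.EmployeeID
    ORDER BY e.EmployeeID
"""
single = conn.execute(single_sql).fetchall()

# The same joins performed one at a time, each producing a derived table.
conn.executescript("""
    -- JOIN 1: inner join, Employee 4 has no matching Contact and is dropped.
    CREATE TEMP TABLE DT1 AS
    SELECT e.EmployeeID AS EmployeeID, c.FirstName AS FirstName
    FROM Employee AS e
    INNER JOIN Contact AS c ON c.ContactID = e.ContactID;

    -- JOIN 2: left outer join, every row of DT1 is preserved.
    CREATE TEMP TABLE DT2 AS
    SELECT DT1.EmployeeID AS EmployeeID, DT1.FirstName AS FirstName,
           jc.JobCandidateID AS JobCandidateID
    FROM DT1
    LEFT OUTER JOIN JobCandidate AS jc ON jc.EmployeeID = DT1.EmployeeID;

    -- JOIN 3: inner join, rows of DT2 without a SalesPerson match are dropped.
    CREATE TEMP TABLE DT3 AS
    SELECT DT2.EmployeeID AS EmployeeID, DT2.FirstName AS FirstName,
           DT2.JobCandidateID AS JobCandidateID, sp.SalesQuota AS SalesQuota
    FROM DT2
    INNER JOIN SalesPerson AS sp ON sp.SalesPersonID = DT2.EmployeeID;
""")

dt1_count = conn.execute("SELECT COUNT(*) FROM DT1").fetchone()[0]
dt2_count = conn.execute("SELECT COUNT(*) FROM DT2").fetchone()[0]
staged = conn.execute(
    "SELECT EmployeeID, FirstName, JobCandidateID, SalesQuota "
    "FROM DT3 ORDER BY EmployeeID"
).fetchall()

print(dt1_count, dt2_count)   # row counts after JOIN 1 and JOIN 2
print(single == staged)       # both readings give the same rows
```

In this toy data, the outer join keeps all three rows of DT1, and the inner join that follows removes rows from DT2, not from Employee directly. That is how rows can appear to be preserved and eliminated in the same query: each join acts on a different derived table.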
So, what about that confusion arising from the ON clause? With this new way of looking at multiple-join queries, we can see that an ON clause should not be read as joining the new table to a single table that came before it in the query. Only the first join does that; every subsequent join connects a new table to the derived table that results from all the joins before it. If an ON clause includes a table alias, that is only to identify the column properly to the query. Table aliases are required only when there is ambiguity, that is, when two or more columns in the derived table that precedes the current join have the same name because they came from different tables.
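The point about aliases can be checked with a small experiment. The sketch below again uses Python's sqlite3 module with an invented toy schema (the column names are assumptions, and SQLite's error text differs from SQL Server's). In the last ON clause, SalesPersonID appears only once among the columns joined so far, so it needs no alias, while EmployeeID appears twice (from Employee and from JobCandidate), so leaving it unqualified is rejected as ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy schema and data (hypothetical, for illustration only).
conn.executescript("""
    CREATE TABLE Employee     (EmployeeID INTEGER PRIMARY KEY, ContactID INTEGER);
    CREATE TABLE JobCandidate (JobCandidateID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    CREATE TABLE SalesPerson  (SalesPersonID INTEGER PRIMARY KEY, SalesQuota INTEGER);

    INSERT INTO Employee     VALUES (1, 10), (2, 20);
    INSERT INTO JobCandidate VALUES (100, 1);
    INSERT INTO SalesPerson  VALUES (1, 5000), (2, 7000);
""")

# SalesPersonID is unique across the joined columns, so no alias is needed.
# EmployeeID is not unique, so the alias e. tells the query which one is meant.
qualified_sql = """
    SELECT e.EmployeeID, JobCandidateID, SalesQuota
    FROM Employee AS e
    LEFT OUTER JOIN JobCandidate AS jc ON jc.EmployeeID = e.EmployeeID
    INNER JOIN SalesPerson ON SalesPersonID = e.EmployeeID
    ORDER BY e.EmployeeID
"""
rows = conn.execute(qualified_sql).fetchall()

# Same query with the alias removed from EmployeeID in the last ON clause.
unqualified_sql = qualified_sql.replace(
    "SalesPersonID = e.EmployeeID", "SalesPersonID = EmployeeID"
)
try:
    conn.execute(unqualified_sql)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(rows)
print(error_message)
```

The alias in the working query does not mean that SalesPerson is joined to Employee alone; it only picks one of the two EmployeeID columns available at that point in the query.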
I hope this new way of looking at multiple-join queries helps make it easier and more productive for you to work with joins.