When we write a DML query that is using JOINed tables as data source, first we need to understand the way SQL Server is manipulating the data.
The clauses in the DML query statement are ordered as follows:
- SELECT
- FROM… JOIN
- WHERE
- GROUP BY
- HAVING
- ORDER BY
Behind the scene SQL Server is executing the DML query statement in different order:
- Step 1: FROM… JOIN – build the virtual recordset (VR) from one or multiple data sources (DS)
- Step 2: WHERE – filter the VR
- Step 3: GROUP BY… HAVING – group, aggregate and filter the grouped VR
- Step 4: SELECT – build the resulting VR and create aliases
- Step 5: ORDER BY – order the resulting VR and use the aliases, created in the SELECT statement
These two tables will help your intuition to draw the full picture how the tables are JOINed in Step 1:
As explained in Step 1, SQL Server creates VR, by JOINing the data sources. We JOIN the tables on column Col1. The resulting VR is:
Let’s write a simple query:
1 2 3 4 5 6 7 8 |
SELECT L.* , R.* FROM LeftDataSource AS L JOIN RightDataSource AS R ON L.Col1 = R.Col1; GO |
The keyword JOIN in the FROM clause determines how the tables are joined and which rows will be included in the virtual recordset, created in Step 1.
The different types of JOIN, create different VR:
We’ll modify the query above, by changing the JOIN type to demonstrate the difference between the different types of JOIN.
JOIN (INNER JOIN)
INNER JOIN is inner, because it creates a VR that contains only the rows that meet the join criteria(s) – the rows that match on the linked columns (Col1).
This query:
1 2 3 4 5 6 7 8 |
SELECT L.* , R.* FROM LeftDataSource AS L JOIN RightDataSource AS R ON L.Col1 = R.Col1; GO |
The keywords INNER and OUTER are optional and can be omitted.
will create VR that contains only the rows where the value in Col1 matches:
LEFT JOIN (LEFT OUTER JOIN)
OUTER JOIN is outer, because it creates VR, that contains more rows than the rows that meet the JOIN condition(s) in the ON clause.
By using the tables above, this query:
1 2 3 4 5 6 7 8 |
SELECT L.* , R.* FROM LeftDataSource AS L LEFT JOIN RightDataSource AS R ON L.Col1 = R.Col1; GO |
will create VR with all the rows from the left data source and only the rows from the right data source that meet the JOIN condition(s).
The VR fills the rows in the right table that don’t have a match on the JOIN condition(s) with NULL.
RIGHT JOIN (RIGHT OUTER JOIN)
The same logic as LEFT JOIN, but reversed – the VR contains all the rows from the right data source and the matching rows from the left data source.
1 2 3 4 5 6 7 8 |
SELECT L.* , R.* FROM LeftDataSource AS L RIGHT JOIN RightDataSource AS R ON L.Col1 = R.Col1; GO |
FULL JOIN (FULL OUTER JOIN)
Creates a VR with all the rows from both data sources. The not matching values in the left and the right source are represented with NULL. It is a combined LEFT JOIN and RIGHT JOIN.
1 2 3 4 5 6 7 8 |
SELECT L.* , R.* FROM LeftDataSource AS L FULL JOIN RightDataSource AS R ON L.Col1 = R.Col1; GO |
LEFT (RIGHT) JOIN… WHERE
Let’s combine (LEFT, RIGHT) JOIN and WHERE clauses to select the rows from the left data source, that don’t have a match in the right data source (LEFT JOIN WHERE RightRecordset IS NULL):
1 2 3 4 5 6 7 8 9 |
SELECT L.* , R.* FROM LeftDataSource AS L LEFT JOIN RightDataSource AS R ON L.Col1 = R.Col1 WHERE R.Col1 IS NULL; GO |
The same logic can be used with RIGHT JOIN to select the rows from the right data source, that don’t have a match in the left data source (RIGHT JOIN WHERE LeftRecordset IS NULL):
1 2 3 4 5 6 7 8 9 |
SELECT L.* , R.* FROM LeftDataSource AS L RIGHT JOIN RightDataSource AS R ON L.Col1 = R.Col1 WHERE L.Col1 IS NULL; GO |
We can also select the rows from both data sources, that don’t have a match (FULL JOIN WHERE LeftRecordset OR RightRecordset IS NULL). This logic is FULL JOIN except INNER JOIN.
1 2 3 4 5 6 7 8 9 10 11 |
SELECT L.* , R.* FROM LeftDataSource AS L FULL JOIN RightDataSource AS R ON L.Col1 = R.Col1 WHERE L.Col1 IS NULL OR R.Col1 IS NULL; GO |
CROSS JOIN
Creates a VR, containing all the combinations of the rows in both data sources, (a.k.a. Cartesian Product).
1 2 3 4 5 6 7 |
SELECT L.* , R.* FROM LeftDataSource AS L CROSS JOIN RightDataSource AS R; GO |
No ON keyword
The above lesson and many more are collected in my book Learn SQL Server Intuitively. Transact-SQL: The Solid Basics. You can find it here.
Keep it simple :-)