The Intuitive way to understand the JOINs in SQL Server


When we write a DML query that is using JOINed tables as data source, first we need to understand the way SQL Server is manipulating the data.

The clauses in the DML query statement are ordered as follows:

  1. SELECT
  2. FROMJOIN
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY

Behind the scene SQL Server is executing the DML query statement in different order:

  • Step 1: FROMJOIN – build the virtual recordset (VR) from one or multiple data sources (DS)
  • Step 2: WHERE – filter the VR
  • Step 3: GROUP BYHAVING – group, aggregate and filter the grouped VR
  • Step 4: SELECT – build the resulting VR and create aliases
  • Step 5: ORDER BY – order the resulting VR and use the aliases, created in the SELECT statement

DML Statements Execution Logic

These two tables will help your intuition to draw the full picture how the tables are JOINed in Step 1:

Source Tables

As explained in Step 1, SQL Server creates VR, by JOINing the data sources. We JOIN the tables on column Col1. The resulting VR is:

JOINed Data Sources

Let’s write a simple query:

The keyword JOIN in the FROM clause determines how the tables are joined and which rows will be included in the virtual recordset, created in Step 1.

The different types of JOIN, create different VR:

JOIN Types

We’ll modify the query above, by changing the JOIN type to demonstrate the difference between the different types of JOIN.

JOIN (INNER JOIN)

INNER JOIN is inner, because it creates a VR that contains only the rows that meet the join criteria(s) – the rows that match on the linked columns (Col1).

This query:

The keywords INNER and OUTER are optional and can be omitted.

will create VR that contains only the rows where the value in Col1 matches:

INNER JOIN resulting VR

LEFT JOIN (LEFT OUTER JOIN)

OUTER JOIN is outer, because it creates VR, that contains more rows than the rows that meet the JOIN condition(s) in the ON clause.

By using the tables above, this query:

will create VR with all the rows from the left data source and only the rows from the right data source that meet the JOIN condition(s).

The VR fills the rows in the right table that don’t have a match on the JOIN condition(s) with NULL.

LEFT JOIN resulting VR

RIGHT JOIN (RIGHT OUTER JOIN)

The same logic as LEFT JOIN, but reversed – the VR contains all the rows from the right data source and the matching rows from the left data source.

INNER JOIN Resulting VR

FULL JOIN (FULL OUTER JOIN)

Creates a VR with all the rows from both data sources. The not matching values in the left and the right source are represented with NULL. It is a combined LEFT JOIN and RIGHT JOIN.

FULL JOIN Resulting VR

LEFT (RIGHT) JOINWHERE

Let’s combine (LEFT, RIGHT) JOIN and WHERE clauses to select the rows from the left data source, that don’t have a match in the right data source (LEFT JOIN WHERE RightRecordset IS NULL):

LEFT JOIN WHERE {RightRecordset} IS NULL

The same logic can be used with RIGHT JOIN to select the rows from the right data source, that don’t have a match in the left data source (RIGHT JOIN WHERE LeftRecordset IS NULL):

RIGHT JOIN WHERE {LeftRecordset} IS NULL

We can also select the rows from both data sources, that don’t have a match (FULL JOIN WHERE LeftRecordset OR RightRecordset IS NULL). This logic is FULL JOIN except INNER JOIN.

FULL JOIN WHERE LeftRecordset OR RightRecordset IS NULL

CROSS JOIN

Creates a VR, containing all the combinations of the rows in both data sources, (a.k.a. Cartesian Product).

No ON keyword

INNER JOIN Result

The above lesson and many more are collected in my book Learn SQL Server Intuitively. Transact-SQL: The Solid Basics. You can support its release and stay in touch on KickStarter.

Keep it simple :-)

Leave a comment

Your email address will not be published. Required fields are marked *