Comprehensive Guide to SQL JOINs: Understanding NULL Behavior and Join Types

Description

This guide provides a comprehensive overview of SQL JOIN operations, with special attention to NULL value behavior in Oracle databases. Through practical examples using two test tables, we explore all major JOIN types and their specific behaviors when dealing with NULL values.


When working with relational databases, you'll inevitably and constantly need to work with joins. Beyond understanding the joins themselves, it's crucial to understand NULL behavior. In Oracle databases, NULL should be understood as the absence of a value. Any comparison with NULL will result in NULL (undefined value).

It might be worthwhile to test this behavior on your database (there are versions where some databases return True when comparing NULL with NULL).

Main Types of JOIN in SQL

Let's examine the behavior of different join types in tables containing NULL values using two example tables: table_a and table_b.

Creating and Populating Tables

DROP TABLE table_a;
CREATE TABLE table_a (
    id INT,
    value VARCHAR(10)
);

INSERT INTO table_a VALUES (1, 'A');
INSERT INTO table_a VALUES (2, NULL);
INSERT INTO table_a VALUES (3, 'B');
INSERT INTO table_a VALUES (4, NULL);
INSERT INTO table_a VALUES (5, 'C');
INSERT INTO table_a VALUES (6, NULL);
INSERT INTO table_a VALUES (7, 'D');
INSERT INTO table_a VALUES (8, NULL);
INSERT INTO table_a VALUES (9, 'E');
INSERT INTO table_a VALUES (10, NULL);
-- Add a second 'B' value
INSERT INTO table_a VALUES (11, 'B');

DROP TABLE table_b;
CREATE TABLE table_b (
    id INT,
    value VARCHAR(10)
);

INSERT ALL
  INTO table_b VALUES (1, 'A')
  INTO table_b VALUES (2, NULL)
  INTO table_b VALUES (3, NULL)
  INTO table_b VALUES (4, 'B')
  INTO table_b VALUES (5, NULL)
  INTO table_b VALUES (6, 'C')
  INTO table_b VALUES (7, NULL)
  INTO table_b VALUES (8, NULL)
  INTO table_b VALUES (9, 'W')
  INTO table_b VALUES (10, NULL)
  INTO table_b VALUES (11, NULL)
  INTO table_b VALUES (12, 'V')
  -- Add a second 'C' value
  INTO table_b VALUES (13, 'C')
SELECT * FROM dual;

Table Contents

table_a:

id value
1 A
2 NULL
3 B
4 NULL
5 C
6 NULL
7 D
8 NULL
9 E
10 NULL
11 B

table_b:

id value
1 A
2 NULL
3 NULL
4 B
5 NULL
6 C
7 NULL
8 NULL
9 W
10 NULL
11 NULL
12 V
13 C

JOIN Types

1. INNER JOIN

Description: Returns only rows that have matches in both tables based on the join condition.

SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
INNER JOIN table_b b ON a.value = b.value;

Result:

a_id a_value b_id b_value
1 A 1 A
3 B 4 B
5 C 6 C
5 C 13 C
11 B 4 B

2. LEFT JOIN

Description: Returns all rows from the left table (table_a) and matching rows from the right table (table_b). If there's no match, values from the right table will be NULL.

When dealing with joins, it's important to understand that when there's a match, all matching values from the second table will be returned. Initially, this was difficult for me to get used to as an active Excel VLOOKUP user, where I expected only one value to be returned.

This type of join is most commonly used when you need to guarantee all rows from the source table in the output and check for matches in the second table by key.

SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
LEFT JOIN table_b b ON a.value = b.value;

Result (abbreviated):

a_id a_value b_id b_value
1 A 1 A
2 NULL NULL NULL
3 B 4 B
4 NULL NULL NULL
5 C 6 C
5 C 13 C
6 NULL NULL NULL
7 D NULL NULL
8 NULL NULL NULL
9 E NULL NULL
10 NULL NULL NULL
11 B 4 B

3. RIGHT JOIN

Description: Returns all rows from the right table (table_b) and matching rows from the left table (table_a). If there's no match, values from the left table will be NULL.

SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
RIGHT JOIN table_b b ON a.value = b.value;

Result (abbreviated):

a_id a_value b_id b_value
1 A 1 A
NULL NULL 2 NULL
NULL NULL 3 NULL
3 B 4 B
11 B 4 B
NULL NULL 5 NULL
5 C 6 C
5 C 13 C
NULL NULL 7 NULL
NULL NULL 8 NULL
NULL NULL 9 W
NULL NULL 10 NULL
NULL NULL 11 NULL
NULL NULL 12 V

These two queries are equivalent in result (if aliases are correctly specified):

SELECT *
FROM table_a a
RIGHT JOIN table_b b ON a.value = b.value;

and

SELECT *
FROM table_b b
LEFT JOIN table_a a ON a.value = b.value;

Why?
RIGHT JOIN and LEFT JOIN are mirror operations.
RIGHT JOIN returns all rows from the right table (table_b), while LEFT JOIN returns all rows from the left table (table_b in this case).
If you swap the tables and join type, the result will be identical in content (though column order may differ).

Conclusion:
a RIGHT JOIN b ON ...b LEFT JOIN a ON ...

4. FULL OUTER JOIN

Description: Returns all rows from both tables. If there's no match, values from the missing table will be NULL. It's important to note that NULL in one table is not equal to NULL in another table.

FULL OUTER JOIN essentially returns the union of LEFT JOIN and RIGHT JOIN results, but with one important difference:
- If the same row appears in both LEFT JOIN and RIGHT JOIN (i.e., there's a match by key), then in the FULL OUTER JOIN result, this row will appear only once.

If you simply write:

SELECT ... FROM a LEFT JOIN b ON ...
UNION
SELECT ... FROM a RIGHT JOIN b ON ...

— this will give the same result as FULL OUTER JOIN, if you use UNION, not UNION ALL.
UNION removes duplicates, while UNION ALL doesn't.

Important:
- Some databases (e.g., MySQL) don't have FULL OUTER JOIN, and this UNION approach is used as a replacement.
- In standard SQL, it's better to use FULL OUTER JOIN if it's supported.

SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
FULL OUTER JOIN table_b b ON a.value = b.value;

Result (abbreviated):

a_id a_value b_id b_value
1 A 1 A
3 B 4 B
11 B 4 B
5 C 6 C
5 C 13 C
2 NULL NULL NULL
4 NULL NULL NULL
6 NULL NULL NULL
7 D NULL NULL
8 NULL NULL NULL
9 E NULL NULL
10 NULL NULL NULL
NULL NULL 2 NULL
NULL NULL 3 NULL
NULL NULL 5 NULL
NULL NULL 7 NULL
NULL NULL 8 NULL
NULL NULL 9 W
NULL NULL 10 NULL
NULL NULL 11 NULL
NULL NULL 12 V

5. CROSS JOIN

Description: Cartesian product — each row from the first table is joined with each row from the second.

SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
CROSS JOIN table_b b;

Result:
The result will have 11 (rows in table_a) × 13 (rows in table_b) = 143 rows.
Below is an example of rows with a_id = 1 and rows with a_id = 2 from the first table, to show that the output includes all rows, including NULL:

a_id a_value b_id b_value
1 A 1 A
1 A 2 NULL
1 A 3 NULL
1 A 4 B
1 A 5 NULL
1 A 6 C
1 A 7 NULL
1 A 8 NULL
1 A 9 W
1 A 10 NULL
1 A 11 NULL
1 A 12 V
2 NULL 1 A
2 NULL 2 NULL
2 NULL 3 NULL
2 NULL 4 B
2 NULL 5 NULL
2 NULL 6 C
2 NULL 7 NULL
2 NULL 8 NULL
2 NULL 9 W
2 NULL 10 NULL
2 NULL 11 NULL
2 NULL 12 V

...


6. LEFT JOIN ON 1=1 (Cartesian Product via LEFT JOIN)

Description:
If you use LEFT JOIN with the condition ON 1=1, each row from the left table is joined with each row from the right table, because the join condition is always true. This effectively turns the result into a Cartesian product, similar to CROSS JOIN, BUT if the right table is empty, LEFT JOIN will still return all rows from the left table with NULL values on the right, while CROSS JOIN would return an empty result.

I needed such a JOIN when parsing an XML document that contained equipment models and their properties in a nested structure. For some models, properties were missing, but it was important to be able to get this in the result.

SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
LEFT JOIN table_b b ON 1=1;

Note:
- For non-empty tables, the result is the same as CROSS JOIN.
- Formally, LEFT JOIN ... ON 1=1 and CROSS JOIN are not the same thing, but in practice for two regular tables, the result is identical.

Below is another example where this might be useful.

Example: "Expanding" Properties for Each Row

Task:
You have a products table (products) and a separate table of possible properties (properties). Not all products have values for all properties, but you want to get a complete list of properties for each product (even if values are missing).

Structure:

products
---------
id | name
---|------
1  | Phone
2  | Laptop

properties
-----------
id | property_name
---|--------------
1  | Color
2  | Weight
3  | Size

Your goal:
Get all possible properties for each product, even if values haven't been set yet.

Solution:

SELECT p.id as product_id, p.name, pr.property_name
FROM products p
LEFT JOIN properties pr ON 1=1

Result:
Each product will be "multiplied" by all properties. This is convenient if you want to, for example, prepare a template for entering property values for each product (e.g., in an admin interface).


Conclusion:
LEFT JOIN ... ON 1=1 is useful when you need to "expand" one table across all rows of another, even if there's no direct relationship between them. This is often used for generating templates, reports, or preparing data for further population.


Summary

Using different types of JOIN allows you to flexibly combine data from multiple tables depending on the tasks at hand. It's important to understand how each type of join works to get correct results when working with databases. And it's very important to understand how NULL works in the database.