Comprehensive Guide to SQL JOINs: Understanding NULL Behavior and Join Types
Description
This guide provides a comprehensive overview of SQL JOIN operations, with special attention to NULL value behavior in Oracle databases. Through practical examples using two test tables, we explore all major JOIN types and their specific behaviors when dealing with NULL values.
When working with relational databases, you'll inevitably and constantly need to work with joins. Beyond understanding the joins themselves, it's crucial to understand NULL
behavior. In Oracle databases, NULL
should be understood as the absence of a value. Any comparison with NULL
will result in NULL
(undefined value).
It might be worthwhile to test this behavior on your database (there are versions where some databases return True when comparing NULL
with NULL
).
Main Types of JOIN in SQL
Let's examine the behavior of different join types in tables containing NULL
values using two example tables: table_a
and table_b
.
Creating and Populating Tables
DROP TABLE table_a;
CREATE TABLE table_a (
id INT,
value VARCHAR(10)
);
INSERT INTO table_a VALUES (1, 'A');
INSERT INTO table_a VALUES (2, NULL);
INSERT INTO table_a VALUES (3, 'B');
INSERT INTO table_a VALUES (4, NULL);
INSERT INTO table_a VALUES (5, 'C');
INSERT INTO table_a VALUES (6, NULL);
INSERT INTO table_a VALUES (7, 'D');
INSERT INTO table_a VALUES (8, NULL);
INSERT INTO table_a VALUES (9, 'E');
INSERT INTO table_a VALUES (10, NULL);
-- Add a second 'B' value
INSERT INTO table_a VALUES (11, 'B');
DROP TABLE table_b;
CREATE TABLE table_b (
id INT,
value VARCHAR(10)
);
INSERT ALL
INTO table_b VALUES (1, 'A')
INTO table_b VALUES (2, NULL)
INTO table_b VALUES (3, NULL)
INTO table_b VALUES (4, 'B')
INTO table_b VALUES (5, NULL)
INTO table_b VALUES (6, 'C')
INTO table_b VALUES (7, NULL)
INTO table_b VALUES (8, NULL)
INTO table_b VALUES (9, 'W')
INTO table_b VALUES (10, NULL)
INTO table_b VALUES (11, NULL)
INTO table_b VALUES (12, 'V')
-- Add a second 'C' value
INTO table_b VALUES (13, 'C')
SELECT * FROM dual;
Table Contents
table_a:
id | value |
---|---|
1 | A |
2 | NULL |
3 | B |
4 | NULL |
5 | C |
6 | NULL |
7 | D |
8 | NULL |
9 | E |
10 | NULL |
11 | B |
table_b:
id | value |
---|---|
1 | A |
2 | NULL |
3 | NULL |
4 | B |
5 | NULL |
6 | C |
7 | NULL |
8 | NULL |
9 | W |
10 | NULL |
11 | NULL |
12 | V |
13 | C |
JOIN Types
1. INNER JOIN
Description: Returns only rows that have matches in both tables based on the join condition.
SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
INNER JOIN table_b b ON a.value = b.value;
Result:
a_id | a_value | b_id | b_value |
---|---|---|---|
1 | A | 1 | A |
3 | B | 4 | B |
5 | C | 6 | C |
5 | C | 13 | C |
11 | B | 4 | B |
2. LEFT JOIN
Description: Returns all rows from the left table (table_a
) and matching rows from the right table (table_b
). If there's no match, values from the right table will be NULL
.
When dealing with joins, it's important to understand that when there's a match, all matching values from the second table will be returned. Initially, this was difficult for me to get used to as an active Excel VLOOKUP user, where I expected only one value to be returned.
This type of join is most commonly used when you need to guarantee all rows from the source table in the output and check for matches in the second table by key.
SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
LEFT JOIN table_b b ON a.value = b.value;
Result (abbreviated):
a_id | a_value | b_id | b_value |
---|---|---|---|
1 | A | 1 | A |
2 | NULL | NULL | NULL |
3 | B | 4 | B |
4 | NULL | NULL | NULL |
5 | C | 6 | C |
5 | C | 13 | C |
6 | NULL | NULL | NULL |
7 | D | NULL | NULL |
8 | NULL | NULL | NULL |
9 | E | NULL | NULL |
10 | NULL | NULL | NULL |
11 | B | 4 | B |
3. RIGHT JOIN
Description: Returns all rows from the right table (table_b
) and matching rows from the left table (table_a
). If there's no match, values from the left table will be NULL
.
SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
RIGHT JOIN table_b b ON a.value = b.value;
Result (abbreviated):
a_id | a_value | b_id | b_value |
---|---|---|---|
1 | A | 1 | A |
NULL | NULL | 2 | NULL |
NULL | NULL | 3 | NULL |
3 | B | 4 | B |
11 | B | 4 | B |
NULL | NULL | 5 | NULL |
5 | C | 6 | C |
5 | C | 13 | C |
NULL | NULL | 7 | NULL |
NULL | NULL | 8 | NULL |
NULL | NULL | 9 | W |
NULL | NULL | 10 | NULL |
NULL | NULL | 11 | NULL |
NULL | NULL | 12 | V |
These two queries are equivalent in result (if aliases are correctly specified):
SELECT *
FROM table_a a
RIGHT JOIN table_b b ON a.value = b.value;
and
SELECT *
FROM table_b b
LEFT JOIN table_a a ON a.value = b.value;
Why?
RIGHT JOIN
and LEFT JOIN
are mirror operations.
RIGHT JOIN
returns all rows from the right table (table_b
), while LEFT JOIN
returns all rows from the left table (table_b
in this case).
If you swap the tables and join type, the result will be identical in content (though column order may differ).
Conclusion:
a RIGHT JOIN b ON ...
≡ b LEFT JOIN a ON ...
4. FULL OUTER JOIN
Description: Returns all rows from both tables. If there's no match, values from the missing table will be NULL
. It's important to note that NULL
in one table is not equal to NULL
in another table.
FULL OUTER JOIN
essentially returns the union of LEFT JOIN
and RIGHT JOIN
results, but with one important difference:
- If the same row appears in both LEFT JOIN
and RIGHT JOIN
(i.e., there's a match by key), then in the FULL OUTER JOIN
result, this row will appear only once.
If you simply write:
SELECT ... FROM a LEFT JOIN b ON ...
UNION
SELECT ... FROM a RIGHT JOIN b ON ...
— this will give the same result as FULL OUTER JOIN
, if you use UNION
, not UNION ALL
.
UNION
removes duplicates, while UNION ALL
doesn't.
Important:
- Some databases (e.g., MySQL) don't have FULL OUTER JOIN
, and this UNION
approach is used as a replacement.
- In standard SQL, it's better to use FULL OUTER JOIN
if it's supported.
SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
FULL OUTER JOIN table_b b ON a.value = b.value;
Result (abbreviated):
a_id | a_value | b_id | b_value |
---|---|---|---|
1 | A | 1 | A |
3 | B | 4 | B |
11 | B | 4 | B |
5 | C | 6 | C |
5 | C | 13 | C |
2 | NULL | NULL | NULL |
4 | NULL | NULL | NULL |
6 | NULL | NULL | NULL |
7 | D | NULL | NULL |
8 | NULL | NULL | NULL |
9 | E | NULL | NULL |
10 | NULL | NULL | NULL |
NULL | NULL | 2 | NULL |
NULL | NULL | 3 | NULL |
NULL | NULL | 5 | NULL |
NULL | NULL | 7 | NULL |
NULL | NULL | 8 | NULL |
NULL | NULL | 9 | W |
NULL | NULL | 10 | NULL |
NULL | NULL | 11 | NULL |
NULL | NULL | 12 | V |
5. CROSS JOIN
Description: Cartesian product — each row from the first table is joined with each row from the second.
SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
CROSS JOIN table_b b;
Result:
The result will have 11 (rows in table_a
) × 13 (rows in table_b
) = 143 rows.
Below is an example of rows with a_id = 1
and rows with a_id = 2
from the first table, to show that the output includes all rows, including NULL
:
a_id | a_value | b_id | b_value |
---|---|---|---|
1 | A | 1 | A |
1 | A | 2 | NULL |
1 | A | 3 | NULL |
1 | A | 4 | B |
1 | A | 5 | NULL |
1 | A | 6 | C |
1 | A | 7 | NULL |
1 | A | 8 | NULL |
1 | A | 9 | W |
1 | A | 10 | NULL |
1 | A | 11 | NULL |
1 | A | 12 | V |
2 | NULL | 1 | A |
2 | NULL | 2 | NULL |
2 | NULL | 3 | NULL |
2 | NULL | 4 | B |
2 | NULL | 5 | NULL |
2 | NULL | 6 | C |
2 | NULL | 7 | NULL |
2 | NULL | 8 | NULL |
2 | NULL | 9 | W |
2 | NULL | 10 | NULL |
2 | NULL | 11 | NULL |
2 | NULL | 12 | V |
...
6. LEFT JOIN ON 1=1 (Cartesian Product via LEFT JOIN)
Description:
If you use LEFT JOIN
with the condition ON 1=1
, each row from the left table is joined with each row from the right table, because the join condition is always true. This effectively turns the result into a Cartesian product, similar to CROSS JOIN
, BUT if the right table is empty, LEFT JOIN
will still return all rows from the left table with NULL values on the right, while CROSS JOIN
would return an empty result.
I needed such a JOIN when parsing an XML document that contained equipment models and their properties in a nested structure. For some models, properties were missing, but it was important to be able to get this in the result.
SELECT a.id as a_id, a.value as a_value, b.id as b_id, b.value as b_value
FROM table_a a
LEFT JOIN table_b b ON 1=1;
Note:
- For non-empty tables, the result is the same as CROSS JOIN
.
- Formally, LEFT JOIN ... ON 1=1
and CROSS JOIN
are not the same thing, but in practice for two regular tables, the result is identical.
Below is another example where this might be useful.
Example: "Expanding" Properties for Each Row
Task:
You have a products table (products
) and a separate table of possible properties (properties
). Not all products have values for all properties, but you want to get a complete list of properties for each product (even if values are missing).
Structure:
products
---------
id | name
---|------
1 | Phone
2 | Laptop
properties
-----------
id | property_name
---|--------------
1 | Color
2 | Weight
3 | Size
Your goal:
Get all possible properties for each product, even if values haven't been set yet.
Solution:
SELECT p.id as product_id, p.name, pr.property_name
FROM products p
LEFT JOIN properties pr ON 1=1
Result:
Each product will be "multiplied" by all properties. This is convenient if you want to, for example, prepare a template for entering property values for each product (e.g., in an admin interface).
Conclusion:
LEFT JOIN ... ON 1=1
is useful when you need to "expand" one table across all rows of another, even if there's no direct relationship between them. This is often used for generating templates, reports, or preparing data for further population.
Summary
Using different types of JOIN allows you to flexibly combine data from multiple tables depending on the tasks at hand. It's important to understand how each type of join works to get correct results when working with databases. And it's very important to understand how NULL
works in the database.