The FULL JOIN operation can sometimes be challenging to understand. For example, consider the task of retrieving employee_id values where either name or salary is missing (i.e., data exists in one table but not in the other). This can be useful, for instance, when comparing rows between two documents.
Example Data
Employees Table:
| employee_id | name |
|---|---|
| 2 | Crew |
| 4 | Haven |
| 5 | Kristian |
Salaries Table:
| employee_id | salary |
|---|---|
| 5 | 76071 |
| 1 | 22517 |
| 4 | 63539 |
Expected Result:
| employee_id |
|---|
| 1 |
| 2 |
SQL Queries
Using FULL JOIN
SELECT *
FROM Employees e
FULL OUTER JOIN Salaries s ON e.employee_id = s.employee_id
WHERE e.name IS NULL OR s.salary IS NULL;
Result:
| EMPLOYEE_ID | NAME | EMPLOYEE_ID | SALARY |
|---|---|---|---|
| null | null | 1 | 22517 |
| 2 | Crew | null | null |
Using COALESCE
To retrieve employee_id from both tables, you can use COALESCE:
SELECT COALESCE(e.employee_id, s.employee_id) AS employee_id
FROM Employees e
FULL OUTER JOIN Salaries s ON e.employee_id = s.employee_id
WHERE e.name IS NULL OR s.salary IS NULL;
Using UNION
Another approach is to first retrieve all possible employee_id values from both tables and then perform the join:
WITH emps AS (
SELECT employee_id FROM Employees
UNION
SELECT employee_id FROM Salaries
)
SELECT e.employee_id
FROM emps e
LEFT JOIN Employees e2 ON e.employee_id = e2.employee_id
LEFT JOIN Salaries s ON e.employee_id = s.employee_id
WHERE s.salary IS NULL OR e2.name IS NULL
ORDER BY e.employee_id ASC;
Advantages and Disadvantages of Each Approach
FULL JOIN
Advantages:
- Simpler query structure.
- Suitable for cases where you need to merge data from two tables and immediately filter rows with missing values.
Disadvantages:
- May be less performant on large datasets, as it merges all rows from both tables.
- Requires additional processing if you need only unique identifiers.
UNION
Advantages:
- Flexibility: allows you to first merge unique identifiers and then perform additional joins.
- Can be more performant if the tables contain a lot of data but only unique values need to be processed.
Disadvantages:
- More complex query structure.
- Requires extra steps for merging and filtering data.
Important Notes
- The choice of approach depends on the task and the size of the data. For small tables, the performance difference may be negligible.
- If your database supports query optimizations such as indexing, this can significantly impact execution speed.
- When working with
NULL, remember thatNULL = NULLalways evaluates toFALSE. Use functions likeCOALESCEorIS NULLfor proper handling.