Using FULL JOIN to Query Two Tables

The FULL JOIN operation can sometimes be challenging to understand. For example, consider the task of retrieving employee_id values where either name or salary is missing (i.e., data exists in one table but not in the other). This can be useful, for instance, when comparing rows between two documents.

Example Data

Employees Table:

employee_id	name
2	Crew
4	Haven
5	Kristian

Salaries Table:

employee_id	salary
5	76071
1	22517
4	63539

Expected Result:

employee_id
1
2

SQL Queries

Using FULL JOIN

SELECT *
FROM Employees e
FULL OUTER JOIN Salaries s ON e.employee_id = s.employee_id
WHERE e.name IS NULL OR s.salary IS NULL;

Result:

EMPLOYEE_ID	NAME	EMPLOYEE_ID	SALARY
null	null	1	22517
2	Crew	null	null

Using COALESCE

To retrieve employee_id from both tables, you can use COALESCE:

SELECT COALESCE(e.employee_id, s.employee_id) AS employee_id
FROM Employees e
FULL OUTER JOIN Salaries s ON e.employee_id = s.employee_id
WHERE e.name IS NULL OR s.salary IS NULL;

Using UNION

Another approach is to first retrieve all possible employee_id values from both tables and then perform the join:

WITH emps AS (
  SELECT employee_id FROM Employees
  UNION
  SELECT employee_id FROM Salaries
)
SELECT e.employee_id
FROM emps e
LEFT JOIN Employees e2 ON e.employee_id = e2.employee_id
LEFT JOIN Salaries s ON e.employee_id = s.employee_id
WHERE s.salary IS NULL OR e2.name IS NULL
ORDER BY e.employee_id ASC;

Advantages and Disadvantages of Each Approach

FULL JOIN

Advantages:
- Simpler query structure.
- Suitable for cases where you need to merge data from two tables and immediately filter rows with missing values.

Disadvantages:
- May be less performant on large datasets, as it merges all rows from both tables.
- Requires additional processing if you need only unique identifiers.

UNION

Advantages:
- Flexibility: allows you to first merge unique identifiers and then perform additional joins.
- Can be more performant if the tables contain a lot of data but only unique values need to be processed.

Disadvantages:
- More complex query structure.
- Requires extra steps for merging and filtering data.

Important Notes

The choice of approach depends on the task and the size of the data. For small tables, the performance difference may be negligible.
If your database supports query optimizations such as indexing, this can significantly impact execution speed.
When working with NULL, remember that NULL = NULL always evaluates to FALSE. Use functions like COALESCE or IS NULL for proper handling.