Filtering SQL Left Outer Join Queries
5 min read
I was defining an API endpoint recently, and naively implemented a solution that resulted in a N+1 problem. In a nutshell, what this meant was that for every row in the database table I was returning, I was making an additional query to grab some data I needed from another table. This is perhaps fine for small tables, but gets worse linearly with each additional row in your table. This is not scalable and I wanted to find a way to get all the data I needed with just one request.
It's easier to communicate this with an example, so let's play with an example. We've got a game
table of computer games, and a separate join table that I've called game_favourite
. The sole purpose of game_favourite
is to log which users have favourited which games. game_favourite
is unique on both game_id
and user_id
i.e. a user can only favourite a game once and there are no duplicate rows.
When I had initially implemented my suboptimal solution, what I was effectively doing was returning the game
table, and then for each row, going to the game_favourite
table to check whether a specific user (whose user_id
I passed into the function) had favourited that game. So the N
in N+1
comes from the number of rows in my game table (in this case 4), and the 1
comes from just doing the base query to return the game table as a whole in the first place.
What I want to do instead is to reduce this to just the 1 call, for a particular user_id
. So, here's what the tables look like and the desired output I want:
// Starting point for **game** table
| id | name | favourite_count |
| --- | ------------------ | --------------- |
| 1 | ratchet and clank | 2 |
| 2 | assassins creed | 1 |
| 3 | far cry 5 | 1 |
| 4 | horizon: zero dawn | 0 |
// Starting point for **game_favourite** table
| id | game_id | user_id |
| --- | ------- | ------- |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 3 |
// Desired output for **user_id = 1** using the **game** table as a base
| name | favourite_count | user_has_favourited |
| ------------------ | --------------- | ------------------- |
| ratchet and clank | 2 | true |
| assassins creed | 1 | true |
| far cry 5 | 1 | null |
| horizon: zero dawn | 0 | null |
The way I write SQL queries is to build them up, block by block, and envisage what the table looks like at each step. I also tend to translate my literal logical thought process into SQL to see how it plays out. Once I get things working, I then see how I might simplify the query. (I look at it as a warm up process, as SQL is not something I use frequently and it takes me a while to get back in the flow of things. 😂)
So here's a query that gets my desired outcome.
SELECT game.name, game.favourite_count, favourites.user_has_favourited
FROM game as game
LEFT OUTER JOIN (
SELECT game_id, user_id,
(CASE
WHEN user_id = '1' THEN TRUE
ELSE FALSE
END) AS user_has_favourited
FROM game_favourite
) AS favourites
ON game.id = favourites.game_id AND user_has_favourited = TRUE;
Here's a simplified query that does the same thing.
SELECT game.name, game.favourite_count, favourites.user_id IS NOT NULL as user_has_favourited
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';
Let's break this down.
- The starting point is to return the game table with the
name
andfavourite_count
columns.
SELECT game.name, game.favourite_count
FROM game AS game
- We also need the
user_id
data from thegame_favourite
table, so we need to do a LEFT OUTER JOIN. This is the default LEFT JOIN in SQL, so you can omit the OUTER keyword. The joins criteria (i.e. the common field) is thegame_id
.
SELECT game.name, game.favourite_count
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id
The resulting table here will be something like this - notice that Ratchet and Clank is repeated because 2 users have added this game as a favourite.
| name | favourite_count |
| ------------------ | --------------- |
| ratchet and clank | 2 |
| assassins creed | 1 |
| far cry 5 | 1 |
| horizon: zero dawn | 0 |
| ratchet and clank | 2 |
- We therefore also need to filter by the
user_id
that we're interested in, in this case,user_id = 1
.
SELECT game.name, game.favourite_count
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';
- Finally, we want to display an additional column in our table called
user_has_favourited
, which showstrue
if user 1 has favourited the game.
SELECT game.name, game.favourite_count, favourites.user_id IS NOT NULL as user_has_favourited
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';