Filtering SQL Left Outer Join Queries
5 min read
I was defining an API endpoint recently, and naively implemented a solution that resulted in a N+1 problem. In a nutshell, what this meant was that for every row in the database table I was returning, I was making an additional query to grab some data I needed from another table. This is perhaps fine for small tables, but gets worse linearly with each additional row in your table. This is not scalable and I wanted to find a way to get all the data I needed with just one request.
It's easier to communicate this with an example, so let's play with an example. We've got a game table of computer games, and a separate join table that I've called game_favourite. The sole purpose of game_favourite is to log which users have favourited which games. game_favourite is unique on both game_id and user_id i.e. a user can only favourite a game once and there are no duplicate rows.
When I had initially implemented my suboptimal solution, what I was effectively doing was returning the game table, and then for each row, going to the game_favourite table to check whether a specific user (whose user_id I passed into the function) had favourited that game. So the N in N+1 comes from the number of rows in my game table (in this case 4), and the 1 comes from just doing the base query to return the game table as a whole in the first place.
What I want to do instead is to reduce this to just the 1 call, for a particular user_id. So, here's what the tables look like and the desired output I want:
// Starting point for **game** table
| id | name | favourite_count |
| --- | ------------------ | --------------- |
| 1 | ratchet and clank | 2 |
| 2 | assassins creed | 1 |
| 3 | far cry 5 | 1 |
| 4 | horizon: zero dawn | 0 |
// Starting point for **game_favourite** table
| id | game_id | user_id |
| --- | ------- | ------- |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 3 |
// Desired output for **user_id = 1** using the **game** table as a base
| name | favourite_count | user_has_favourited |
| ------------------ | --------------- | ------------------- |
| ratchet and clank | 2 | true |
| assassins creed | 1 | true |
| far cry 5 | 1 | null |
| horizon: zero dawn | 0 | null |The way I write SQL queries is to build them up, block by block, and envisage what the table looks like at each step. I also tend to translate my literal logical thought process into SQL to see how it plays out. Once I get things working, I then see how I might simplify the query. (I look at it as a warm up process, as SQL is not something I use frequently and it takes me a while to get back in the flow of things. 😂)
So here's a query that gets my desired outcome.
SELECT game.name, game.favourite_count, favourites.user_has_favourited
FROM game as game
LEFT OUTER JOIN (
SELECT game_id, user_id,
(CASE
WHEN user_id = '1' THEN TRUE
ELSE FALSE
END) AS user_has_favourited
FROM game_favourite
) AS favourites
ON game.id = favourites.game_id AND user_has_favourited = TRUE;Here's a simplified query that does the same thing.
SELECT game.name, game.favourite_count, favourites.user_id IS NOT NULL as user_has_favourited
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';Let's break this down.
- The starting point is to return the game table with the
nameandfavourite_countcolumns.
SELECT game.name, game.favourite_count
FROM game AS game- We also need the
user_iddata from thegame_favouritetable, so we need to do a LEFT OUTER JOIN. This is the default LEFT JOIN in SQL, so you can omit the OUTER keyword. The joins criteria (i.e. the common field) is thegame_id.
SELECT game.name, game.favourite_count
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_idThe resulting table here will be something like this - notice that Ratchet and Clank is repeated because 2 users have added this game as a favourite.
| name | favourite_count |
| ------------------ | --------------- |
| ratchet and clank | 2 |
| assassins creed | 1 |
| far cry 5 | 1 |
| horizon: zero dawn | 0 |
| ratchet and clank | 2 |- We therefore also need to filter by the
user_idthat we're interested in, in this case,user_id = 1.
SELECT game.name, game.favourite_count
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';- Finally, we want to display an additional column in our table called
user_has_favourited, which showstrueif user 1 has favourited the game.
SELECT game.name, game.favourite_count, favourites.user_id IS NOT NULL as user_has_favourited
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';