Avatar

Hej, I'm Julia.

Filtering SQL Left Outer Join Queries

#databases #sql

5 min read

I was defining an API endpoint recently, and naively implemented a solution that resulted in a N+1 problem. In a nutshell, what this meant was that for every row in the database table I was returning, I was making an additional query to grab some data I needed from another table. This is perhaps fine for small tables, but gets worse linearly with each additional row in your table. This is not scalable and I wanted to find a way to get all the data I needed with just one request.

It's easier to communicate this with an example, so let's play with an example. We've got a game table of computer games, and a separate join table that I've called game_favourite. The sole purpose of game_favourite is to log which users have favourited which games. game_favourite is unique on both game_id and user_id i.e. a user can only favourite a game once and there are no duplicate rows.

When I had initially implemented my suboptimal solution, what I was effectively doing was returning the game table, and then for each row, going to the game_favourite table to check whether a specific user (whose user_id I passed into the function) had favourited that game. So the N in N+1 comes from the number of rows in my game table (in this case 4), and the 1 comes from just doing the base query to return the game table as a whole in the first place.

What I want to do instead is to reduce this to just the 1 call, for a particular user_id. So, here's what the tables look like and the desired output I want:

// Starting point for **game** table

| id  | name               | favourite_count |
| --- | ------------------ | --------------- |
| 1   | ratchet and clank  | 2               |
| 2   | assassins creed    | 1               |
| 3   | far cry 5          | 1               |
| 4   | horizon: zero dawn | 0               |

// Starting point for **game_favourite** table

| id  | game_id | user_id |
| --- | ------- | ------- |
| 1   | 1       | 1       |
| 2   | 1       | 2       |
| 3   | 2       | 1       |
| 4   | 3       | 3       |

// Desired output for **user_id = 1** using the **game** table as a base

| name               | favourite_count | user_has_favourited |
| ------------------ | --------------- | ------------------- |
| ratchet and clank  | 2               | true                |
| assassins creed    | 1               | true                |
| far cry 5          | 1               | null                |
| horizon: zero dawn | 0               | null                |

The way I write SQL queries is to build them up, block by block, and envisage what the table looks like at each step. I also tend to translate my literal logical thought process into SQL to see how it plays out. Once I get things working, I then see how I might simplify the query. (I look at it as a warm up process, as SQL is not something I use frequently and it takes me a while to get back in the flow of things. 😂)

So here's a query that gets my desired outcome.

SELECT game.name, game.favourite_count, favourites.user_has_favourited
FROM game as game
LEFT OUTER JOIN (
    SELECT game_id, user_id,
         (CASE
             WHEN user_id = '1' THEN TRUE
             ELSE FALSE
          END) AS user_has_favourited
    FROM game_favourite
) AS favourites
ON game.id = favourites.game_id AND user_has_favourited = TRUE;

Here's a simplified query that does the same thing.

SELECT game.name, game.favourite_count, favourites.user_id IS NOT NULL as user_has_favourited
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';

Let's break this down.

  1. The starting point is to return the game table with the name and favourite_count columns.
SELECT game.name, game.favourite_count
FROM game AS game
  1. We also need the user_id data from the game_favourite table, so we need to do a LEFT OUTER JOIN. This is the default LEFT JOIN in SQL, so you can omit the OUTER keyword. The joins criteria (i.e. the common field) is the game_id.
SELECT game.name, game.favourite_count
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id

The resulting table here will be something like this - notice that Ratchet and Clank is repeated because 2 users have added this game as a favourite.

| name               | favourite_count |
| ------------------ | --------------- |
| ratchet and clank  | 2               |
| assassins creed    | 1               |
| far cry 5          | 1               |
| horizon: zero dawn | 0               |
| ratchet and clank  | 2               |
  1. We therefore also need to filter by the user_id that we're interested in, in this case, user_id = 1.
SELECT game.name, game.favourite_count
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';
  1. Finally, we want to display an additional column in our table called user_has_favourited, which shows true if user 1 has favourited the game.
SELECT game.name, game.favourite_count, favourites.user_id IS NOT NULL as user_has_favourited
FROM game AS game
LEFT JOIN game_favourite AS favourites
ON game.id = favourites.game_id AND favourites.user_id = '1';

© 2016-2024 Julia Tan · Powered by Next JS.