Avatar

Hello, I'm Julia.

Using SQL to Flatten a Database Table

#databases #sql

4 min read

I consider myself to be more of a frontend, rather than backend engineer, and tasks like having to query databases beyond just using standard ActiveRecord methods always seemed a little bewildering to me. Still, my goal is to be a well-rounded engineer, and to be as self-sufficient as I can be when faced with a problem, so I relish any opportunities that enable me to dabble in a bit of backend work.

I was working on a task involving the extraction of some data from our cloud databases in order to display them on charts. Armed up until this point, with only some basic knowledge on the concept of SQL JOINs to combine data from different tables, I set about trying learn the SQL commands I would need to get the job done.

The very first thing I needed to do was to "flatten" a single database table. I figured the best way to do this was to INNER JOIN the table on itself, multiple times. I've mocked up an example database schema below to try to convey what I was dealing with.

// Starting point for tasks table

| user_id | task     | status |
| ------- | -------- | ------ |
| 1       | eat      | true   |
| 1       | sleep    | true   |
| 1       | exercise | false  |
| 2       | eat      | false  |
| 2       | sleep    | true   |
| 2       | exercise | true   |

// Desired output

| user_id | eaten | slept | exercised |
| ------- | ----- | ----- | --------- |
| 1       | true  | true  | false     |
| 2       | false | true  | true      |

The SQL query ended up looking something like this:

SELECT tasks.user_id, tasks.status AS eaten, slept.status AS slept, exercised.status AS exercised

FROM datawarehouse.tasks tasks
INNER JOIN datawarehouse.tasks AS slept ON slept.user_id = tasks.user_id
INNER JOIN datawarehouse.tasks AS exercised ON exercised.user_id = tasks.user_id

WHERE tasks.task = 'eat'
AND slept.task = 'sleep'
AND exercised.task = 'exercise'

Here's how I broke down the steps:

  1. Start with the FROM line to call the database table you need. My table reference is database.tasks in this case, and I've explicitly named it tasks as well, so I can clearly refer to it, should I need to do so in the rest of my query.
  2. I then added a filter where I only wanted to return rows where the task column equals 'eat'. This is done with the first WHERE line.
  3. I then did an INNER JOIN to the same table, but explicitly named this second table slept . I did the join based on user_id.
  4. At this point, the filter I wanted on the second table was to return rows where the task column equals 'sleep'. This is done by adding on to the WHERE line with AND and the filter slept.task = 'sleep'.
  5. Then comes the second INNER JOIN to the same table once again. I called this third table exercised and again combined it based on user_id.
  6. The filter for this third table was for exercised.task = 'exercise'.
  7. Now that my flattened table was set up, I added the final line (ironically the first line) where I chose (using SELECT) the columns I wanted to display from each of the 3 tables.

It all seems pretty abstract if you've never had to write SQL before as it's not entirely clear what each command is doing under the hood. What really helped me was actually getting visual feedback on what the resulting table looked like, with each additional change I was making to my query, as I wrote it. Logging into psql from command line to view the tables can definitely help with this, or perhaps use your IDE if it allows for database connections.

I'll be writing another SQL post shortly as I moved onto writing SQL window functions for a later task. I started off dreading SQL but actually found it kind of fun once I was able to picture in my mind, what the commands were actually doing. 🕵🏻

© 2016-2024 Julia Tan · Powered by Next JS.