Using SQL to Flatten a Database Table
4 min read
I consider myself to be more of a frontend, rather than backend engineer, and tasks like having to query databases beyond just using standard ActiveRecord methods always seemed a little bewildering to me. Still, my goal is to be a well-rounded engineer, and to be as self-sufficient as I can be when faced with a problem, so I relish any opportunities that enable me to dabble in a bit of backend work.
I was working on a task involving the extraction of some data from our cloud databases in order to display them on charts. Armed up until this point, with only some basic knowledge on the concept of SQL JOINs to combine data from different tables, I set about trying learn the SQL commands I would need to get the job done.
The very first thing I needed to do was to "flatten" a single database table. I figured the best way to do this was to INNER JOIN
the table on itself, multiple times. I've mocked up an example database schema below to try to convey what I was dealing with.
// Starting point for tasks table
| user_id | task | status |
| ------- | -------- | ------ |
| 1 | eat | true |
| 1 | sleep | true |
| 1 | exercise | false |
| 2 | eat | false |
| 2 | sleep | true |
| 2 | exercise | true |
// Desired output
| user_id | eaten | slept | exercised |
| ------- | ----- | ----- | --------- |
| 1 | true | true | false |
| 2 | false | true | true |
The SQL query ended up looking something like this:
SELECT tasks.user_id, tasks.status AS eaten, slept.status AS slept, exercised.status AS exercised
FROM datawarehouse.tasks tasks
INNER JOIN datawarehouse.tasks AS slept ON slept.user_id = tasks.user_id
INNER JOIN datawarehouse.tasks AS exercised ON exercised.user_id = tasks.user_id
WHERE tasks.task = 'eat'
AND slept.task = 'sleep'
AND exercised.task = 'exercise'
Here's how I broke down the steps:
- Start with the
FROM
line to call the database table you need. My table reference isdatabase.tasks
in this case, and I've explicitly named ittasks
as well, so I can clearly refer to it, should I need to do so in the rest of my query. - I then added a filter where I only wanted to return rows where the
task
column equals'eat'
. This is done with the firstWHERE
line. - I then did an
INNER JOIN
to the same table, but explicitly named this second tableslept
. I did the join based onuser_id
. - At this point, the filter I wanted on the second table was to return rows where the
task
column equals'sleep'
. This is done by adding on to theWHERE
line withAND
and the filterslept.task = 'sleep'
. - Then comes the second
INNER JOIN
to the same table once again. I called this third tableexercised
and again combined it based onuser_id
. - The filter for this third table was for
exercised.task = 'exercise'
. - Now that my flattened table was set up, I added the final line (ironically the first line) where I chose (using
SELECT
) the columns I wanted to display from each of the 3 tables.
It all seems pretty abstract if you've never had to write SQL before as it's not entirely clear what each command is doing under the hood. What really helped me was actually getting visual feedback on what the resulting table looked like, with each additional change I was making to my query, as I wrote it. Logging into psql
from command line to view the tables can definitely help with this, or perhaps use your IDE if it allows for database connections.
I'll be writing another SQL post shortly as I moved onto writing SQL window functions for a later task. I started off dreading SQL but actually found it kind of fun once I was able to picture in my mind, what the commands were actually doing. 🕵🏻