Using SQL to Flatten a Database Table
4 min read
I consider myself to be more of a frontend, rather than backend engineer, and tasks like having to query databases beyond just using standard ActiveRecord methods always seemed a little bewildering to me. Still, my goal is to be a well-rounded engineer, and to be as self-sufficient as I can be when faced with a problem, so I relish any opportunities that enable me to dabble in a bit of backend work.
I was working on a task involving the extraction of some data from our cloud databases in order to display them on charts. Armed up until this point, with only some basic knowledge on the concept of SQL JOINs to combine data from different tables, I set about trying learn the SQL commands I would need to get the job done.
The very first thing I needed to do was to "flatten" a single database table. I figured the best way to do this was to
INNER JOIN the table on itself, multiple times. I've mocked up an example database schema below to try to convey what I was dealing with.
// Starting point for tasks table | user_id | task | status | | ------- | -------- | ------ | | 1 | eat | true | | 1 | sleep | true | | 1 | exercise | false | | 2 | eat | false | | 2 | sleep | true | | 2 | exercise | true | // Desired output | user_id | eaten | slept | exercised | | ------- | ----- | ----- | --------- | | 1 | true | true | false | | 2 | false | true | true |
The SQL query ended up looking something like this:
SELECT tasks.user_id, tasks.status AS eaten, slept.status AS slept, exercised.status AS exercised FROM datawarehouse.tasks tasks INNER JOIN datawarehouse.tasks AS slept ON slept.user_id = tasks.user_id INNER JOIN datawarehouse.tasks AS exercised ON exercised.user_id = tasks.user_id WHERE tasks.task = 'eat' AND slept.task = 'sleep' AND exercised.task = 'exercise'
Here's how I broke down the steps:
- Start with the
FROMline to call the database table you need. My table reference is
database.tasksin this case, and I've explicitly named it
tasksas well, so I can clearly refer to it, should I need to do so in the rest of my query.
- I then added a filter where I only wanted to return rows where the
'eat'. This is done with the first
- I then did an
INNER JOINto the same table, but explicitly named this second table
slept. I did the join based on
- At this point, the filter I wanted on the second table was to return rows where the
'sleep'. This is done by adding on to the
ANDand the filter
slept.task = 'sleep'.
- Then comes the second
INNER JOINto the same table once again. I called this third table
exercisedand again combined it based on
- The filter for this third table was for
exercised.task = 'exercise'.
- Now that my flattened table was set up, I added the final line (ironically the first line) where I chose (using
SELECT) the columns I wanted to display from each of the 3 tables.
It all seems pretty abstract if you've never had to write SQL before as it's not entirely clear what each command is doing under the hood. What really helped me was actually getting visual feedback on what the resulting table looked like, with each additional change I was making to my query, as I wrote it. Logging into
psql from command line to view the tables can definitely help with this, or perhaps use your IDE if it allows for database connections.
I'll be writing another SQL post shortly as I moved onto writing SQL window functions for a later task. I started off dreading SQL but actually found it kind of fun once I was able to picture in my mind, what the commands were actually doing. 🕵🏻