A collection of SQL problems
I published a collection of exercises on Github. The process for installing Postgres on OSX is described. Once you have cloned this repository and have PostgreSQL installed, loading the data into the databases is only a single terminal command.
There are (currently) 5 different databases to load; I will probably add to this collection over time. The format of the repo is that you are asked a question. An example from the Seattle Weather database asks:
What are the 10 hottest days on record? Take hottest to mean 'highest maximum temperature'.
The result of the query is provided:
date_weather | inches_rain | temp_max | temp_min | did_rain |
---|---|---|---|---|
2009-07-29 00:00:00 | 0 | 103 | 71 | f |
1994-07-20 00:00:00 | 0 | 100 | 65 | f |
1981-08-09 00:00:00 | 0 | 99 | 68 | f |
1991-07-23 00:00:00 | 0 | 99 | 65 | f |
1960-08-09 00:00:00 | 0 | 99 | 59 | f |
1981-08-10 00:00:00 | 0 | 98 | 67 | f |
1960-08-08 00:00:00 | 0 | 98 | 66 | f |
1988-09-02 00:00:00 | 0 | 98 | 59 | f |
1979-07-16 00:00:00 | 0 | 98 | 63 | f |
1967-08-16 00:00:00 | 0 | 98 | 59 | f |
What isn't provided is the query to get there. You should experiment with the queries, and try to get a result that matches mine. (If you think my posted results are incorrect, please leave a Github Issue!) The choice to not publish the queries is a deliberate choice!
While this is setup to work with PostgreSQL out of the box, the answers to the queries should be the same with any database you use. You are free to use MySQL, Oracle, or any other database you want. You will just need to load the data CSVs into your database engine of choice.
Will it work with SQLite3?
Yes, but please use a "real" database. SQLite3 is popular because it is extremely lightweight, and can be used on mobile devices, however it has strange behavior when used with aggregations that will be hard to "unlearn" when you use other databases.
PostgreSQL and MySQL are free -- give them a go instead!
But wait ... interviews are also about doing analysis and explaining conclusions
Knowing how to turn a precise question, such as "how many users logged onto our website in January", into an SQL query and getting results is a necessary skill for a data analyst.
A more important skill is being able to take an ill-posed question, such as "why is our traffic down?", and breaking it into a series of precise questions that you can then get from the database to find a solution. This is the realm of the case study.
At the moment, my exercises only help you study the syntax of SQL. They don't help you with case studies. I am brainstorming ways of fixing this, that don't involve too much work. It is a non-trivial problem, as often times case studies require making assumptions about what you think are important metrics, and then checking those assumptions with project managers.
Here are the two best resources I know of for looking at case studies:
Mode Analytics has 4 case studies on their community website. They allow you to query the data and make new charts in a Tableau-style interface. The case studies also have outlines of a possible answer that you can look at.
This is a series of books used to prepare consultants for interviews in their fields. You won't get any SQL experience from reading these books, but you will get a good sense of how a consultant break a problem from a domain they are not an expert in to a series of precise decisions they can query the data. The interviews are given as a back-and-forth between the interviewer and interviewee, which can help you get past asking clarifying questions about the problems, metrics, and data in the interview.
Please let me know if there are other resources out there that I have missed!