Saturday, January 12, 2013

SQL Training - Part 1: Intro and Basic SELECT Statements

A typical table
SQL is the standard language for extracting and manipulating information from/in databases. While it comes in many varieties generally specific to the database in question, there is a significant shared portion and the differences tend to be more nuanced/less used functions.

The most common statement you'll use is the SELECT statement, which (surprise, surprise) selects data from a table or tables. The basic format of the select statement is

SELECT
     [columns]
FROM [table];


Where [columns] is a comma separated list of which columns you want to pull. For the sake of example, I'm going to be using a hypothetical set of tables that Blizzard might user for their games, since that will make the things we're talking about as common as possible to the typical audience of this blog. The first table we'll talk about will be logins, which contains data about which users logged into which Blizzard games at what time. So it might have 3 columns

date

| game_id

| user_id

1/1/2013| 1| 1234567
1/5/2013| 3| 3456789
1/7/2013| 2| 7636857

This table has three columns, recording who (user_id), logged into which game (game_id) on which date (date). To select all the data in this table, you would code

SELECT
     date,
     game_id,
     user_id
FROM logins;


But you'll probably never need to select all the data in a table. You might only be concerned with all users who logged into WoW (let's call that game_id=1) on 1/1/2013. To do this, you would code

SELECT
     user_id
FROM logins
WHERE
     game_id=1 AND date='1/1/2013';


I omitted the date and game_id columns since we already know what they are based on how we selected the data. They could be left in (and in some cases should).

I lied about how I would set up the logins table. I would instead set up the columns as such:
  • logins - tracks players logging into games
    • login_time - not just the date but also the time
    • game_id - id for the game
    • user_id - id for the user
    • acct_id - id for the acct they log into
By making login_time have the date and the time, we can better track when people log in and how many times a day they log in. By adding acct_id you can keep track of which account for that game they logged in to. That query would become

SELECT
     user_id
FROM logins
WHERE
     game_id=1 AND login_time::date='1/1/2013';



Question: How would you select when and what games user 2435649 logged into during November?
Answer: here

If you have any questions, feel free to leave them in the comments or to email me. My email can be found on my About page.

NOTES

Note: login_time::date takes the timestamp from the login_time field and removes the time portion, giving you just the date so that the database can accurately tell if it's equal to '1/1/2013'. The way this is done will vary from database to database, but a similar solution should exist for all. Not all databases will necessarily read '1/1/2013' and accept it as a date and may have you do it differently, but for the sake of this, and all further examples, I'll keep doing things this way.

Note: If you're going to select all of the columns in a table you can use * instead of typing all of the column's names. To select all logins for a particular user (1234567) in December you would code

SELECT
     *
FROM logins
WHERE
     user_id=1234567 

     AND login_time::date BETWEEN '12/1/2012' AND '12/31/2012';

Note: The way I've done the spacing and decided where new lines go isn't mandatory and I just do it for easy reading.