Playing Games with SQL

2011 Oct 19

When dealing with a sports league of any sort, there are two elemental tables that make the core of the application – games and teams. The game table has foreign keys in it that refer to the team table. Because of this, we’ll talk about the team table first.

Team Table

+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| id         | varchar(3)  | NO   | PRI | NULL    |       |
| name       | varchar(20) | NO   |     | NULL    |       |
| fullname   | varchar(50) | NO   |     | NULL    |       |
| conference | char(3)     | NO   |     | NULL    |       |
| division   | varchar(5)  | NO   |     | NULL    |       |
| website    | varchar(80) | NO   |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+

Above is the structure of the team table in MySQL. The structure of this table is flexible enough to be used for most sports. Just for the sake of familiarity, lets use the NFL as an example. Naturally, your data will be different. The NFL has 32 teams. Each team can be identified by a 2 or 3 letter code. It’s a convention used throughout the NFL  by management, announcers, and fans alike.

id name fullname conference division website
DET Lions Detroit Lions NFC North http://detroitlions.com
KC Chiefs Kansas City Chiefs AFC West http://kcchiefs.com
PIT Steelers Pittsburgh Steelers AFC North http://steelers.com
SEA Seahawks Seattle Seahawks NFC West http://seahawks.com

The id field is a PRIMARY KEY. As you know, the function of a primary key is to uniquely identify a record of data. Since we’re only dealing with the NFL in this post, i can safely use the 2-or-3 letter codes. If I wanted to include other sports like baseball, then we’d have a problem. If we added all the MLB teams, we’d  have lots of clashes. SEA could mean the Seahawks or the Mariners. That’s bad. The solution to that problem is use auto incrementing integers. It would look as follows in MySQL.

+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| id         | int(11)     | NO   | PRI | NULL    | AUTO  |
| abbr       | varchar(3)  | NO   |     | NULL    |       |
| name       | varchar(20) | NO   |     | NULL    |       |
| fullname   | varchar(50) | NO   |     | NULL    |       |
| conference | char(3)     | NO   |     | NULL    |       |
| division   | varchar(5)  | NO   |     | NULL    |       |
| website    | varchar(80) | NO   |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+

Since we’re not using any other information outside of the NFL, were ok with just using the team code as a primary key. It wouldn’t hurt us to use the integers. In fact we’d probably get better performance boost, since integers index better and provide faster look ups than strings, particularly as the data set grows large. However, we’re sticking with the codes. you’ll see why when we talk about the game table.

Game Table

+------------+------------+------+-----+---------+-------+
| Field      | Type       | Null | Key | Default | Extra |
+------------+------------+------+-----+---------+-------+
| id         | int(11)    | NO   | PRI | NULL    |       |
| week       | int(11)    | YES  |     | NULL    |       |
| away       | varchar(3) | NO   |     | NULL    |       |
| home       | varchar(3) | NO   |     | NULL    |       |
| date_time  | datetime   | NO   |     | NULL    |       |
| away_score | int(11)    | YES  |     | NULL    |       |
| home_score | int(11)    | YES  |     | NULL    |       |
| winner     | varchar(3) | YES  |     | NULL    |       |
+------------+------------+------+-----+---------+-------+

[code lang="sql"]
SELECT game.id, game.away, game.home, game.date_time
FROM game
WHERE week = 1;
[/code]

The query above provides a simple display of which teams are playing during a given week. In this example, it’s looking at the first week of the season. There are 2 teams displayed for each game of the week. If this were a past week, where the game has been played, there’d be a third field – game.winner.

In the case of the NFL, and even MLB, it’s common for people to refer to teams by their codes. Because it’s common knowledge, they make great identifiers, and team names don’t need to be looked up. This saves us from having to do joins to the team table, which helps with performance.

If you needed to do a look up for the team information, the game table has some trickiness to it because there are 3 representations of team information, and only 1 team table. Why is that tricky? Because when you write your SELECT statement to do your standard equi-join between the game and team tables, the field that you use to join on will determine which record from the team table gets selected. There are different solutions you might be tempted to use to solve this, all of which are bad:

  • Hard code a copy of the Team in as an array of arrays in your code
  • Do team lookup in your application code
  • Create 2 additional physical copies of the team table in MySQL

The first option is the worst. When it comes to writing good software, one of the worst things you can do is to write duplicate code and data. this violates the DRY principle. It also creates more work for you in the long run, because you need to manage things in multiple places and make sure they’re in sync with each other. If someone updates the database table, you as the programmer have no way to tell what changed.

The second option is better. It’s much easier to maintain. However, doing a lookup every time is gonna cause performance issues as more and more people use your application.

The third option duplicates data  by creating 2 physical copies of the team table. That is bad, but not as bad as the first option. At least its all in the database. ironically it what we want to do, but not how we want to do it. It’s the making of physical copies that’s bad. What we want to do is make copies in memory. How do we do that? Table aliases!

Table Aliases

Table aliases are part of the functionality of SQL. They let you refer to a single table using multiple names. This allows you to do multiple joins. The aliases only exist for this query, and won’t affect any future queries.
[code lang="sql" title="An example using TABLE ALIAS"]
SELECT game.id, game.away, away.fullname,
game.home, home.fullname,
game.date_time, winner.fullname
FROM game
JOIN team away ON game.away = away.id
JOIN team home ON game.home = home.id
JOIN team winner ON game.winner = winner.id
WHERE week = 1;
[/code]

This query shows the games for a week and who won.

In the first JOIN clause, an alias named “away” is mapped to the team table, and is used in the subsequent ON clause. The subsequent JOIN clauses map connect with the team references to complete the query. The generalized form of a join is as follows

JOIN table [alias] ON condition

The ‘table’ is the name of the table you want to create a join with. the [alias] is optional. Most of the time, you don’t need it – But it’s useful is cases such as this. The value of ‘condition’ is the type type of join you want to perform. For our purposes, we use an equi-join or inner join.

You can look at my SQL code on GitHub.

And The Winner Is?

With each game, someone should win. At least I think so. Not every professional sport requires it though. There are two ways that you can declare a winner. The easy way is just to have a winner field and put the correct team id in the field. It’s easier to update, but there are some downsides. These are addressed by the second way. Keep track of the scores for each game and compare them to determine the winner on the fly. The good thing about this, is that it can always be derived by looking at the score. An additional benefit is that later you can get stats on the point spreads . When a team wins, do they win big? or just squeak by? What’s the lowest score a team earned and still won? What’s the largest number of points a team ever scored and still lost? You can’t get those answers if you only keep track of the ID of the team without the scores.

[code lang="sql" title="Display the week's games with winners - game_winners.sql"]
SELECT id, away, away_score, home, home_score,
IF(away_score = home_score, '--tied--',
IF(away_score > home_score, away, home) ) AS winner
FROM game
WHERE week = 1;
[/code]

Sorry, but comments are closed. I hope you enjoyed the article