Learning about compilers made me a more confident programmer: a part of my toolchain that had felt like magic was revealed to be merely a complex (and, sometimes, beautiful) collection of mechanisms, and I gained a much greater understanding of how the code I typed was put into practice by the computer. It occurred to me recently that though I've been using relational databases for years (I first learned about normal forms in 2000ish, from a book called MS Access Unlocked or something similarly unimpressive), I don't actually have much idea of how they work. You enter a SQL query and it gets parsed as normal and then, er... something something query analyser... something B+-tree index query plan AND AS IF BY MAGIC it becomes a nice fast bit of looping and pointer arithmetic.
Clearly this was not good enough. So I asked on Twitter for reading recommendations. Here's what I got.
- David Maier, The Theory of Relational Databases, 1983 (PDFs). As the name suggests, this looks heavy on the relational algebra and light on implementation.
- C. J. Date, An Introduction to Database Systems (Amazon link to paper book), 2003. Apparently this "contains a lot about internals. The writing style is quite verbose though."
- Abiteboul, Hull and Vianu, Foundations of Databases (PDFs). This appears to cover SQL and relational algebra in the first half of the book, and Datalog in the second. Which sounds very interesting, but not quite what I was looking for.
- Stratis Viglas, Advanced Databases (PDF). Slides from a 2015 undergraduate course given at the University of Edinburgh. Covers topics like on-disk layout, external sorting, query optimization, transaction processing, B+-trees and hash joins; in other words, the stuff I was after.
- Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems, 2002 (Amazon link, though the first Google hit is a presumably-illegal PDF of the full text!). The course textbook for Viglas' course, this appears to cover relational algebra, practical SQL programming, the DB implementation stuff in the course, and quite a lot more.
- Julia Evans (aka b0rk) wrote a nice sequence of blog posts in which she delves into SQLite internals; I should re-read these.
- The SQLite documentation looks pretty good, and includes some information about internals.
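
For a small, concrete peek at the "query plan" step, here is a minimal sketch using Python's built-in sqlite3 module. The books table, its columns and the idx_books_author index are made up for illustration. EXPLAIN QUERY PLAN is a real SQLite command, but the exact wording of its output varies between SQLite versions, so treat this as a way to poke at the planner rather than a definitive account of what it does.

```python
import sqlite3

# In-memory database with a hypothetical "books" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO books (author, title) VALUES (?, ?)",
    [
        ("Date", "An Introduction to Database Systems"),
        ("Viglas", "Advanced Databases"),
        ("Ramakrishnan and Gehrke", "Database Management Systems"),
    ],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    # The detail text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT title FROM books WHERE author = 'Date'"

# Without an index on author, the planner has to scan the whole table.
before = plan(query)

# With a (hypothetical) index on author, it can search the index instead.
conn.execute("CREATE INDEX idx_books_author ON books(author)")
after = plan(query)

print("before:", before)
print("after: ", after)
```

Running the same query before and after creating the index should show the plan change from a full table scan to an index search, which is the B+-tree machinery from the reading list made visible.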
It'll clearly take me a while to get through all that! My plan is to read through Viglas' notes, have a go at some of the exercises from his course (which also appear to be online), and then take a look at either Maier's book or AHV. Anything else I should be looking at? Does my strategy sound reasonable?