January 2018

S M T W T F S
  123456
78910111213
14151617181920
21222324252627
28293031   

Style Credit

Expand Cut Tags

No cut tags
Monday, August 19th, 2013 09:10 pm
If I ever design a first course in compilers, I'll do it backwards: we'll start with code generation, move on to static analysis, then parse a token stream and finally lex source text. As we go along, students will be required to implement a small compiler for a simple language, using the techniques they've just learned. The (excellent) Stanford/Coursera compilers course does something similar, but they proceed in the opposite direction, following the dataflow in the final compiler: first they cover lexing, then parsing, then syntax analysis, then codegen. The first Edinburgh compilers course follows roughly the same plan of lectures, and I expect many other universities' courses do too.

I think a backwards course would work better for two reasons:
  1. Halfway through the Stanford course, you have a program that can convert source text into an intermediate representation with which you can't do very much. Halfway through the backwards course, you'd have a compiler for an unfriendly source language: you could write programs directly in whatever level of IR you'd got to (I'm assuming a sensible implementation language that doesn't make entering data literals too hard), and compile them using code you'd written to native code. I think that would be pretty motivating.
  2. When I did the Stanford course, all the really good learning experiences were in the back end. Writing a Yacc parser was a fiddly but largely straightforward experience; writing the code generator taught me loads about how your HLL code is actually executed and how runtime systems work. I also learned some less obvious things like the value of formal language specifications¹. Most CS students won't grow up to be compiler hackers, but they will find it invaluable to have a good idea of what production compilers do to their HLL code; it'll be much more useful than knowing about all the different types of parsing techniques, anyway². Students will drop out halfway through the course, and even those who make it all the way through will be tired and stressed by the end and will thus take in less than they did at the beginning: this argues for front-loading the important material.
What am I missing?

¹ I learned this the hard way. I largely ignored the formal part of the spec when writing my code generator, relying instead on the informal description; then towards the end of the allocated period I took a closer look at it and realised that it provided simple answers to all the thorny issues I'd been struggling with.
² The answer to "how should I write this parser?" in an industrial context is usually either "with a parser generator" or "recursive descent". LALR parsers such as those produced by Yacc are a pain to debug if you don't understand the underlying theory, true, but that's IMHO an argument for using a parser generator based on some other algorithm, most of which are more forgiving.
Thursday, November 14th, 2013 01:00 am (UTC)
I only just saw this, somehow, for some reason. And I would sign that petition. All the compiler texts and such I've read are structured the "standard" way you describe. My CS postgraduate course only covered the first chunk at all, leaving the interesting part as an extension module that you had to do schedule and administrative gymnastics to take. The result of which is that I know a bit about lexing and parsing (enough to get on with, anyway) and virtually nothing about the rest, and find it difficult motivation-wise to slog through the first half to get to the second.
Thursday, August 7th, 2014 03:17 pm (UTC)
I have been writing a small virtual machine in C and if I were to write a compiler, a course like this would help a lot.

See: http://github.com/tekknolagi/carp
Thursday, August 7th, 2014 07:13 pm (UTC)
This hit the front page of HN today:
https://news.ycombinator.com/item?id=8147568