On Monday I went to the first day of Conor McBride's course Introduction to Dependently-Typed programming in Agda. "What's dependently-typed programming?" you ask. Well, when a compiler type-checks your program, it's actually (in a sense which can be made precise) proving theorems about your code. Assignments of types to expressions correspond to proofs of statements in a formal logical language; the precise logical language in which these statements are expressed is determined by your type system (union types correspond to "or", function types correspond to "if... then", that kind of thing). This correspondence goes by the fancy name of "the Curry-Howard isomorphism" in functional programming and type-theory circles. In traditional statically-typed languages these theorems are mostly pretty uninteresting, but by extending your type system so it corresponds to a richer logical language you can start to state and prove some more interesting theorems by expressing them as types, guaranteeing deep properties of your program statically. This is the idea behind dependent typing. A nice corollary of their approach is that types in dependently-typed languages (such as Agda, the language of the course) can be parametrised by values (and not just by other types, as in Haskell), so you can play many of the same type-level metaprogramming games as in C++ and Ada, but in a hopefully less crack-fuelled way. I spent a bit of time last year playing around with Edwin Brady's dependently-typed systems language Idris, but found the dependent-typing paradigm hard to wrap my head around. So I was very pleased when Conor's course was announced.
The course is 50% lab-based, and in these lab sessions I realised something important: fancy type-system or no, I need to test my code, particularly when I'm working in such an unfamiliar language. Dependent typing may be all about correctness by construction, but I'm not (yet?) enlightened enough to work that way - I need to see what results my code actually produces. I asked Conor if there were any way to evaluate individual Agda expressions, and he pointed me at the "Evaluate term to normal form" command in Emacs' Agda mode (which revealed that I had, indeed, managed to produce several incorrect but well-typed programs). Now, that's better than nothing, but it's clearly inadequate as a testing system - you have to type the expression to evaluate every time, and check the result by eye. I asked Conor if Agda had a more extensive unit-testing framework, and he replied "I'm not aware of such a thing. The culture is more 'correctness by construction'. Testing is still valuable."
So I wrote one.
I've written - or at least hacked on - a test framework of some sort at almost every programming job I've had (though these days I try to lean on Test::More and friends as much as possible). It gets hella tedious. This one was a bit different, though. One message that came out of Conor's lectures was that the correct way to represent equality in dependently-typed languages is somewhat controversial; as a beginner, I didn't want to dip a toe into these dangerous waters until I had a clearer idea of the issues. But the basic operation of any testing system is "running THIS should yield THAT". Fortunately, there was a way I could punt on the problem. Since Agda development seems to be closely tied to the interactive Emacs mode, I could deliver my system as a set of Emacs commands; the actual testing could be done by normalising the expression under consideration and testing the normalised form for string-equality with the expected answer.
This was less easy than I'd expected; it turns out that agda-mode commands work by sending commands to a slave GHCi process, which generates elisp to insert the results into the appropriate buffer. I'm sure that agda-mode's authors had some rationale for this rather bizarre design, but it makes agda-mode a lot harder to build on than it could be. However (in close collaboration with Aaron Crane, who both contributed code directly and guided me through the gargantuan Emacs API like a Virgil to my Dante) I eventually succeeded. There are two ways to get our code:
You should recognise the output as the Test Anything Protocol; it should be easy to plug in existing tools for aggregating and displaying TAP results.
There are a lot of things I'd like to do in the future:
The course is 50% lab-based, and in these lab sessions I realised something important: fancy type-system or no, I need to test my code, particularly when I'm working in such an unfamiliar language. Dependent typing may be all about correctness by construction, but I'm not (yet?) enlightened enough to work that way - I need to see what results my code actually produces. I asked Conor if there were any way to evaluate individual Agda expressions, and he pointed me at the "Evaluate term to normal form" command in Emacs' Agda mode (which revealed that I had, indeed, managed to produce several incorrect but well-typed programs). Now, that's better than nothing, but it's clearly inadequate as a testing system - you have to type the expression to evaluate every time, and check the result by eye. I asked Conor if Agda had a more extensive unit-testing framework, and he replied "I'm not aware of such a thing. The culture is more 'correctness by construction'. Testing is still valuable."
So I wrote one.
I've written - or at least hacked on - a test framework of some sort at almost every programming job I've had (though these days I try to lean on Test::More and friends as much as possible). It gets hella tedious. This one was a bit different, though. One message that came out of Conor's lectures was that the correct way to represent equality in dependently-typed languages is somewhat controversial; as a beginner, I didn't want to dip a toe into these dangerous waters until I had a clearer idea of the issues. But the basic operation of any testing system is "running THIS should yield THAT". Fortunately, there was a way I could punt on the problem. Since Agda development seems to be closely tied to the interactive Emacs mode, I could deliver my system as a set of Emacs commands; the actual testing could be done by normalising the expression under consideration and testing the normalised form for string-equality with the expected answer.
This was less easy than I'd expected; it turns out that agda-mode commands work by sending commands to a slave GHCi process, which generates elisp to insert the results into the appropriate buffer. I'm sure that agda-mode's authors had some rationale for this rather bizarre design, but it makes agda-mode a lot harder to build on than it could be. However (in close collaboration with Aaron Crane, who both contributed code directly and guided me through the gargantuan Emacs API like a Virgil to my Dante) I eventually succeeded. There are two ways to get our code:
ordarcs get http://patch-tag.com/r/pozorvlak/agda-test
Then load the filegit clone git://github.com/pozorvlak/agda-test.git
agda-test.el
in your .emacs
in the usual way. Having done so, you can add tests to your Agda file by adding comments of the form For instance,{- test TESTNAME: ACTUAL is EXPECTED; -}
When you then invoke the function{- test 2+1: (suc (suc zero)) +N (suc zero) is (suc (suc (suc zero))); test 3+0: (suc (suc (suc (zero)))) +N zero is (suc (suc zero)); -}
agda2-test-all
(via C-u C-c C-v
for "verify"), you should be presented with a new buffer called *Agda test results*
, containing the text [ Except, er, I get "expected _175" for that last test instead. I don't think that's a bug in my elisp code, because I get the same result when I evaluate that expression manually with C-c C-n. Halp pls?]1..2 ok 1 - 2+1 not ok 2 - 3+0 got 3 expected 2
You should recognise the output as the Test Anything Protocol; it should be easy to plug in existing tools for aggregating and displaying TAP results.
There are a lot of things I'd like to do in the future:
- Add commands to
only run a single test, or just the tests in a given comment block, or a user-specified ad-hoc test group. - Highlight failing tests in the Agda buffer.
- Allow the user to specify a TAP aggregator for the results.
- Do something about the case where the test expressions don't compile cleanly.
Tags: