pozorvlak | Entries tagged with agda

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

My post about unit-testing Agda code using elisp was posted to the dependent types subreddit (of which I was previously unaware). There, the commenter stevana pointed out that you could achieve the same effect using the following Agda code:

test1 : 2 + 1 == 3
test1 = refl

test2 : 3 + 0 == 3
test2 = refl

If your test fails, you'll get a compile-time error on that line. Aaaaaaaargh. I do wish someone had told me that before I spent a week mucking about with elisp. Well, that's not quite true - I learned a lot about Emacs, I had some fun writing the code, I learned some things about Agda that I might otherwise not have done, I got some code-review from

aaroncrane, and I had a chance to try out using darcs and patch-tag for a collaborative project (the experience, I'm afraid, didn't measure up well against git and GitHub).

What I'd failed to realise, of course, is that the == type constructor describes computable equality: a == b is inhabited if the compiler can compute a proof that a is equal to b. "They both normalise to the same thing" is such a proof, and obviously one that the compiler's smart enough to look for on its own.

Ordinarily, a testcase is better than a compile-time assertion with the same meaning, because you can attach a debugger to a failing test and investigate exactly how your code is broken. This doesn't mean that types are stupid - types draw their power from their ability to summarise many different testcases. But in this case, I don't think there's much reason to prefer our approach to stevena's: the slight increase in concision that agda-test allows is balanced out by its inability to coerce both expressions to the same type, which often means the user has to type a lot more code. Stevena's approach does not suffer from this problem.

This also makes me more positive about dependent typing: if simple testcases can be incorporated into your type-system so easily, then maybe the "types are composable tests" slogan has some truth to it.

But seriously, guys, macros.

On Monday I went to the first day of Conor McBride's course Introduction to Dependently-Typed programming in Agda. "What's dependently-typed programming?" you ask. Well, when a compiler type-checks your program, it's actually (in a sense which can be made precise) proving theorems about your code. Assignments of types to expressions correspond to proofs of statements in a formal logical language; the precise logical language in which these statements are expressed is determined by your type system (union types correspond to "or", function types correspond to "if... then", that kind of thing). This correspondence goes by the fancy name of "the Curry-Howard isomorphism" in functional programming and type-theory circles. In traditional statically-typed languages these theorems are mostly pretty uninteresting, but by extending your type system so it corresponds to a richer logical language you can start to state and prove some more interesting theorems by expressing them as types, guaranteeing deep properties of your program statically. This is the idea behind dependent typing. A nice corollary of their approach is that types in dependently-typed languages (such as Agda, the language of the course) can be parametrised by values (and not just by other types, as in Haskell), so you can play many of the same type-level metaprogramming games as in C++ and Ada, but in a hopefully less crack-fuelled way. I spent a bit of time last year playing around with Edwin Brady's dependently-typed systems language Idris, but found the dependent-typing paradigm hard to wrap my head around. So I was very pleased when Conor's course was announced.

The course is 50% lab-based, and in these lab sessions I realised something important: fancy type-system or no, I need to test my code, particularly when I'm working in such an unfamiliar language. Dependent typing may be all about correctness by construction, but I'm not (yet?) enlightened enough to work that way - I need to see what results my code actually produces. I asked Conor if there were any way to evaluate individual Agda expressions, and he pointed me at the "Evaluate term to normal form" command in Emacs' Agda mode (which revealed that I had, indeed, managed to produce several incorrect but well-typed programs). Now, that's better than nothing, but it's clearly inadequate as a testing system - you have to type the expression to evaluate every time, and check the result by eye. I asked Conor if Agda had a more extensive unit-testing framework, and he replied "I'm not aware of such a thing. The culture is more 'correctness by construction'. Testing is still valuable."

So I wrote one.

I've written - or at least hacked on - a test framework of some sort at almost every programming job I've had (though these days I try to lean on Test::More and friends as much as possible). It gets hella tedious. This one was a bit different, though. One message that came out of Conor's lectures was that the correct way to represent equality in dependently-typed languages is somewhat controversial; as a beginner, I didn't want to dip a toe into these dangerous waters until I had a clearer idea of the issues. But the basic operation of any testing system is "running THIS should yield THAT". Fortunately, there was a way I could punt on the problem. Since Agda development seems to be closely tied to the interactive Emacs mode, I could deliver my system as a set of Emacs commands; the actual testing could be done by normalising the expression under consideration and testing the normalised form for string-equality with the expected answer.

This was less easy than I'd expected; it turns out that agda-mode commands work by sending commands to a slave GHCi process, which generates elisp to insert the results into the appropriate buffer. I'm sure that agda-mode's authors had some rationale for this rather bizarre design, but it makes agda-mode a lot harder to build on than it could be. However (in close collaboration with Aaron Crane, who both contributed code directly and guided me through the gargantuan Emacs API like a Virgil to my Dante) I eventually succeeded. There are two ways to get our code:

darcs get http://patch-tag.com/r/pozorvlak/agda-test

git clone git://github.com/pozorvlak/agda-test.git

Then load the file agda-test.el in your .emacs in the usual way. Having done so, you can add tests to your Agda file by adding comments of the form

{- test TESTNAME: ACTUAL is EXPECTED; -}

For instance,

{- test 2+1: (suc (suc zero)) +N (suc zero) is (suc (suc (suc zero)));
   test 3+0: (suc (suc (suc (zero)))) +N zero is (suc (suc zero)); -}

When you then invoke the function agda2-test-all (via C-u C-c C-v for "verify"), you should be presented with a new buffer called *Agda test results*, containing the text

1..2
ok 1 - 2+1
not ok 2 - 3+0
    got 3
    expected 2

[ Except, er, I get "expected _175" for that last test instead. I don't think that's a bug in my elisp code, because I get the same result when I evaluate that expression manually with C-c C-n. Halp pls?]

You should recognise the output as the Test Anything Protocol; it should be easy to plug in existing tools for aggregating and displaying TAP results.

There are a lot of things I'd like to do in the future:

Add commands to ~~only run a single test~~, or just the tests in a given comment block, or a user-specified ad-hoc test group.
Highlight failing tests in the Agda buffer.
Allow the user to specify a TAP aggregator for the results.
Do something about the case where the test expressions don't compile cleanly.

If you'd like to send patches (which would be very welcome!), the darcs repository is currently considered the master one, so please submit patches there. I'm guessing that most Agda people are darcs users, and the slight additional friction of using darcs as my VCS is noise next to the friction of using Emacs as my editor :-) The git repository is currently just a mirror of the darcs one, but it would be easy to switch over to having it as the master if enough potential contributors would prefer git to darcs. Things might get confusing if I get patches submitted in both places, though :-)

Beware of the Train

Volárne číslo neexistuje

January 2018

Navigation

Page Summary

Style Credit

Expand Cut Tags

Unit testing Agda code, redux

Unit testing Agda code with elisp