pozorvlak: (Hal)
Thursday, December 6th, 2012 11:41 pm

I've been running benchmarks again. The basic workflow is

  1. Create some number of directories containing the benchmark suites I want to run.
  2. Tweak the Makefiles so benchmarks are compiled and run with the compilers, simulators, libraries, flags, etc., that I care about.
  3. Optionally tweak the source code to (for instance) change the number of iterations the benchmarks are run for.
  4. Run the benchmarks!
  5. Check the output; discover that something is broken.
  6. Swear, fix the problem.
  7. Repeat until either you have enough data or the conference submission deadline gets too close and you are forced to reduce the scope of your experiments.
  8. Collate the outputs from the successful runs, and analyse them.
  9. Make encouraging noises as the graduate students do the hard work of actually writing the paper.
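Step 1 can be scripted as a cross product. A minimal sketch - the simulator, compiler and iteration-count names here are hypothetical placeholders, not the ones I actually used:

```shell
#!/bin/sh
# Generate one directory per benchmark configuration.
# 3 simulators x 2 compilers x 3 iteration counts = 18 directories.
set -e
for sim in sim1 sim2 sim3; do
  for cc in gcc clang; do
    for iters in 1 10 100; do
      dir="configs/${sim}_${cc}_x${iters}"
      mkdir -p "$dir"
      # in real use, each directory would get a copy of the benchmark suite;
      # here we just record the configuration for the Makefiles to pick up
      echo "SIM=$sim CC=$cc ITERS=$iters" > "$dir/config.mk"
    done
  done
done
ls configs | wc -l   # 18
```

Steps 2 and 3 then become edits to the generated files and the checked-in sources - which is exactly the propagation problem that version control is supposed to solve.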

Suppose I want to benchmark three different simulators with two different compilers for three iteration counts. That's 18 configurations. Now note that the problem found in stage 5 and fixed in stage 6 will probably not be unique to one configuration - if it affects the invocation of one of the compilers then I'll want to propagate that change to nine configurations, for instance. If it affects the benchmarks themselves or the benchmark-invocation harness, it will need to be propagated to all of them. Sounds like this is a job for version control, right? And, of course, I've been using version control to help me with this; immediately after step 1 I check everything into Git, and then use git fetch and git merge to move changes between repositories. But this is still unpleasantly tedious and manual. For my last paper, I was comparing two different simulators with three iteration counts, and I organised this into three checkouts (x1, x10, x100), each with two branches (simulator1, simulator2). If I discovered a problem affecting simulator1, I'd fix it in, say, x1's simulator1 branch, then git pull the change into x10 and x100. When I discovered a problem affecting every configuration, I checked out the root commit of x1, fixed the bug in a new branch, then git merged that branch with the simulator1 and simulator2 branches, then git pulled those merges into x10 and x100.

Keeping track of what I'd done and what I needed to do was frankly too cognitively demanding, and I was constantly bedevilled by the sense that there had to be a Better Way. I asked about this on Twitter, and Ganesh Sittampalam suggested "use Darcs" - and you know, I think he's right: Darcs' "bag of commuting patches" model is a better fit to what I'm trying to do than Git's "DAG of snapshots" model. The obvious way to handle this in Darcs would be to have six base repositories, called "everything", "x1", "x10", "x100", "simulator1" and "simulator2"; and six working repositories, called "simulator1_x1", "simulator1_x10", "simulator1_x100", "simulator2_x1", "simulator2_x10" and "simulator2_x100". Then set up update scripts in each working repository, containing, for instance:

#!/bin/sh
darcs pull ../base/everything
darcs pull ../base/simulator1
darcs pull ../base/x10
and every time you fix a bug, run for i in working/*; do $i/update; done.
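Those update scripts can themselves be generated. A sketch, assuming the base repositories live under base/ and the working repositories under working/ (which makes the relative paths ../../base/... rather than the ../base/... shown above):

```shell
#!/bin/sh
# Create the six working directories and write an update script into each,
# pulling from the "everything" repo plus the two relevant axis repos.
set -e
mkdir -p base working
for sim in simulator1 simulator2; do
  for n in x1 x10 x100; do
    repo="working/${sim}_${n}"
    mkdir -p "$repo"
    cat > "$repo/update" <<EOF
#!/bin/sh
darcs pull ../../base/everything
darcs pull ../../base/$sim
darcs pull ../../base/$n
EOF
    chmod +x "$repo/update"
  done
done
```

After fixing a bug in the appropriate base repository, for i in working/*; do (cd "$i" && ./update); done brings every configuration up to date.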

But! It is extremely useful to be able to commit the output logs associated with a particular state of the build scripts, so you can say "wait, what went wrong when I used the -static flag? Oh yeah, that". I don't think Darcs handles that very well - or at least, it's not easy to retrieve any particular state of a Darcs repo. Git is great for that, but whenever I think about duplicating the setup described above in Git my mind recoils in horror before I can think through the details. Perhaps it shouldn't - would this work? Is there a Better Way that I'm not seeing?

pozorvlak: (Default)
Sunday, February 20th, 2011 09:49 pm
My post about unit-testing Agda code using elisp was posted to the dependent types subreddit (of which I was previously unaware). There, the commenter stevana pointed out that you could achieve the same effect using the following Agda code:
test1 : 2 + 1 == 3
test1 = refl

test2 : 3 + 0 == 3
test2 = refl
If your test fails, you'll get a compile-time error on that line. Aaaaaaaargh. I do wish someone had told me that before I spent a week mucking about with elisp. Well, that's not quite true - I learned a lot about Emacs, I had some fun writing the code, I learned some things about Agda that I might otherwise not have done, I got some code-review from aaroncrane, and I had a chance to try out using darcs and patch-tag for a collaborative project (the experience, I'm afraid, didn't measure up well against git and GitHub).

What I'd failed to realise, of course, is that the == type constructor describes computable equality: a == b is inhabited if the compiler can compute a proof that a is equal to b. "They both normalise to the same thing" is such a proof, and obviously one that the compiler's smart enough to look for on its own.

Ordinarily, a testcase is better than a compile-time assertion with the same meaning, because you can attach a debugger to a failing test and investigate exactly how your code is broken. This doesn't mean that types are stupid - types draw their power from their ability to summarise many different testcases. But in this case, I don't think there's much reason to prefer our approach to stevana's: the slight increase in concision that agda-test allows is balanced out by its inability to coerce both expressions to the same type, which often means the user has to type a lot more code. Stevana's approach does not suffer from this problem.

This also makes me more positive about dependent typing: if simple testcases can be incorporated into your type-system so easily, then maybe the "types are composable tests" slogan has some truth to it.

But seriously, guys, macros.
pozorvlak: (Default)
Friday, February 4th, 2011 02:11 pm
On Monday I went to the first day of Conor McBride's course Introduction to Dependently-Typed Programming in Agda. "What's dependently-typed programming?" you ask. Well, when a compiler type-checks your program, it's actually (in a sense which can be made precise) proving theorems about your code. Assignments of types to expressions correspond to proofs of statements in a formal logical language; the precise logical language in which these statements are expressed is determined by your type system (union types correspond to "or", function types correspond to "if... then", that kind of thing). This correspondence goes by the fancy name of "the Curry-Howard isomorphism" in functional programming and type-theory circles. In traditional statically-typed languages these theorems are mostly pretty uninteresting, but by extending your type system so it corresponds to a richer logical language you can start to state and prove some more interesting theorems by expressing them as types, guaranteeing deep properties of your program statically. This is the idea behind dependent typing. A nice corollary of this approach is that types in dependently-typed languages (such as Agda, the language of the course) can be parametrised by values (and not just by other types, as in Haskell), so you can play many of the same type-level metaprogramming games as in C++ and Ada, but in a hopefully less crack-fuelled way. I spent a bit of time last year playing around with Edwin Brady's dependently-typed systems language Idris, but found the dependent-typing paradigm hard to wrap my head around. So I was very pleased when Conor's course was announced.

The course is 50% lab-based, and in these lab sessions I realised something important: fancy type-system or no, I need to test my code, particularly when I'm working in such an unfamiliar language. Dependent typing may be all about correctness by construction, but I'm not (yet?) enlightened enough to work that way - I need to see what results my code actually produces. I asked Conor if there were any way to evaluate individual Agda expressions, and he pointed me at the "Evaluate term to normal form" command in Emacs' Agda mode (which revealed that I had, indeed, managed to produce several incorrect but well-typed programs). Now, that's better than nothing, but it's clearly inadequate as a testing system - you have to type the expression to evaluate every time, and check the result by eye. I asked Conor if Agda had a more extensive unit-testing framework, and he replied "I'm not aware of such a thing. The culture is more 'correctness by construction'. Testing is still valuable."

So I wrote one.

I've written - or at least hacked on - a test framework of some sort at almost every programming job I've had (though these days I try to lean on Test::More and friends as much as possible). It gets hella tedious. This one was a bit different, though. One message that came out of Conor's lectures was that the correct way to represent equality in dependently-typed languages is somewhat controversial; as a beginner, I didn't want to dip a toe into these dangerous waters until I had a clearer idea of the issues. But the basic operation of any testing system is "running THIS should yield THAT". Fortunately, there was a way I could punt on the problem. Since Agda development seems to be closely tied to the interactive Emacs mode, I could deliver my system as a set of Emacs commands; the actual testing could be done by normalising the expression under consideration and testing the normalised form for string-equality with the expected answer.

This was less easy than I'd expected; it turns out that agda-mode commands work by sending commands to a slave GHCi process, which generates elisp to insert the results into the appropriate buffer. I'm sure that agda-mode's authors had some rationale for this rather bizarre design, but it makes agda-mode a lot harder to build on than it could be. However (in close collaboration with Aaron Crane, who both contributed code directly and guided me through the gargantuan Emacs API like a Virgil to my Dante) I eventually succeeded. There are two ways to get our code:
darcs get http://patch-tag.com/r/pozorvlak/agda-test
or
git clone git://github.com/pozorvlak/agda-test.git
Then load the file agda-test.el in your .emacs in the usual way. Having done so, you can add tests to your Agda file by adding comments of the form
{- test TESTNAME: ACTUAL is EXPECTED; -}
For instance,
{- test 2+1: (suc (suc zero)) +N (suc zero) is (suc (suc (suc zero)));
   test 3+0: (suc (suc (suc (zero)))) +N zero is (suc (suc zero)); -}
When you then invoke the function agda2-test-all (via C-u C-c C-v for "verify"), you should be presented with a new buffer called *Agda test results*, containing the text
1..2
ok 1 - 2+1
not ok 2 - 3+0
    got 3
    expected 2
[ Except, er, I get "expected _175" for that last test instead. I don't think that's a bug in my elisp code, because I get the same result when I evaluate that expression manually with C-c C-n. Halp pls?]

You should recognise the output as the Test Anything Protocol; it should be easy to plug in existing tools for aggregating and displaying TAP results.

There are a lot of things I'd like to do in the future:
  • Add commands to only run a single test, or just the tests in a given comment block, or a user-specified ad-hoc test group.
  • Highlight failing tests in the Agda buffer.
  • Allow the user to specify a TAP aggregator for the results.
  • Do something about the case where the test expressions don't compile cleanly.
If you'd like to send patches (which would be very welcome!), the darcs repository is currently considered the master one, so please submit patches there. I'm guessing that most Agda people are darcs users, and the slight additional friction of using darcs as my VCS is noise next to the friction of using Emacs as my editor :-) The git repository is currently just a mirror of the darcs one, but it would be easy to switch over to having it as the master if enough potential contributors would prefer git to darcs. Things might get confusing if I get patches submitted in both places, though :-)
pozorvlak: (Default)
Wednesday, December 1st, 2010 03:03 pm
After my git/darcs talk, some of the folks on darcs-users were kind enough to offer constructive criticism. In particular, Stephen Turnbull mentioned an interesting use-case which I want to discuss further.

As I tried to stress, the key insight required to translate between git-think and darcs-think is
In git, the natural thing to pull is a branch (an ordered list of commits, each of which is assumed to depend on those before it); in darcs, the natural thing to pull is a patch (a named change whose dependencies are calculated optimistically by the system).
Stephen's use-case is this: you're a release manager, and one of your hackers has written some code you want to pull in. However, you don't want all their code. Suppose the change is nicely isolated into a single commit. In darcs, you can pull in just that commit (and a minimal set of prior commits required for it to apply cleanly). This is as far as my thinking had got, but Stephen points out that the interesting part of the story is what happens next: if you subsequently pull in other changes that depend on that commit, then darcs will note that it's already in your repository and all will be well. This is true in git if the developer has helpfully isolated that change into a branch: you can pull that branch, and subsequent merges will take account of the fact that you've done so. However, if the developer hasn't been so considerate, then you're potentially in trouble: you can cherry-pick that commit (creating a new commit with the same effect), but if you subsequently pull a branch containing it then git will not take account of your having cherry-picked it earlier. If either of you has changed any of the lines affected by that diff, then you'll get conflicts.

Thinking about this further, this means that I was even righter than I realised. In the git view of the world, the fact that that commit is not in its own branch is an assertion that it only makes sense in the context of the rest of the branch. Attempting to pull it in on its own is therefore not useful. You can do it, of course - it's git, you can do anything - but you're making a rod for your own back. As I tried to emphasize in the talk, git-cherry-pick is a low-level, hackish tool, only really intended for use in drastic situations or in the privacy of your own local repo. If you want something semantically meaningful, only pull branches.

Git-using release managers, therefore, have to rely on developers to package atomic features sensibly into branches. If your developers can't be trusted to do this, you may have a problem. But note that darcs has the dual problem: if you can't trust your developers to specify semantic (non-textual) dependencies with darcs record --ask-deps, then you're potentially going to be spending a lot of time tracking down semantic dependencies by hand. Having been a release manager under neither system, I don't have any intuition for which is worse - can anyone here shed any light?

[The cynic in me suggests that any benefits to the Darcs approach would only become apparent in projects which are large enough to rule out the use of Darcs with its current performance, but I could, as ever, be completely wrong. And besides, not being very useful right now doesn't rule out its ultimately proving to be a superior solution.]

On another note: Mark Stosberg (who's written quite a lot on the differences between darcs and git himself) confirms that people actually do use spontaneous branches, with ticket numbers as the "branch" identifiers. Which got me thinking. Any git user can see that spontaneous branches are more work for the user than real branches, because you have to remember your ticket number and type it into your commit message every time. Does that sound like a trivial amount of work to complain about? That's because you have no idea how easy branching and merging is in git. But it's also work that can be automated away with some tool support. Stick a file somewhere in _darcs containing the name of the current ticket, and somehow prepend that to your commit messages. I have just written a Perl script to do that (GitHub repo, share and enjoy).
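The Perl script does this for darcs; the same idea fits in a few lines of shell. A hypothetical sketch (the .ticket file name and the wrapper function are illustrative, and it just prints the darcs command it would run):

```shell
#!/bin/sh
# Illustrative wrapper: read the current ticket name from a file and
# prepend it to the commit message before recording.
set -e
ticket_record() {
    msg="$1"
    if [ -f .ticket ]; then
        msg="[$(cat .ticket)] $msg"
    fi
    # a real wrapper would run: darcs record -m "$msg"
    printf 'darcs record -m "%s"\n' "$msg"
}

echo TICKET-42 > .ticket
ticket_record 'fix the frobnicator'   # darcs record -m "[TICKET-42] fix the frobnicator"
```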

Now we just need the ability to easily back-out and restore incomplete tickets without creating a whole new repo, and they'll be as convenient as git topic branches :-)
pozorvlak: (Default)
Friday, November 12th, 2010 10:37 am
Last night was Glasgow.pm's second technical meeting, and I finally gave a version of the DVCS-comparison talk I've been thinking about doing for at least a year. I could have done with a lot more rehearsal, something that would have been helped by finishing off the slides before (checks git-log) 5.33pm - the meeting started at 6.15 - but I think it went OK.

The idea of the talk was to
  • explain a little bit of what git's doing under the hood
  • explain how the darcs model differs
  • cut through some of the tangle of conflicting terminology
  • explain why git users think branching's such a big deal, and why darcs users think that cherry-picking's such a big deal (spoilers: the answers are the same!).
I didn't try to explain the details of how you use either system, because in both cases that's fairly easy to work out once you have a decent handle on what's going on under the hood. The audience was mostly full of git users, so I spent more time explaining the darcs model; hopefully darcs users wanting to learn git will also find the slides helpful. For a more detailed introduction to how git works, I strongly recommend mjd's recent talk on the subject, and for more on the (much ameliorated) "exponential merge" problem in darcs see here. Edit: and for the details of how to use git from a darcsy perspective, try the GHC wiki's git for darcs users page.

By the way, there's a useful consequence of git's design which neither mjd's slides nor mine mention, but which I mentioned in the talk, namely the reflog. It's a list of every commit you've visited, and when you were there. This means you can say things like "Show me the state of the repository at 11.15 last Monday"; more usefully, it lets you track down and recover commits that have been orphaned by some botched attempt at history rewriting. This is not a feature that you need often, but when you do need it it's an absolute lifesaver. Git's "directed graph of snapshots" model makes this feature almost trivial to add (and because git's built on a content-addressable filesystem, jumping to those orphaned commits is fast), but darcs' "bag of patches" model makes it much harder to add such a feature (though they're thinking about possible approaches that make more sense than storing your darcs repo in git).
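A minimal demonstration of that recovery trick, with throwaway names:

```shell
#!/bin/sh
# Orphan a commit with a hard reset, then recover it via the reflog.
set -e
git init -q reflog-demo && cd reflog-demo
git config user.email me@example.com
git config user.name me
echo v1 > notes && git add notes && git commit -qm 'first'
echo v2 > notes && git commit -qam 'precious work'
git reset -q --hard HEAD~1      # botched rewrite: 'precious work' is orphaned
grep -q v1 notes                # the worktree really has gone back to v1
git reset -q --hard 'HEAD@{1}'  # the reflog remembers where HEAD was
grep v2 notes                   # prints v2: the commit is back
```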

Thanks very much to Eric Kow and Guillaume Hoffman for answering my questions about darcs. Any errors remaining are of course my own.

Anyway, you can get the slides here (slightly cleaned-up). Please let me know if they don't make sense on their own. Or, if you really care, you can browse the history here. To build the TeX source, you'll need the prosper, xypic and graphicx packages installed.

Edit: some of the people on darcs-users were kind enough to comment, prompting some further thought. I've written a followup post in which I respond to some of the things they said.