Wednesday, October 18th, 2017 07:09 pm
I am only about a hundred pages into this book, but it had paid for itself after fifty. I wish I'd read it ten years ago. Actually, I wish I'd read it nineteen years ago when I started my first programming job, but it hadn't been written then. Techies, if you haven't read it, I strongly advise you to do so at your earliest convenience. It's about how to deal with a Catch-22 that's come up over and over again in my programming career:
  • I can't safely modify, or even understand, this code, because it has no tests.
  • I can't test it without modifying it.
Feathers describes techniques for bringing code under test with the minimal amount of disruption, then refactoring it towards maintainability. Some of the advice I'd worked out for myself, but having names and a structure to hang ad-hoc insights on is great. The book concentrates on object-oriented and procedural languages, but a lot of the techniques should generalise to other paradigms.
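Two of the book's central ideas - seams (places where you can change a program's behaviour without editing the code that uses it) and characterization tests (tests that pin down what the code currently does, right or wrong) - are easy to sketch. Here's a minimal, hypothetical Python example, not taken from the book:

import unittest

# Legacy function: the tax rate used to be hard-coded, so the calculation
# couldn't be tested without the real lookup. Giving tax_lookup a default
# that preserves the old behaviour introduces a "seam": tests can now
# substitute their own lookup without touching the call sites.
def total_price(items, tax_lookup=lambda region: 0.2):
    subtotal = sum(price for _, price in items)
    return subtotal * (1 + tax_lookup("uk"))

class CharacterizationTest(unittest.TestCase):
    # A characterization test records what the code *currently* does,
    # so that later refactorings can be checked against it.
    def test_current_behaviour(self):
        items = [("widget", 10.0), ("gadget", 5.0)]
        self.assertAlmostEqual(total_price(items), 18.0)

if __name__ == "__main__":
    unittest.main()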

Check out this book on Goodreads: https://www.goodreads.com/book/show/44919.Working_Effectively_with_Legacy_Code

Working Effectively with Legacy Code by Michael Feathers
Monday, August 19th, 2013 09:10 pm
If I ever design a first course in compilers, I'll do it backwards: we'll start with code generation, move on to static analysis, then parse a token stream and finally lex source text. As we go along, students will be required to implement a small compiler for a simple language, using the techniques they've just learned. The (excellent) Stanford/Coursera compilers course does something similar, but they proceed in the opposite direction, following the dataflow in the final compiler: first they cover lexing, then parsing, then semantic analysis, then codegen. The first Edinburgh compilers course follows roughly the same plan of lectures, and I expect many other universities' courses do too.

I think a backwards course would work better for two reasons:
  1. Halfway through the Stanford course, you have a program that can convert source text into an intermediate representation with which you can't do very much. Halfway through the backwards course, you'd have a compiler for an unfriendly source language: you could write programs directly in whatever level of IR you'd got to (I'm assuming a sensible implementation language that doesn't make entering data literals too hard), and compile them to native code using the code you'd written. I think that would be pretty motivating.
  2. When I did the Stanford course, all the really good learning experiences were in the back end. Writing a Yacc parser was a fiddly but largely straightforward experience; writing the code generator taught me loads about how your HLL code is actually executed and how runtime systems work. I also learned some less obvious things like the value of formal language specifications¹. Most CS students won't grow up to be compiler hackers, but they will find it invaluable to have a good idea of what production compilers do to their HLL code; it'll be much more useful than knowing about all the different types of parsing techniques, anyway². Students will drop out halfway through the course, and even those who make it all the way through will be tired and stressed by the end and will thus take in less than they did at the beginning: this argues for front-loading the important material.
What am I missing?

¹ I learned this the hard way. I largely ignored the formal part of the spec when writing my code generator, relying instead on the informal description; then towards the end of the allocated period I took a closer look at it and realised that it provided simple answers to all the thorny issues I'd been struggling with.
² The answer to "how should I write this parser?" in an industrial context is usually either "with a parser generator" or "recursive descent". LALR parsers such as those produced by Yacc are a pain to debug if you don't understand the underlying theory, true, but that's IMHO an argument for using a parser generator based on some other algorithm, most of which are more forgiving.
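For anyone who hasn't met it, recursive descent just means one function per grammar rule, each consuming tokens and calling the functions for its sub-rules. A minimal sketch in Python (single-digit numbers only, to keep it short):

# Grammar: expr   := term (('+'|'-') term)*
#          term   := factor ('*' factor)*
#          factor := digit
def parse(src):
    tokens = list(src.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():
        nonlocal pos
        value = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            value = value + term() if op == "+" else value - term()
        return value

    def term():
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def factor():
        nonlocal pos
        digit = tokens[pos]
        pos += 1
        return int(digit)

    return expr()

assert parse("1 + 2*3 - 4") == 3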
Sunday, June 2nd, 2013 09:15 pm
I've been learning about the NoSQL database CouchDB, mainly from the Definitive Guide, but also from the Coursera Introduction to Data Science course and through an informative chat with [personal profile] necaris, who has used it extensively at Esplorio. The current draft of the Definitive Guide is rather out-of-date and has several long-open pull requests on GitHub, which doesn't exactly inspire confidence, but CouchDB itself appears to be actively maintained. I have yet to use CouchDB in anger, but here's what I've learned so far:

  • CouchDB is, at its core, an HTTP server providing append-only access to B-trees of versioned JSON objects via a RESTful interface. Say what now? Well, you store your data as JavaScript-like objects (which allow you to nest arrays and hash tables freely); each object is indexed by a key; you access existing objects and insert new ones using the standard HTTP GET, PUT and DELETE methods, specifying and receiving data in JavaScript Object Notation; you can't update objects, only replace them with new objects with the same key and a higher version number; and it's cheap to request all the objects with keys in a given range.
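Concretely, a session with the HTTP API might look like this (a hypothetical sketch using Python's requests library against a local CouchDB; the database and document names are made up):

import requests

BASE = "http://localhost:5984/blog"   # hypothetical database URL

requests.put(BASE)                    # create the database
# Store a document under the key "post-1".
requests.put(BASE + "/post-1", json={"title": "Hello", "tags": ["couchdb"]})

doc = requests.get(BASE + "/post-1").json()
# "Updating" really means replacing: the fetched doc carries its _rev
# field, and the PUT stores a new version under the same key.
doc["title"] = "Hello again"
requests.put(BASE + "/post-1", json=doc)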

  • The JSON is not by default required to conform to any particular schema, but you can add validation functions to be called every time data is added to the database. These will reject improperly-formed data.
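Validation functions live in design documents (more on those below) and are written in JavaScript by default; here's a hypothetical sketch, uploaded via Python but with the function itself as a JavaScript string:

import requests

design = {
    "_id": "_design/blog",
    # Called on every write; throwing rejects the update.
    "validate_doc_update": """
        function(newDoc, oldDoc, userCtx) {
            if (!newDoc.title && !newDoc._deleted) {
                throw({forbidden: 'every post needs a title'});
            }
        }
    """,
}
requests.put("http://localhost:5984/blog/_design/blog", json=design)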

  • CouchDB is at pains to be RESTful, to emit proper cache-invalidation data, and so on, and this is key to scaling it out: put a contiguous subset of (a consistent hash of) the keyspace on each machine, and build a tree of reverse HTTP proxies (possibly caching ones) in front of your database cluster.
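The partitioning idea in miniature (a toy Python sketch; real deployments use virtual nodes and smarter rebalancing):

import hashlib

NODES = ["node0", "node1", "node2"]

def node_for(key):
    # Hash the key into a 128-bit integer, then give each node a
    # contiguous slice of the hash space.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h * len(NODES) // 2**128]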

  • CouchDB's killer feature is probably master-to-master replication: if you want to do DB operations on a machine that's sometimes disconnected from the rest of the cluster (a mobile device, say), then you can do so, and sync changes up and down when you reconnect. Conflicts are flagged but not resolved by default; you can resolve them manually or automatically by recording a new version of the conflicted object. Replication is also used for load-balancing, failover and scaling out: you can maintain one or more machines that constantly replicate the master server for a section of keyspace, and you can replicate only a subset of keyspace onto a new database when you need to expand.
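Kicking off a replication is itself just an HTTP request to the _replicate endpoint (a hypothetical sketch; run it in both directions for master-to-master):

import requests

# Pull changes from a remote master into the local "blog" database;
# "continuous": True keeps the replication running until cancelled.
requests.post("http://localhost:5984/_replicate", json={
    "source": "http://example.com:5984/blog",
    "target": "blog",
    "continuous": True,
})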

  • CouchDB doesn't guarantee to preserve all the history of an object, and in particular replications only seem to send the most recent version; I think this precludes Git-style three-way merge from the conflicting versions' most recent common ancestor (and forget about Darcs-style full-history merging!).

  • The cluster-management story isn't as good as for some other systems, but there are a couple of PaaS offerings.

  • Queries/views and non-primary indexes are both handled using map/reduce. If you want to index on something other than the primary key - posts by date, say - then you write a map query which emits (date, post) pairs. These are put into another B-tree, which is stored on disk; clever things are done to mark subtrees invalid as new data comes in, and changes to the query result or index are calculated lazily. Since indices are stored as B-trees, it's cheap to get all the objects within a given range of secondary keys: all posts in February, for instance.
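A posts-by-date index might look like this (hypothetical names; note that the map function is a JavaScript string stored inside a design document):

import requests

BASE = "http://localhost:5984/blog"

requests.put(BASE + "/_design/posts", json={
    "views": {
        "by_date": {
            # Each emit(key, value) call adds one row to the view's B-tree.
            "map": "function(doc) { if (doc.date) emit(doc.date, doc.title); }"
        }
    },
})

# Because the view is a B-tree keyed on date, range queries are cheap:
feb = requests.get(BASE + "/_design/posts/_view/by_date",
                   params={"startkey": '"2013-02-01"',
                           "endkey": '"2013-02-28"'}).json()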

  • CouchDB's reduce functions are crippled: attempting to calculate anything that isn't a scalar or a fixed-size object is considered Bad Form, and may cause your machine(s) to thrash. AFAICT you can't reduce results from different machines by this mechanism: CouchDB Lounge requires you to write extra merge functions in Twisted Python.

  • Map, reduce and validation functions (and various others, see below) are by default written in JavaScript. But CouchDB invokes an external interpreter for them, so it's easy to extend CouchDB with a new query server. Several such have been written, and it's now possible to write your functions in many different languages.

  • There's a very limited SQL view engine, but AFAICT nothing like Hive or Pig that can take a complex query and compile it down into a number of chained map/reduce jobs. The aforementioned restrictions on reduce functions mean that the strategy I've been taught for expressing joins as map/reduce jobs won't work; I don't know if this limitation is fundamental. But it's IME pretty rare to require general joins in applications: usually you want to do some filtering or summarisation on at least one side.
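For reference, the standard trick - which CouchDB's reduce restrictions rule out - is a reduce-side join: map both relations to (join key, tagged row) pairs, group by key, then pair up the two sides within each group. A toy Python sketch of the idea:

from collections import defaultdict

users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "ink")]

# Map phase: tag each row with its relation name, keyed by user id.
mapped = [(uid, ("user", name)) for uid, name in users] + \
         [(uid, ("order", item)) for uid, item in orders]

# Shuffle phase: group by key.
groups = defaultdict(list)
for key, tagged in mapped:
    groups[key].append(tagged)

# Reduce phase: pair up the two sides within each group. The output
# grows with the data, which is exactly what CouchDB's reduce forbids.
joined = [(name, item)
          for rows in groups.values()
          for tag1, name in rows if tag1 == "user"
          for tag2, item in rows if tag2 == "order"]
print(sorted(joined))   # [('alice', 'book'), ('alice', 'pen'), ('bob', 'ink')]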

  • CouchDB can't quite make up its mind whether it wants to be a database or a Web application framework. It comes by default with an administration web app called Futon; you can also use it to store and execute code for rendering objects as HTML, Atom, etc. Such code (along with views, validations etc) is stored in special JSON objects called "design documents": best practice is apparently to have one design document for each application that needs to access the underlying data. Since design documents are ordinary JSON objects, they are propagated between nodes by replications.

  • However, various standard webapp-framework bits are missing, notably URL routing. But hey, you can always use mod_rewrite...

  • There's a tool called Erica (and an older one called CouchApp) which allows you to sync design documents with more conventional source-code directories in your filesystem.

  • CouchDB is written in Erlang, and the functional-programming influence shows up in other places: most types of user-defined function are required to be free of side-effects, for instance. Then there are the aforementioned uses of lazy evaluation and the append-only nature of the system as a whole. You can extend it with your own Erlang code or embed it into an Erlang application, bypassing the need for HTTP requests.

tl;dr if you've ever thought "data modelling and synchronisation are hard, let's just stick a load of JSON files in Git" (as I have, on several occasions), then CouchDB is probably a good fit to your needs. Especially if your analytics needs aren't too complicated.
Thursday, January 24th, 2013 09:59 pm
[Wherein we review an academic conference in the High/Low/Crush/Goal/Bane format used for reviewing juggling conventions on rec.juggling.]

High: My old Codeplay colleague Ally Donaldson's FAT-GPU workshop. He was talking about his GPUVerify system, which takes CUDA or OpenCL programs and either proves them free of data races and synchronisation-barrier conflicts, or finds a potential bug. It's based on an SMT solver; I think there's a lot of scope to apply constraint solvers to problems in compilation and embedded system design, and I'd like to learn more about them.

Also, getting to see the hotel's giant fishtank being cleaned, by scuba divers.

Low: My personal low point was telling a colleague about some of the problems my depression has been causing me, and having him laugh in my face - he'd been drinking, and thought I was exaggerating for comic effect. He immediately apologised when I told him that this wasn't the case, but still, not fun. The academic low point was the "current challenges in supercomputing" tutorial, which turned out to be a thinly-disguised sales pitch for the sponsor's FPGA cards. That tends not to happen at maths conferences...

Crush: am I allowed to have a crush on software? Because the benchmarking and visualisation infrastructure surrounding the Sniper x86 simulator looks so freaking cool. If I can throw away the mess of Makefiles, autoconf and R that serves the same role in our lab I will be very, very happy.

Goal: Go climbing on the Humboldthain Flakturm (fail - it turns out that Central Europe is quite cold in January, and nobody else fancied climbing on concrete at -7C). Get my various Coursera homeworks and bureaucratic form-filling done (fail - damn you, tasty German beer and hyperbolic discounting!). Meet up with [livejournal.com profile] maradydd, who was also in town (fail - comms and scheduling issues conspired against us. Next time, hopefully). See some interesting talks, and improve my general knowledge of the field (success!).

Bane: I was sharing a room with my Greek colleague Chris, who had a paper deadline on the Wednesday. This meant he was often up all night, and went to bed as I was getting up, so every trip into the room to get something was complicated by the presence of a sleeping person. He also kept turning the heating up until it was too hot for me to sleep. Dually, of course, he had to share his room with a crazy Brit who kept getting up as he was going to bed and opening the window to let freezing air in...
Sunday, December 9th, 2012 09:17 pm
I've been using Mercurial (also known as hg) as the version-control system for a project at work. I'd heard good things about it - a Git-like system with a cleaner UI and better documentation - and was glad of the excuse to try it out. Unfortunately, I was disappointed by what I found. The docs are good, and the UI's a bit cleaner, but it's still got some odd quirks - the difference between hg resolve and hg resolve -m catches me every bloody time, for instance. Unlike Git, you aren't prompted to set missing configuration options interactively. Some of the defaults are crazy, like not sending long output to a pager. And having got used to easy, safe history-rewriting in Git, I was horrified to learn that Mercurial offered no such guarantees of safety: up until version 2.2, the equivalent of a simple commit --amend could cause you to lose work permanently. Easy history-rewriting is a big deal; it means that you never have to choose between committing frequently and only pushing easily-reviewable history.

But I persevered, and with a bit of configuration I was able to make hg more like Git, and hence more comfortable for me to use. Here's my current .hgrc:
[ui]
username = Pozorvlak <pozorvlak@example.com>
merge = internal:merge
[pager]
pager = LESS='FSRX' less
[extensions]
rebase =
record =
histedit = ~/usr/etc/hg/hg_histedit.py
fetch =
shelve = ~/usr/etc/hg/hgshelve.py
pager =
mq =
color =

You'll need at least the username line, because of the aforementioned lack of interactive configuration. The pager = LESS='FSRX' less and pager = lines send long output to less instead of letting it all spew out and overflow your console scrollback buffer. merge = internal:merge tells it to use its internal merge algorithm as a merge tool, and put ">>>>" gubbins in files in the event of conflicts. Otherwise it uses meld for merges on my machine; meld is very pretty but not history-aware, and history-aware merges are at least 50% of the point of using a DVCS in the first place. The rebase extension allows you to graft a sequence of changesets onto another part of the history graph, like git rebase; the record extension allows you to select only some of the changes in your working copy for committing, like git add -p or darcs record; the fetch extension lets you do pull-and-merge in one operation - confusingly, git pull and git fetch are the opposite way round from hg fetch and hg pull. The mq extension turns on patch queues, which I needed for some hairy operation or other once. The non-standard histedit extension works like git rebase --interactive but not, I believe, as safely - dropped commits are deleted from the history graph entirely rather than becoming unreachable from an active head. The non-standard shelve extension works like git stash, though less conveniently - once you've shelved one change you need to give a name to all subsequent ones. Perhaps a Mercurial expert reading this can tell me how to delete unwanted shelves? Or about some better extensions or settings I should be using?
Thursday, December 6th, 2012 11:41 pm

I've been running benchmarks again. The basic workflow is

  1. Create some number of directories containing the benchmark suites I want to run.
  2. Tweak the Makefiles so benchmarks are compiled and run with the compilers, simulators, libraries, flags, etc, that I care about.
  3. Optionally tweak the source code to (for instance) change the number of iterations the benchmarks are run for.
  4. Run the benchmarks!
  5. Check the output; discover that something is broken.
  6. Swear, fix the problem.
  7. Repeat until either you have enough data or the conference submission deadline gets too close and you are forced to reduce the scope of your experiments.
  8. Collate the outputs from the successful runs, and analyse them.
  9. Make encouraging noises as the graduate students do the hard work of actually writing the paper.

Suppose I want to benchmark three different simulators with two different compilers for three iteration counts. That's 18 configurations. Now note that the problem found in stage 5 and fixed in stage 6 will probably not be unique to one configuration - if it affects the invocation of one of the compilers then I'll want to propagate that change to nine configurations, for instance. If it affects the benchmarks themselves or the benchmark-invocation harness, it will need to be propagated to all of them. Sounds like this is a job for version control, right? And, of course, I've been using version control to help me with this; immediately after step 1 I check everything into Git, and then use git fetch and git merge to move changes between repositories. But this is still unpleasantly tedious and manual. For my last paper, I was comparing two different simulators with three iteration counts, and I organised this into three checkouts (x1, x10, x100), each with two branches (simulator1, simulator2). If I discovered a problem affecting simulator1, I'd fix it in, say, x1's simulator1 branch, then git pull the change into x10 and x100. When I discovered a problem affecting every configuration, I checked out the root commit of x1, fixed the bug in a new branch, then git merged that branch with the simulator1 and simulator2 branches, then git pulled those merges into x10 and x100.

Keeping track of what I'd done and what I needed to do was frankly too cognitively demanding, and I was constantly bedevilled by the sense that there had to be a Better Way. I asked about this on Twitter, and Ganesh Sittampalam suggested "use Darcs" - and you know, I think he's right, Darcs' "bag of commuting patches" model is a better fit to what I'm trying to do than Git's "DAG of snapshots" model. The obvious way to handle this in Darcs would be to have six base repositories, called "everything", "x1", "x10", "x100", "simulator1" and "simulator2"; and six working repositories, called "simulator1_x1", "simulator1_x10", "simulator1_x100", "simulator2_x1", "simulator2_x10" and "simulator2_x100". Then set up update scripts in each working repository, containing, for instance

#!/bin/sh
darcs pull ../base/everything
darcs pull ../base/simulator1
darcs pull ../base/x10
and every time you fix a bug, run for i in working/*; do $i/update; done.

But! It is extremely useful to be able to commit the output logs associated with a particular state of the build scripts, so you can say "wait, what went wrong when I used the -static flag? Oh yeah, that". I don't think Darcs handles that very well - or at least, it's not easy to retrieve any particular state of a Darcs repo. Git is great for that, but whenever I think about duplicating the setup described above in Git my mind recoils in horror before I can think through the details. Perhaps it shouldn't - would this work? Is there a Better Way that I'm not seeing?

Thursday, December 6th, 2012 09:45 pm
Inspired by Falsehoods Programmers Believe About Names, Falsehoods Programmers Believe About Time, and far, far too much time spent fighting autotools. Thanks to Aaron Crane, [livejournal.com profile] totherme and [livejournal.com profile] zeecat for their comments on earlier versions.

It is accepted by all decent people that Make sucks and needs to die, and that autotools needs to be shot, decapitated, staked through the heart and finally buried at a crossroads at midnight in a coffin full of millet. Hence, there are approximately a million and seven tools that aim to replace Make and/or autotools. Unfortunately, all of the Make-replacements I am aware of copy one or more of Make's mistakes, and many of them make new and exciting mistakes of their own.

I want to see an end to Make in my lifetime. As a service to the Make-replacement community, therefore, I present the following list of tempting but incorrect assumptions various build tools make about building software.

All of the following are wrong:
  • Build graphs are trees.
  • Build graphs are acyclic.
  • Every build step updates at most one file.
  • Every build step updates at least one file.
  • Compilers will always modify the timestamps on every file they are expected to output.
  • It's possible to tell the compiler which file to write its output to.
  • It's possible to tell the compiler which directory to write its output to.
  • It's possible to predict in advance which files the compiler will update.
  • It's possible to narrow down the set of possibly-updated files to a small hand-enumerated set.
  • It's possible to determine the dependencies of a target without building it.
  • Targets do not depend on the rules used to build them.
  • Targets depend on every rule in the whole build system.
  • Detecting changes via file hashes is always the right thing.
  • Detecting changes via file hashes is never the right thing.
  • Nobody will ever want to rebuild a subset of the available dirty targets.
  • People will only want to build software on Linux.
  • People will only want to build software on a Unix derivative.
  • Nobody will want to build software on Windows.
  • People will only want to build software on Windows.
    (Thanks to David MacIver for spotting this omission.)
  • Nobody will want to build on a system without strace or some equivalent.
  • stat is slow on modern filesystems.
  • Non-experts can reliably write portable shell script.
  • Your build tool is a great opportunity to invent a whole new language.
  • Said language does not need to be a full-featured programming language.
  • In particular, said language does not need a module system more sophisticated than #include.
  • Said language should be based on textual expansion.
  • Adding an Nth layer of textual expansion will fix the problems of the preceding N-1 layers.
  • Single-character magic variables are a good idea in a language that most programmers will rarely use.
  • System libraries and globally-installed tools never change.
  • Version numbers of system libraries and globally-installed tools only ever increase.
  • It's totally OK to spend over four hours calculating how much of a 25-minute build you should do.
  • All the code you will ever need to compile is written in precisely one language.
  • Everything lives in a single repository.
  • Files only ever get updated with timestamps by a single machine.
  • Version control systems will always update the timestamp on a file.
  • Version control systems will never update the timestamp on a file.
  • Version control systems will never change the time to one earlier than the previous timestamp.
  • Programmers don't want a system for writing build scripts; they want a system for writing systems that write build scripts.

[Exercise for the reader: which build tools make which assumptions, and which compilers violate them?]

Thursday, September 13th, 2012 04:01 pm
I've recently submitted a couple of talk proposals to upcoming conferences. Here are the abstracts.

Machine learning in (without loss of generality) Perl

London Perl Workshop, Saturday 24th November 2012. 25 minutes.

If you read a book or take a course on machine learning, you'll probably spend a lot of time learning about how to implement standard algorithms like k-nearest neighbours or Naive Bayes. That's all very interesting, but we're Perl programmers - all that stuff's on CPAN already. This talk will focus on how to use those algorithms to attack problems, how to select the best ML algorithm for your task, and how to measure and improve the performance of your machine learning system. Code samples will be in Perl, but most of what I'll say will be applicable to machine learning in any language.

Classifying Surfaces

MathsJam: The Annual Conference, 17th-18th November 2012. 5 minutes.

You may already know Euler's remarkable result that if a polyhedron has V vertices, E edges and F faces, then V - E + F = 2. This is a special case of the beautiful classification theorem for closed surfaces. I will state this classification theorem, and give a quick sketch of a proof.
Thursday, December 29th, 2011 12:08 pm
I've been writing some Perl code recently, and a couple of ideas have occurred to me.
  1. Closures are great when you have several behaviours which can all vary independently; objects are great when you have several behaviours that must all vary in sync with one another. Put another way: if your problem is combinatorial explosion of objects, try using closures instead; if your problem is ensuring consistency among different behaviours, try grouping them into objects. Languages like Perl make it relatively easy to do either, so use the best tool for the job. The trick is to spot that you're using the wrong representation before you either create a kajillion Strategy classes or knock together a half-arsed object system out of hashes of closures. (There's a code sketch of this distinction after the list.)
  2. In the correct quantity, housekeeping tasks can be your friends. By "housekeeping tasks" I mean tasks that don't directly contribute to solving the problem at hand, but which still add some value. An example from yesterday was converting a small class hierarchy to use the Moose object system. Other examples might be writing per-method documentation, cleaning up your version control history, and minor refactorings. If there's too much of this stuff to do, it can get dispiriting - you want to be solving the problem, not mucking about with trivia! But if there's not too much, it can be helpful: you have something to do while you're stuck on the main problem, and the housekeeping work is close enough to the main problem that it stays in your brain's L2 cache, where a background process can work away at it. If you're so stuck that you literally can't make any progress, your choices are (a) think very very hard and get depressed, (b) go and do something totally different, in the process forgetting lots of important details. I find both of these to be less productive.
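Here's that first point as a minimal Python sketch (a hypothetical example, but the shape is the same in Perl):

# Independently-varying behaviours: any formatter combines with any
# filter, with no need for a class per combination.
def make_writer(fmt, keep):
    def write(records):
        return [fmt(r) for r in records if keep(r)]
    return write

shouty_evens = make_writer(str.upper, lambda r: len(r) % 2 == 0)
print(shouty_evens(["ab", "abc", "abcd"]))   # ['AB', 'ABCD']

# Behaviours that must vary in sync: encode and decode have to agree
# on the separator, so they belong together in one object rather than
# being passed around as two loose closures.
class Codec:
    def __init__(self, sep):
        self.sep = sep
    def encode(self, fields):
        return self.sep.join(fields)
    def decode(self, line):
        return line.split(self.sep)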

[If you're interested, my code is of course on GitHub: some fixes to the CPAN module List::Priority, and some code for benchmarking all the priority queues on CPAN. Any suggestions or patches would be very welcome!]
Wednesday, October 26th, 2011 06:55 pm
Man, maintaining our Makefiles by hand really sucks. I think I'm going to write a Makefile generator. Do you think that's a good idea?

Certainly not! Go and wash your computer out with soap.

But then my computer will be ruined, and I won't be able to code!

Yes. That's the idea.
Tuesday, July 26th, 2011 10:31 am
Back in 2003 when I was working for $hateful_defence_contractor, we were dealing with a lot of quantities expressed in dB. Occasionally there was a need to add these things - no, I can't remember why. Total power output from multiple sources, or something. Everyone cursed about this. So I wrote a desk calculator script, along these lines:
#!/usr/bin/perl

print "> ";
while (<>) {
   s|-?\d+(\.\d+)?([eE][-+]?\d+)?|10**($&/10)|oeg;
   print 10*log(eval $_)/log(10)."\n> ";
}
I've always thought of this as The Most Evil Code I've Ever Written. For those of you who don't speak fluent regex, it reads a line of input from the user, interprets everything that looks like a number as a number of decibels, replaces each decibel-number with the non-decibel equivalent, evaluates the resulting string as Perl code, and then converts the result back into decibels. Here's an example session:
> 1 + 1
4.01029995663982
> 10 * 10 
20
> cos(1)
-5.13088257108395
Some of you are no doubt thinking "Well of course that code's evil, it's written in Perl!" But no. Here's the same algorithm written in Python:
#!/usr/bin/python -u
import re, math, sys

def repl(match):
        num = float(match.group(0))
        return str(10**(num/10))

number = re.compile(r'-?\d+(\.\d+)?([eE][-+]?\d+)?')
while 1:
        line = sys.stdin.readline()
        if len(line) == 0:
                break
        line = re.sub(number, repl, line)
        print 10*math.log10(eval(line))
If anything, the Perl version is simpler and has lower accidental complexity. If Perl's not the best imaginable language for expressing that algorithm, it's damn close.

[I also tried to write a Haskell version using System.Eval.Haskell, but I got undefined reference to `__stginit_pluginszm1zi5zi1zi4_SystemziEvalziHaskell_' errors, suggesting my installation of cabal is b0rked. Anyone know what I need to do to fix it? Also, I'm sure my Python style can be greatly improved - suggestions most welcome.]

No, I thought of it as evil because it's doing an ugly thing: superficially altering code with regexes and then using string eval? And who the hell adds decibels, anyway?

Needless to say, it was the most successful piece of code I wrote in the year I spent in that job.

I was talking about string eval to Aaron Crane the other day, and I mentioned this program. His response surprised me:
I disagree; I think it’s a lovely piece of code. It may not be a beautiful jewel of elegant abstractions for a complex data model, true. But it’s small, simple, trivial to write, works on anything with a Perl interpreter (of pretty much any vintage, and with no additional dependencies), and clearly correct once you’ve thought about the nature of the arithmetic you’re doing. While it’s not something you’d ship as part of a safety-critical system, for example, I can’t see any way it could be realistically improved as an internal tool, aimed at users who are aware of its potential limitations.
[Full disclosure: the Perl version above didn't work first time. But the bug was quickly found, and it did work the second time :-)]

The lack of external dependencies (also a virtue of the Python version, which depends only on core modules) was very much intentional: I wrote my program so it could be trivially distributed (by samizdat, if necessary). Most of my colleagues weren't Perl programmers, and if I'd said "First, install Regexp::Common from CPAN...", I'd have lost half my potential audience. As it was, the tool was enthusiastically adopted.

So, what do you all think? Is it evil or lovely? Or both? And what's the most evil piece of code that you've written?

Edit: Aaron also pointed me at this program, which is both lovely and evil in a similar way. If you don't understand what's going on, type
perl -e 'print join q[{,-,+}], 1..9'
and
perl -e 'print glob "1{,-,+}2"'
at the command-line.
Saturday, July 2nd, 2011 06:37 pm
I'm currently running a lot of benchmarks in my day job, in the hope of perhaps collecting some useful data in time for an upcoming paper submission deadline - this is the "science" part of "computer science". Since getting a given benchmark suite built and running is often needlessly complex and tedious, one of my colleagues has written an abstraction layer in the form of a load of Makefiles. By issuing commands like "make build-eembc2", "make run-utdsp" or "make distclean-dspstone" you can issue the correct command (build/run/distclean) to whichever benchmark suite you care about. The lists of individual benchmarks are contained in .mk files, so you can strip out any particular benchmark you're not interested in.

I want to use benchmark runs as part of the fitness function for a genetic algorithm, so it's important that it run fast, and simulating another processor (as we're doing) is inherently a slow business. Fortunately, benchmark suites consist of lots of small programs, which can be run in parallel if you don't care about measuring wallclock seconds. And make already has support for parallel builds, using the -j option.

But it's always worth measuring these things, so I copied the benchmark code up onto our multi-core number crunching machine, and did two runs-from-clean with and without the -j flag. No speedup. Checking top, I found that only one copy of the simulator or compiler was ever running at a time. What the hell? Time to look at the code:
TARGETS=build run collect clean distclean

%-eembc2: eembc-2.0
        @for dir in $(BMARKS_EEMBC2) ; do \
          if test -d eembc-2.0/$$dir ; then \
            ${MAKE} -C eembc-2.0/$$dir $* ; \
          fi; \
        done
Oh God. Dear colleague, you appear to have taken a DSL explicitly designed to provide parallel tracking of dependencies, and then deliberately thrown that parallelism away. What were you thinking?¹ But it turns out that Dominus' Razor applies here, because getting the desired effect without sacrificing parallelism is actually remarkably hard...

Doing it in redo instead...

Time to start teaching my colleagues about redo? I think it might be...

¹ He's also using recursive make, which means we're doing too much work if there's much code shared between different benchmarks. But since the time taken to run a benchmark is utterly dominated by simulator time, I'm not too worried about that.
Monday, May 30th, 2011 06:44 pm
I have just written the following commit message:
"Branch over unconditional jump" hack for arbitrary-length brcc.
    
 - brcc (branch and compare) instructions can have static branch-prediction
   hints, but can only jump a limited distance.
 - Calculating the distance of a jump at expand-time is hard.
 - So instead of emitting just a brcc instruction, we emit an unconditional
   jump to the same place and a branch over it.
 - I added a simple counter to emit/state to number the labels thus introduced.
 - With this commit, I forfeit all right to criticise the hackiness of anyone
   else's code.
 - Though I doubt that will stop me.
Saturday, May 21st, 2011 11:45 am
Thanks to all who helped out with my ssh problem the other day. A few different approaches worked: I'm writing them up here for easy reference.

To recap, the problem was as follows: I want to ssh from my home machine (delight) to my work machine (sentinel) without typing any passwords (and, while I'm at it, to various other work machines, such as the Subversion host). Unfortunately, sentinel isn't visible on the public Internet; first I need to ssh into a gateway machine (rydell) from which sentinel is visible. Oh, and neither sentinel nor rydell allow public-key ssh login; both require you to use the Kerberos authentication protocol, which is explained here.

Brute force and ignorance

My first attempt was to use ssh rydell ssh sentinel (which sshes into rydell, and invokes the command ssh sentinel thereon). This failed with the error "Pseudo-terminal will not be allocated because stdin is not a terminal". Marco Fontani pointed out that the -t switch to ssh allocates a pseudo-terminal, so ssh -t rydell ssh sentinel Does What I Want.

X11 Forwarding

Mat Brown pointed out that I was overthinking it: by using the -X and -C options to ssh (or adding the lines ForwardX11 yes and Compression yes to the relevant stanza of ~/.ssh/config) I could enable compressed forwarding of X11 connections; I could then create new terminal windows by sshing into sentinel once and creating new xterms on there. I already had ForwardX11 set, but didn't know about Compression, so I've enabled that; it seems to help.

Using ProxyCommand and AutoSSH

I added the lines
ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r
ServerAliveInterval 60
ServerAliveCountMax 60
Host rydell
    User pvlak1
    HostName rydell.my.employ.er
Host sentinel
    User pvlak1
    ProxyCommand=ssh rydell nohup nc sentinel 22
    HostName sentinel.my.employ.er
Host svn.my.employ.er
    User pvlak1
    ProxyCommand=ssh rydell nohup nc svn 22
to ~/.ssh/config. Then, at the beginning of the day, I set things up with the commands
kinit pvlak1
autossh -f -M 0 -N sentinel
autossh -f -M 0 -N svn.my.employ.er
I can now open a new ssh connection to sentinel in an eyeblink. I needed to install AutoSSH, but this was just an apt-get away.

Two things are going on here. The ProxyCommand lines (suggested by hatfinch, and debugged by [livejournal.com profile] simont) tell ssh how to reach sentinel: in this case, by sshing to rydell and using netcat to open a connection to port 22 on sentinel (on which sentinel's sshd is listening). The HostName line is necessary to stop Kerberos getting confused. The first four lines were suggested by Marco Fontani and Aaron Crane, and allow ssh to multiplex all its connections to sentinel (or any host, come to that) over one channel, eliminating the need for a cryptographic handshake on each new connection and leading to blazingly-fast startup times. To avoid various annoying problems with this setup, you'll need the AutoSSH invocations: Aaron explains why on his blog.

My original problem is solved - hurrah! Now, can anyone explain why resize events weren't being passed through my expect script, and what I could have done about it? :-)
Thursday, May 19th, 2011 02:45 pm
I mostly work from home. However, due to stupid IP licensing requirements, much of my work has to be done on a machine physically located in my employer's building. This is OK, because I can login to said machine over the Internet using ssh.

But! My work machine (sentinel) is not visible over the public Internet. First I have to ssh into a gateway machine (rydell), and then ssh from rydell into sentinel. I like to open a lot of xterms at once, and so I'd like this process to be as simple as possible: ideally, I'd like to click one button and get an xterm sshed to sentinel and cd'ed to the directory containing the code I'm currently working on.

Oh, there's another wrinkle: rydell doesn't allow passwordless login using the normal ssh public key infrastructure. Instead, you have to use Kerberos. Kerberos is an authentication protocol developed at MIT that utilises zzzzzz...

Sorry, drifted off for a minute there. The key point about Kerberos is that you ask a keyserver for a time-limited session key, which is decrypted locally using your password. This session key is then used to establish encrypted connections to other servers in the same authentication realm. You never have to send your password over the network, and you don't have to distribute your public key to every host you ever want to talk to. So, once I've acquired a session key by typing kinit and then giving it my password, I should be able to log in to any machine on my employer's network (including sentinel) without typing my password again that day. Which is brilliant.

Except sentinel still isn't visible over the public Internet. So I still need to ssh into rydell and then ssh into sentinel from there. Both of these logins are now passwordless, but this is still annoying. Here are some things I've tried to improve the situation:

The simplest thing that could possibly work

pozorvlak@delight:~
0 $ ssh rydell ssh sentinel
Pseudo-terminal will not be allocated because stdin is not a terminal.

Automating the double-login with expect

#!/usr/bin/expect -f
set timeout 30
match_max 100000
spawn ssh rydell
send "\r"
expect "0 "             # prompt
send "ssh sentinel\r"
expect "0 "
send "cde\r"            # cd to work directory
interact
This actually works, right until I open a text editor or other ncurses program, and discover that I can't resize my xterm - or rather, that the resize information is not passed on to my programs.

Using sshuttle

sshuttle is a poor man's VPN written by the author of redo. Using the -H option, it allows you to proxy your DNS requests through the remote server's DNS server. So a simple
sshuttle -H -vvr rydell 0/0
at the beginning of the day allows me to ssh directly from my local machine (delight) to sentinel. But! It asks me for my sodding password every single time I do so. This is not what I wanted.

ssh tunnelling

I am too stupid to make sense of the "tunnelling" section of the ssh manpage, but fortunately some Googling turned up this, which describes exactly the case I want.
pozorvlak@delight:~
0 $ ssh -fN -L 9500:sentinel:22 rydell
pozorvlak@delight:~
0 $ ssh -p 9500 pvlak1@localhost
pvlak1@localhost's password: 
Last login: Thu May 19 14:31:32 2011 from rydell.my.employ.er
pvlak1@sentinel 14:34 ~
0 $ 
Yes, my employer is located in Eritrea, what of it? :-) Anyway, you will note that this suffers from the same problem as the previous attempt: I have to type my password for every login. Plus, if the sshuttle manpage is to be believed, tunnelling ssh over ssh is a bad idea performance-wise.


I notice that I am confused. Specifically, I notice that I have the type of confusion that comes from lacking an appropriate conceptual framework for attacking the problem.

Can anyone help?

Edit: Yes! Marco Fontani pointed out that the -t option to ssh allocates a pseudo-terminal, so ssh -t rydell ssh sentinel Does What I Want. Thanks, Marco! And thanks to everyone else who offered suggestions.

Edit 2: hatfinch and [livejournal.com profile] simont (who you may recognise as the author of the ssh client PuTTY) came up with an alternative solution. My .ssh/config now contains the stanza
Host sentinel
    User pvlak1
    ProxyCommand=ssh rydell nohup nc sentinel 22
    HostName sentinel.my.employ.er
This doesn't require me to type a password for every login, does allow me to resize ncurses apps, and feels slightly snappier than ssh -t rydell ssh sentinel, so that's what I'll be using from now on. Thanks very much!
Thursday, March 31st, 2011 12:11 pm
Here's a data-structure I use almost every day. It's an implementation¹ of a queue. You have two stacks, an in-stack and an out-stack. To add a new item, push it onto the in-stack. To remove an item, take the top item off the out-stack; if the out-stack's empty, push all the items on the in-stack onto the out-stack in reverse order and then take the top item.

In C++...
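The C++ version is behind the cut; as a quick illustration, here's the same structure as a minimal Python sketch:

class Queue:
    # Two stacks give amortised O(1) enqueue and dequeue: each item is
    # pushed and popped at most twice in its lifetime.
    def __init__(self):
        self.instack = []
        self.outstack = []

    def enqueue(self, item):
        self.instack.append(item)          # push onto the in-stack

    def dequeue(self):
        if not self.outstack:
            # Popping everything off the in-stack reverses it, so the
            # oldest item ends up on top of the out-stack.
            while self.instack:
                self.outstack.append(self.instack.pop())
        return self.outstack.pop()         # IndexError if the queue is empty

q = Queue()
for n in (1, 2, 3):
    q.enqueue(n)
assert q.dequeue() == 1
q.enqueue(4)
assert q.dequeue() == 2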


While Josie looks for an escape route, Haggis accepts the sudden shift in the Earth's gravity with stoic equanimity.

¹ This originally read "an unusual (AFAIK) implementation", but at least three people have now told me that it's the standard implementation used in the purely-functional world. I'm honestly astonished that people think this is a good idea in general, but hey, I guess I've learned something.
Wednesday, March 30th, 2011 10:57 pm
In The Art of Unix Programming, Eric Raymond lists among his basics of the Unix philosophy the "Rule of Generation":
14. Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
He goes into this idea in more detail in chapter 9 of the same book.

I used to believe this was a good idea, and in many situations (here's a great example) it is. But my current work project, which makes heavy use of code generators and custom minilanguages, has been a crash course (sometimes literally) in the downsides. Here's the latest example.

I've recently been merging in some code a colleague wrote about a year ago, just before I started. As you'd expect, with a year's drift this was a non-trivial exercise, but I eventually got all the diffs applied in (I thought) the right places. Protip: if forced to interact with a Subversion repository, use git as your client. It makes your life so much less unpleasant. Anyway, I finished the textual part of the merge, and compiled the code.

Screens-full of error messages. Oh well, that's not so unexpected.

I'm a big fan of Tilton's Law: "solve the first problem". The chances are good that the subsequent problems are just cascading damage from the first problem; no sense in worrying about them until you've fixed that one. Accordingly, I looked only at the first message: "The variable 'state' has not been declared at line 273".

Hang on...

Git checkout colleagues-branch. Make. No errors.

Git checkout merge-branch. Make. Screens-full of errors.

Git checkout colleagues-branch. Grep for a declaration of "state". None visible.

Clearly, there was some piece of voodoo that I'd failed to merge correctly.

I spent days looking through diffs for something, anything, that I'd failed to merge properly that might be relevant to this problem. I failed.

I then spent some serious quality time with the code-generation framework's thousand-page manual, looking for some implicit-declaration mechanism that might explain why "state" was visible in my colleague's branch, but not in mine. I failed.

Finally, I did what I probably should have done in the first place, and took a closer look at the generated code. The error messages that I was seeing referred to the DSL source code rather than the generated C code, because the code-generator emitted #line directives to reset the C compiler's idea of the current file and line; I could therefore find the relevant section of generated code by grepping for the name of the buggy source file in the gen/ directory.

The framework uses code generators for all sorts of things (my favourite generator being the shell script that interprets a DSL to build a Makefile which is used to build another Makefile), but this particular one was used to implement a form of polymorphism: the C snippet you provide is pasted into a honking great switch statement, which switches on some kind of type tag.

I found the relevant bit of generated code, and searched back to the beginning of the function. Yep, "state" was indeed undeclared in that function. And the code generator had left a helpful comment to tell me which hook I needed to use to declare variables or do other setup at the beginning of the function. So that was the thing I'd failed to merge properly!

Git checkout colleagues-branch. Grep for the hook. No results.

And then it hit me.

Like all nontrivial compilers, ours works by making several transformation passes over the code. The first pass parses your textual source-code and spits out a machine-independent tree-structured intermediate representation (IR). There then follow various optimization and analysis passes, which take in IR and return IR. Then the IR is expanded into a machine-specific low-level IR, and finally the low-level IR is emitted as assembly language.

The code that was refusing to compile was part of the expansion stage. But at the time that code was written, the expansion stage didn't exist: we went straight from the high-level IR to assembly. Adding an expansion stage had been my first task on being hired. Had we been using a language that supported polymorphism natively, that wouldn't have been a problem: the code would have been compiled anyway, and the errors would have been spotted; a smart enough compiler would have pointed out that the function was never called. But because we were using a two-stage generate-and-compile build process, we were in trouble. Because there was no expansion stage in my colleague's branch, the broken code was never pasted into a C file, and hence never compiled. My colleague's code was, in fact, full of compile-time errors, but appeared not to be, because the C compiler never got a look at it.

And then I took a closer look at the screens-full of error messages, and saw that I could have worked that out right at the beginning: subsequent error messages referred to OUTFILE, and the output file isn't even open at the expansion stage. Clearly, the code had originally been written to run in the emit phase (when both state and OUTFILE were live), and he'd got half-way through converting it to run at expansion-time before having to abandon it.

Lessons learned:
  1. In a generated-code scenario, do not assume that any particular snippet has been compiled successfully just because the whole codebase builds without errors.
  2. Prefer languages with decent native abstraction mechanisms to code generators.
  3. At least skim the subsequent error messages before dismissing them and working on the first bug: they may provide useful context.
  4. Communication: if I'd enquired more carefully about the condition of the code to be merged I could have saved myself a lot of time.
  5. Bear in mind the possibility that you might not be the guilty one.
  6. Treat ESR's pronouncements with even greater caution in future. Same goes for Kenny Tilton, or any other Great Prognosticator.
Any more?

Edit: two more:
  1. If, despite (2), you find yourself writing a snippet-pasting code generator, give serious thought to providing "this snippet is unused" warnings.
  2. Learn to spot when you're engaged in fruitless activity and need to step back and form a better plan. In my case, the time crawling through diffs was wasted, and I probably could have solved the problem much quicker if I'd rolled up my sleeves and tried to understand the actual code.
Thanks to [livejournal.com profile] gareth_rees and jerf.
Tuesday, March 22nd, 2011 12:24 pm
A couple of days ago I saw a tweet from @TomCadwallader, claiming that
1 sperm has 37.5MB of DNA information in it. That means a normal ejaculation represents a data transfer of 1587GB in about 3 seconds.
That's cute, but there's a more interesting question lurking here: it may represent 1.5TB of data, but how much information is transferred? In other words, how many bits would be required to convey the same "message" if we compressed it as cleverly as possible?

[Also, three seconds? Either Tom's ignoring the time required for the initial protocol handshake, or he's doing it very, very wrong.]

First off, some Wikipedia-reading (it may surprise you to learn that I actually research these posts) brings Tom's figures into question. The raw information content of the human haploid genome is actually more like 770MB, of which 38.7MB is the X chromosome and 14.4MB is the Y chromosome. So a single sperm contains roughly 755MB if X, or 730MB if Y. Sperm density and volume of semen varies wildly, but the adult average is apparently 3.5ml and 300 million sperm per ml. That's around a billion sperm per ejaculation, rather higher than the (1.5TB/37.5MB) =~ 40,000 sperm in Tom's figures! This gives us a total raw information transfer of about 750 petabytes, and a bandwidth of 250PB/sec - comfortably exceeding the bandwidth of a fully-laden 747, let alone Andrew Tanenbaum's speeding truck full of videotapes.

[The relationship between karyotype and apparent sex is more complicated than that, of course: Tom might have Klinefelter's syndrome (XXY) or XYY syndrome, which would make the numbers come out differently. But let's assume he's XY for the sake of simplicity.]

But this data is actually highly redundant, and thus extremely compressible. To see why, we'll need to review some high-school biology. Most human cells are diploid, meaning that they have two copies of each chromosome (one from your mother and one from your father), giving 46 chromosomes in all. Sperm cells, on the other hand, are haploid, meaning that they only have one copy of each chromosome. Sperm cells are formed by meiosis: for each type of chromosome, the new sperm cell is randomly assigned either the mother's copy of that chromosome or the father's copy.

This immediately gives us a huge data reduction. Once we've got a full copy of Tom's genome, all we need to store about each sperm is which chromosomes it's got. That's one bit (mother/father) per chromosome, or 23 bits per sperm. Using our estimate from above, that means we need 23 billion bits, or 2.9 gigabytes. We can also compress our representation of Tom's genome using standard compression algorithms: the entropy of the human genome is apparently 1.7 bits per base pair, as opposed to the 2 bits per base pair we used for our naive encoding, so we can fit his genome into a bit under 1.3GB. Our total information transfer is thus 4.2GB.

... but that would be too easy, wouldn't it?

It turns out that meiosis is more complicated than that: as part of the process, paired chromosomes exchange data - sorry, sections of DNA - in a process known as homologous recombination. I've been unable to determine how frequent this is - any biologists able to help me out here? - but apparently it's vastly more common at recombination hotspots, of which there are known to be over 25,000 in the human genome. If recombination's relatively rare then we could use a space-efficient sparse representation - "exchanges occurred at positions 12, 94, 509, 1337, 2424,...", but if it's common then we're going to have to store one bit per hotspot, to indicate whether or not a recombination occurred there. That's 25,000 bits per sperm, or just over 3kB each: scale it up by a billion, and we're storing 3TB of recombination data.
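Collecting the arithmetic into one place, as a sanity check (the constants are just the rough figures quoted above):

# Rough figures from the discussion above.
SPERM = 1_000_000_000        # sperm per ejaculation
HAPLOID_MB = 770             # raw haploid genome at ~2 bits per base pair
ENTROPY_RATIO = 1.7 / 2      # compressed size / raw size for the genome
HOTSPOTS = 25_000            # recombination hotspots in the human genome

raw_pb = SPERM * HAPLOID_MB / 1e9                 # raw transfer: ~750PB
genome_gb = 2 * HAPLOID_MB * ENTROPY_RATIO / 1e3  # both copies, compressed: ~1.3GB
choices_gb = SPERM * 23 / 8 / 1e9                 # 23 one-bit choices per sperm: ~2.9GB
recomb_tb = SPERM * HOTSPOTS / 8 / 1e12           # one bit per hotspot per sperm: ~3.1TB

print(raw_pb, genome_gb, choices_gb, recomb_tb)
# 770.0 1.309 2.875 3.125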

So, as it turns out, the total bandwidth of Tom's, er, "uploads" is around twice what he thought it was. I suggest that he not use that as a chat-up line, though.
Sunday, February 20th, 2011 09:49 pm
My post about unit-testing Agda code using elisp was posted to the dependent types subreddit (of which I was previously unaware). There, the commenter stevana pointed out that you could achieve the same effect using the following Agda code:
test1 : 2 + 1 == 3
test1 = refl

test2 : 3 + 0 == 3
test2 = refl
If your test fails, you'll get a compile-time error on that line. Aaaaaaaargh. I do wish someone had told me that before I spent a week mucking about with elisp. Well, that's not quite true - I learned a lot about Emacs, I had some fun writing the code, I learned some things about Agda that I might otherwise not have done, I got some code-review from [livejournal.com profile] aaroncrane, and I had a chance to try out using darcs and patch-tag for a collaborative project (the experience, I'm afraid, didn't measure up well against git and GitHub).

What I'd failed to realise, of course, is that the == type constructor describes computable equality: a == b is inhabited if the compiler can compute a proof that a is equal to b. "They both normalise to the same thing" is such a proof, and obviously one that the compiler's smart enough to look for on its own.

Ordinarily, a testcase is better than a compile-time assertion with the same meaning, because you can attach a debugger to a failing test and investigate exactly how your code is broken. This doesn't mean that types are stupid - types draw their power from their ability to summarise many different testcases. But in this case, I don't think there's much reason to prefer our approach to stevana's: the slight increase in concision that agda-test allows is balanced out by its inability to coerce both expressions to the same type, which often means the user has to type a lot more code. Stevana's approach does not suffer from this problem.

This also makes me more positive about dependent typing: if simple testcases can be incorporated into your type-system so easily, then maybe the "types are composable tests" slogan has some truth to it.

But seriously, guys, macros.
Friday, February 4th, 2011 02:11 pm
On Monday I went to the first day of Conor McBride's course Introduction to Dependently-Typed programming in Agda. "What's dependently-typed programming?" you ask. Well, when a compiler type-checks your program, it's actually (in a sense which can be made precise) proving theorems about your code. Assignments of types to expressions correspond to proofs of statements in a formal logical language; the precise logical language in which these statements are expressed is determined by your type system (union types correspond to "or", function types correspond to "if... then", that kind of thing). This correspondence goes by the fancy name of "the Curry-Howard isomorphism" in functional programming and type-theory circles. In traditional statically-typed languages these theorems are mostly pretty uninteresting, but by extending your type system so it corresponds to a richer logical language you can start to state and prove some more interesting theorems by expressing them as types, guaranteeing deep properties of your program statically. This is the idea behind dependent typing. A nice corollary of their approach is that types in dependently-typed languages (such as Agda, the language of the course) can be parametrised by values (and not just by other types, as in Haskell), so you can play many of the same type-level metaprogramming games as in C++ and Ada, but in a hopefully less crack-fuelled way. I spent a bit of time last year playing around with Edwin Brady's dependently-typed systems language Idris, but found the dependent-typing paradigm hard to wrap my head around. So I was very pleased when Conor's course was announced.

The course is 50% lab-based, and in these lab sessions I realised something important: fancy type-system or no, I need to test my code, particularly when I'm working in such an unfamiliar language. Dependent typing may be all about correctness by construction, but I'm not (yet?) enlightened enough to work that way - I need to see what results my code actually produces. I asked Conor if there were any way to evaluate individual Agda expressions, and he pointed me at the "Evaluate term to normal form" command in Emacs' Agda mode (which revealed that I had, indeed, managed to produce several incorrect but well-typed programs). Now, that's better than nothing, but it's clearly inadequate as a testing system - you have to type the expression to evaluate every time, and check the result by eye. I asked Conor if Agda had a more extensive unit-testing framework, and he replied "I'm not aware of such a thing. The culture is more 'correctness by construction'. Testing is still valuable."

So I wrote one.

I've written - or at least hacked on - a test framework of some sort at almost every programming job I've had (though these days I try to lean on Test::More and friends as much as possible). It gets hella tedious. This one was a bit different, though. One message that came out of Conor's lectures was that the correct way to represent equality in dependently-typed languages is somewhat controversial; as a beginner, I didn't want to dip a toe into these dangerous waters until I had a clearer idea of the issues. But the basic operation of any testing system is "running THIS should yield THAT". Fortunately, there was a way I could punt on the problem. Since Agda development seems to be closely tied to the interactive Emacs mode, I could deliver my system as a set of Emacs commands; the actual testing could be done by normalising the expression under consideration and testing the normalised form for string-equality with the expected answer.

This was less easy than I'd expected; it turns out that agda-mode commands work by sending commands to a slave GHCi process, which generates elisp to insert the results into the appropriate buffer. I'm sure that agda-mode's authors had some rationale for this rather bizarre design, but it makes agda-mode a lot harder to build on than it could be. However (in close collaboration with Aaron Crane, who both contributed code directly and guided me through the gargantuan Emacs API like a Virgil to my Dante) I eventually succeeded. There are two ways to get our code:
darcs get http://patch-tag.com/r/pozorvlak/agda-test
or
git clone git://github.com/pozorvlak/agda-test.git
Then load the file agda-test.el in your .emacs in the usual way. Having done so, you can add tests to your Agda file by adding comments of the form
{- test TESTNAME: ACTUAL is EXPECTED; -}
For instance,
{- test 2+1: (suc (suc zero)) +N (suc zero) is (suc (suc (suc zero)));
   test 3+0: (suc (suc (suc (zero)))) +N zero is (suc (suc zero)); -}
When you then invoke the function agda2-test-all (via C-u C-c C-v for "verify"), you should be presented with a new buffer called *Agda test results*, containing the text
1..2
ok 1 - 2+1
not ok 2 - 3+0
    got 3
    expected 2
[ Except, er, I get "expected _175" for that last test instead. I don't think that's a bug in my elisp code, because I get the same result when I evaluate that expression manually with C-c C-n. Halp pls?]

You should recognise the output as the Test Anything Protocol; it should be easy to plug in existing tools for aggregating and displaying TAP results.

There are a lot of things I'd like to do in the future:
  • Add commands to only run a single test, or just the tests in a given comment block, or a user-specified ad-hoc test group.
  • Highlight failing tests in the Agda buffer.
  • Allow the user to specify a TAP aggregator for the results.
  • Do something about the case where the test expressions don't compile cleanly.
If you'd like to send patches (which would be very welcome!), the darcs repository is currently considered the master one, so please submit patches there. I'm guessing that most Agda people are darcs users, and the slight additional friction of using darcs as my VCS is noise next to the friction of using Emacs as my editor :-) The git repository is currently just a mirror of the darcs one, but it would be easy to switch over to having it as the master if enough potential contributors would prefer git to darcs. Things might get confusing if I get patches submitted in both places, though :-)