- I can't safely modify, or even understand, this code, because it has no tests.
- I can't test it without modifying it.
Check out this book on Goodreads: https://www.goodreads.com/book/show/44919.Working_Effectively_with_Legacy_Code

Last year I made three New Year's Resolutions:
Number 2 was an obvious success: I finished the Edinburgh Marathon in 4:24:04, and raised nearly £900 for the Against Malaria Foundation. I'd been hoping to get a slightly faster time than that, but I lost several weeks of training to a chest infection near to the end of my training programme, so in the end I was very happy to finish under 4:30. The actual running was... mostly Type II fun, but also much less miserable than many of my training runs, even at mile 21 when I realised that literally everything below my navel hurt. Huge thanks to everyone who sponsored me!
Number 3 was an equally obvious failure. My climbing partner and I picked out an unclimbed mountain in Kyrgyzstan and got a lot of the logistics sorted, but then he moved house and started a new job a month before we were due to get on a plane to Bishkek. With only a few weeks to go and no plane tickets or insurance bought yet (and them both being much more expensive than we'd expected - we'd checked prices months earlier, but forgot how steeply costs rise as time goes on), we regretfully pulled the plug. We're planning to try again in 2016 - let's hope all the good lines don't get nabbed by Johnny-come-lately Guardian readers.
Number 1 was a partial success. I tried a number of suggestions from friends who appear to have their financial shit more together than me (not hard), but couldn't get any of them to stick. I was diagnosed with ADHD at the end of 2014; I don't want to use that as an excuse, but it does mean that some things that come easily to most people are genuinely difficult for me - and financial mismanagement is apparently very common among people with ADHD. The flip-side, though, is that I have a license to do crazy or unusual things if they help me be effective, because I have an actual medical condition.
I've now set up the following system:
I'll have to fine-tune the amount left in Account 1 with practice, but this system should ensure that bills get paid, I can easily see how much money I have left to spend for the month, and very little further thought or effort on my part is required.
While I was in there, I took the opportunity to set up a recurring donation to the Against Malaria Foundation for a few percent of my net salary - less than the 10% required to call yourself an Official Good Person by the Effective Altruism movement, but I figure I can work up to it.
It's too early to say whether the system will work out, but setting it up has already been a beneficial exercise - before, I had seven accounts with five different providers, most of them expired and paying almost zero interest (in one file, I found seven years' worth of letters saying "Your investment has expired and is now paying 0.1% gross interest, please let us know what you want us to do with it.") I now have only the three accounts described above, from two different providers, so it should be much easier to keep track of my overall financial position. Interest rates currently suck in general, but Accounts 2 and 3 at least pay a bit.
I've also started a new job that pays more, and wormwood_pearl's writing is starting to bring in some money. We're trying not to go mad and spend our newfound money several times over, but we're looking to start replacing some broken kit over the next few months rather than endlessly patching things up.
What else has happened to us?
I had a very unsuccessful winter climbing season last year; I was ill a lot from the stress of marathon training, and when I wasn't ill the weather was terrible. I had a couple of good sessions at the Glasgow ice-climbing wall, but only managed one actual route. Fortunately, it was the classic Taxus on Beinn an Dothaidh, which I'd been wanting to tick for a while. I also passed the half-way mark on the Munros on a beautiful crisp winter day in Glencoe.
One by one, my former research group's PhD students finished, passed their vivas, submitted their corrections, and went off, hopefully, to glittering academic careers or untold riches in Silicon Valley. Good luck to them all.
In June, I did the training for a Mountain Leadership award, the UK's introductory qualification for leading groups into the hills. Most of the others on the course were much fitter than me and more competent navigators, but the instructor said I did OK. To complete the award, I'll need to log some more Quality Mountain Days and do a week-long assessment.
In July, we went to Mat Brown's wedding in Norfolk, and caught up with some friends we hadn't seen IRL for far too long. Unlike last year, when it felt like we were going to a wedding almost every weekend, we only went to one wedding this year; I'm glad it was such a good one. Also, it was in a field with camping available, which really helped to keep our costs down.
In July, I started a strength-training cycle. I've spent years thinking that my physical peak was during my teens, when I was rowing competitively (albeit badly) and training 15-20 hours a week, so I was surprised to learn that I was able to lift much more now than I could then - 120kg squats versus around 90kg (not counting the 20kg of body weight I've gained since then). Over the next few weeks, I was able to gain a bit more strength, and by the end I could squat 130kg. I also remembered how much I enjoy weight training - so much less miserable than cardio.
In August, we played host to a few friends for the Edinburgh Fringe, and saw some great shows, of which my favourite was probably Jurassic Park.
In September, we went to Amsterdam with friends for a long weekend, saw priceless art and took a canal tour; then I got back, turned around within a day and went north for a long-awaited hiking trip to Knoydart with my grad-school room-mate. There are two ways to get to Knoydart: either you can take the West Highland Line right to the end at Mallaig, then take the ferry, or you can get off at Glenfinnan (best known for the viaduct used in the Harry Potter films) and walk North for three days, sleeping in unheated huts known as bothies. We did the latter, only it took us six days because we bagged all the Munros en route. I'm very glad we did so. The weather was cold but otherwise kind to us, the insects were evil biting horrors from Hell, and the starfields were amazing. It wasn't Kyrgyzstan, but it was the best fallback Europe had to offer.
In October, I started a new job at Red Hat, working on the OpenStack project, which is an open-source datacenter management system. It's a huge, intimidating codebase, and I'm taking longer than I'd like to find my feet, but I like my team and I'm slowly starting to get my head around it.
That's about it, and it's five minutes to the bells - Happy New Year, and all the best for 2016!
Over the seven years or so I've been using Git, there have been a number of moments where I've felt like I've really levelled up in my understanding or usage of Git. Here, in the hope of accelerating other people's learning curves, are as many as I can remember. I'd like to thank Joe Halliwell for introducing me to Git and helping me over the initial hurdles, and Aaron Crane for many helpful and enlightening discussions over the years - especially the ones in which he cleared up some of my many misunderstandings.
Even if all you're doing is commit/push/pull, you're manipulating the history graph. Don't try to imagine it in your mind - get the computer to show it to you! That way you can see the effect your actions have, tightening your feedback loop and improving your learning rate. It's also a lot easier to debug problems when you can see what's going on. I started out using gitk --all
to view history; now I mostly use git lg
, which is an alias for log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%Creset' --abbrev-commit --date=relative
, but the effect is much the same.
The really important lessons to internalise at this stage are
If you understand that, then congratulations! You now understand 90% of what's important about Git. Just keep that core model in mind, and it should be possible to reason about the rest.
Relatedly: you'll probably want to graduate to the command-line eventually, but don't be ashamed to use a graphical client for any remotely unfamiliar task.
You know those weird c56ab7f
identifiers Git uses for commits? Those are actually serving a very important role. Mark Jason Dominus has written a great explanation, and I also suggest having a poke around in the .git/objects
directory of one of your repos. MJD talks about "blobs" and "trees" which are also identified with hashes, but you don't actually have to think about them much in day-to-day Git usage: commits are the most important objects.
git rebase
This shouldn't really qualify as a GLUE - my very first experience with Git was using the git-svn plugin, which forces you to rebase at all times. However, even if you're not interoperating with SVN, you should give rebase a go. What does it do, you ask? Well, the clue's in the name: it takes the sequence of commits you specify, and moves them onto a new base: hence "re-base". See also the MJD talk above, which explains rebasing in more detail.
Note to GUI authors: the obvious UI for this is selecting a load of commits and dragging-and-dropping them to somewhere else in the history graph. Sadly, I don't know of any GUIs that actually do this. Edit: MJD agrees, and is seeking collaborators for work on such a GUI.
Rebasing occasionally gets stuck when it encounters a conflict. The thing to realise here is that the error messages are really good. Keep calm, read the instructions, then follow them, and everything should turn out OK (but if it doesn't, feel free to skip to the next-but-two GLUE).
git mergetool
git mergetool
allows you to open a graphical diff/merge tool of your choice for handling conflicts. Don't use it. Git's internal merge has complete knowledge of your code's history, which enables it to do a better job of resolving conflicts than any external merge tool I'm aware of. If you use a third-party tool, expect to waste a lot of time manually resolving "conflicts" that the computer could have handled itself; the internal merge algorithm will handle these just fine, and present you with only the tricky cases.
git rebase --interactive
I thought this was going to be hard and scary, but it's actually really easy and pleasant to use, I promise! The name is terrible - while git rebase --interactive
can do a normal rebase as part of its intended work, this is usually a bad idea. What it's actually for is rewriting history. Have you made two commits that should really be squashed into one? Git rebase --interactive! Want to re-order some commits? Git rebase --interactive! Delete some entirely? Git rebase --interactive! Split them apart into smaller commits? Git rebase --interactive! The interface to this is extremely friendly by Git CLI standards: specify the commit before the first one you want to rewrite, and Git will open an editor window with a list of the commits-to-rewrite. Follow the instructions. If that's not clear enough, read this post.
This should also be represented in a GUI by dragging-and-dropping, but I don't know of any clients that do this.
git reflog
Git stores a list of every commit you've had checked out. Look at the output of git reflog
after a few actions: this allows you to see what's happened, and is thus valuable for learning. More usefully, it also allows you to undo any history rewriting. Here's the thing: Git doesn't actually rewrite history. It writes new history, and hides the old version (which is eventually garbage-collected after a month). But you can still get the old version back if you have a way to refer to it, and that's what git reflog
gives you. Knowing that you can easily undo any mistakes allows you to be a lot bolder in your experiments.
Here's a thing I only recently learned: Git also keeps per-branch reflogs, which can be accessed using git reflog $branchname
. This saves you some log-grovelling. Sadly, these only contain local changes: history-rewritings done by other people won't show up.
git-rebase--interactive
Interactive rebase is implemented as a shell script, which on my system lives at /usr/lib/git-core/git-rebase--interactive
. It's pretty horrible code, but it's eye-opening to see how the various transformations are implemented at the low-level "plumbing" layer.
I actually did this as part of a (failed) project to implement the Darcs merge algorithm on top of Git. I still think this would be a good idea, if anyone wants to have a go. See also this related project, which AFAICT has some of the same advantages and better asymptotic complexity than the Darcs merge algorithm.
You don't actually need to know this, but it's IMHO pretty elegant. Here's a good description.
HEAD
actually meansI learned this from reading two great blog posts by Federico Mena Quintero: part 1, part 2. Those posts also cleared up a lot of my confusion about how remotes and remote-tracking branches work.
The manual is pretty good here. Once you know how refspec notation works, you'll notice it's used all over and a lot of things will click into place.
git reset
actually doesI'd been using git reset
in stereotyped ways since the beginning: for instance, I knew that git reset --hard HEAD
meant "throw away all uncommitted changes in my working directory" and git reset --hard [commit hash obtained from reflog]
meant "throw away a broken attempt at history-rewriting". But it turns out that this is not the core of reset
. Again, MJD has written a great explanation, but here's the tl;dr:
Once you understand this, git reset
becomes part of your toolkit, something you can apply to new problems. Is a branch pointing to the wrong commit? There's a command that does exactly what you need, and it's git reset
. Here's yet another MJD post, in which he explains a nonobvious usage for git reset
which makes perfect sense in light of the above. Here's another one: suppose someone has rewritten history in a remote branch. If you do git pull
you'll create a merge commit between your idea of the current commit and upstream's idea; if you later git push
it you'll have created a messy history and people will be annoyed with you. No problem! git fetch upstream $branch; git checkout $branch; git reset --hard upstream/$branch
.
git add -p
and friendsLate Entry! The git add -p
command, which interactively adds changes to the index (the -p
is short for --patch
), wasn't much of a surprise to me because I was used to darcs record
; however, it seems to be a surprise to many people, so (at Joe Halliwell's suggestion) it deserves a mention here. And only tonight Aaron informed me of the existence of its cousins git reset -p
, git commit -p
and git checkout -p
, which are totally going in my toolkit.
I hope that helped someone! Now, over to those more expert than me - what should be my next enlightenment experience?
redo-ifchange $2.mli ocamlc $2.mli
Note: these do-files will not actually work, because redo insists that you write your output to a temporary file called $3 so it can atomically rename the newly-built file into place, and ocamlc is equally insistent that it knows better than you what its output files should be called. However, this annoying interaction of their limitations is irrelevant to the dependency-checking algorithm, so I'll pretend that they do work :-) I'll try to construct a workaround and post it on GitHub. Update: I have now done so!redo-ifchange $2.cmi $2.ml redo-ifchange `ocamldep $2.mli $2.ml` ocamlc $2.ml
redo-ifchange X.cmo
again, redo will check its database of dependencies and observe that X.cmo depends transitively on X.cmi, X.ml and X.mli but that none of them have changed; hence, it will correctly do nothing.redo-ifchange X.cmo
. Redo will again check its database of dependencies and note that X.ml has changed, so it must re-run default.cmo.do. First it notes that X.cmo depends on X.cmi and X.ml: it checks its database and sees that X.cmi depends on X.mli, which hasn't changed, so it leaves X.cmi alone. Next it re-runs ocamldep X.mli X.ml
and hands the output to redo-ifchange
: this tells redo that X.cmo now depends on Y.cmi. Y.cmi doesn't exist yet, so it builds it using the rules in default.cmi.do. Finally it compiles X.ml into X.cmo..hgrc
:[ui] username = Pozorvlak <pozorvlak@example.com> merge = internal:merge [pager] pager = LESS='FSRX' less [extensions] rebase = record = histedit = ~/usr/etc/hg/hg_histedit.py fetch = shelve = ~/usr/etc/hg/hgshelve.py pager = mq = color =
I've been running benchmarks again. The basic workflow is
Suppose I want to benchmark three different simulators with two different compilers for three iteration counts. That's 18 configurations. Now note that the problem found in stage 5 and fixed in stage 6 will probably not be unique to one configuration - if it affects the invocation of one of the compilers then I'll want to propagate that change to nine configurations, for instance. If it affects the benchmarks themselves or the benchmark-invocation harness, it will need to be propagated to all of them. Sounds like this is a job for version control, right? And, of course, I've been using version control to help me with this; immediately after step 1 I check everything into Git, and then use git fetch
and git merge
to move changes between repositories. But this is still unpleasantly tedious and manual. For my last paper, I was comparing two different simulators with three iteration counts, and I organised this into three checkouts (x1, x10, x100), each with two branches (simulator1
, simulator2
). If I discovered a problem affecting simulator1
, I'd fix it in, say, x1's simulator1
branch, then git pull
the change into x10 and x100. When I discovered a problem affecting every configuration, I checked out the root commit of x1, fixed the bug in a new branch, then git merge
d that branch with the simulator1
and simulator2
branches, then git pull
ed those merges into x10 and x100.
Keeping track of what I'd done and what I needed to do was frankly too cognitively demanding, and I was constantly bedevilled by the sense that there had to be a Better Way. I asked about this on Twitter, and Ganesh Sittampalam suggested "use Darcs" - and you know, I think he's right, Darcs' "bag of commuting patches" model is a better fit to what I'm trying to do than Git's "DAG of snapshots" model. The obvious way to handle this in Darcs would be to have six base repositories, called "everything", "x1", "x10", "x100", "simulator1" and "simulator2"; and six working repositories, called "simulator2_x1", "simulator2_x10", "simulator2_x100", "simulator2_x1", "simulator2_x10" and "simulator2_x100". Then set up update
scripts in each working repository, containing, for instance
and every time you fix a bug, run#!/bin/sh darcs pull ../base/everything darcs pull ../base/simulator1 darcs pull ../base/x10
for i in working/*; do $i/update; done
. But! It is extremely useful to be able to commit the output logs associated with a particular state of the build scripts, so you can say "wait, what went wrong when I used the -static
flag? Oh yeah, that". I don't think Darcs handles that very well - or at least, it's not easy to retrieve any particular state of a Darcs repo. Git is great for that, but whenever I think about duplicating the setup described above in Git my mind recoils in horror before I can think through the details. Perhaps it shouldn't - would this work? Is there a Better Way that I'm not seeing?
strace
or some equivalent.stat
is slow on modern filesystems.#include
.[Exercise for the reader: which build tools make which assumptions, and which compilers violate them?]