2015-05-13

Over the seven years or so I've been using Git, there have been a number of moments where I've felt like I've really levelled up in my understanding or usage of Git. Here, in the hope of accelerating other people's learning curves, are as many as I can remember. I'd like to thank Joe Halliwell for introducing me to Git and helping me over the initial hurdles, and Aaron Crane for many helpful and enlightening discussions over the years - especially the ones in which he cleared up some of my many misunderstandings.

Learning to use a history viewer

Even if all you're doing is commit/push/pull, you're manipulating the history graph. Don't try to imagine it in your mind - get the computer to show it to you! That way you can see the effect your actions have, tightening your feedback loop and improving your learning rate. It's also a lot easier to debug problems when you can see what's going on. I started out using gitk --all to view history; now I mostly use git lg, which is an alias for log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%Creset' --abbrev-commit --date=relative, but the effect is much the same.

The really important lessons to internalise at this stage are

Git maintains your history as a graph of snapshots (you'll hear people using the term DAG, for "directed acyclic graph").
Branches and tags are just pointers into this graph.

If you understand that, then congratulations! You now understand 90% of what's important about Git. Just keep that core model in mind, and it should be possible to reason about the rest.

Relatedly: you'll probably want to graduate to the command-line eventually, but don't be ashamed to use a graphical client for any remotely unfamiliar task.

Understanding hash-based identifiers

You know those weird c56ab7f identifiers Git uses for commits? Those are actually serving a very important role. Mark Jason Dominus has written a great explanation, and I also suggest having a poke around in the .git/objects directory of one of your repos. MJD talks about "blobs" and "trees" which are also identified with hashes, but you don't actually have to think about them much in day-to-day Git usage: commits are the most important objects.

Learning to use `git rebase`

This shouldn't really qualify as a GLUE - my very first experience with Git was using the git-svn plugin, which forces you to rebase at all times. However, even if you're not interoperating with SVN, you should give rebase a go. What does it do, you ask? Well, the clue's in the name: it takes the sequence of commits you specify, and moves them onto a new base: hence "re-base". See also the MJD talk above, which explains rebasing in more detail.

Note to GUI authors: the obvious UI for this is selecting a load of commits and dragging-and-dropping them to somewhere else in the history graph. Sadly, I don't know of any GUIs that actually do this. Edit: MJD agrees, and is seeking collaborators for work on such a GUI.

Rebasing occasionally gets stuck when it encounters a conflict. The thing to realise here is that the error messages are really good. Keep calm, read the instructions, then follow them, and everything should turn out OK (but if it doesn't, feel free to skip to the next-but-two GLUE).

Giving up on `git mergetool`

git mergetool allows you to open a graphical diff/merge tool of your choice for handling conflicts. Don't use it. Git's internal merge has complete knowledge of your code's history, which enables it to do a better job of resolving conflicts than any external merge tool I'm aware of. If you use a third-party tool, expect to waste a lot of time manually resolving "conflicts" that the computer could have handled itself; the internal merge algorithm will handle these just fine, and present you with only the tricky cases.

Learning to use `git rebase --interactive`

I thought this was going to be hard and scary, but it's actually really easy and pleasant to use, I promise! The name is terrible - while git rebase --interactive can do a normal rebase as part of its intended work, this is usually a bad idea. What it's actually for is rewriting history. Have you made two commits that should really be squashed into one? Git rebase --interactive! Want to re-order some commits? Git rebase --interactive! Delete some entirely? Git rebase --interactive! Split them apart into smaller commits? Git rebase --interactive! The interface to this is extremely friendly by Git CLI standards: specify the commit before the first one you want to rewrite, and Git will open an editor window with a list of the commits-to-rewrite. Follow the instructions. If that's not clear enough, read this post.

This should also be represented in a GUI by dragging-and-dropping, but I don't know of any clients that do this.

Learning about `git reflog`

Git stores a list of every commit you've had checked out. Look at the output of git reflog after a few actions: this allows you to see what's happened, and is thus valuable for learning. More usefully, it also allows you to undo any history rewriting. Here's the thing: Git doesn't actually rewrite history. It writes new history, and hides the old version (which is eventually garbage-collected after a month). But you can still get the old version back if you have a way to refer to it, and that's what git reflog gives you. Knowing that you can easily undo any mistakes allows you to be a lot bolder in your experiments.

Here's a thing I only recently learned: Git also keeps per-branch reflogs, which can be accessed using git reflog $branchname. This saves you some log-grovelling. Sadly, these only contain local changes: history-rewritings done by other people won't show up.

Reading the code of `git-rebase--interactive`

Interactive rebase is implemented as a shell script, which on my system lives at /usr/lib/git-core/git-rebase--interactive. It's pretty horrible code, but it's eye-opening to see how the various transformations are implemented at the low-level "plumbing" layer.

I actually did this as part of a (failed) project to implement the Darcs merge algorithm on top of Git. I still think this would be a good idea, if anyone wants to have a go. See also this related project, which AFAICT has some of the same advantages and better asymptotic complexity than the Darcs merge algorithm.

Learning how the recursive merge algorithm works

You don't actually need to know this, but it's IMHO pretty elegant. Here's a good description.

Learning what `HEAD` actually means

I learned this from reading two great blog posts by Federico Mena Quintero: part 1, part 2. Those posts also cleared up a lot of my confusion about how remotes and remote-tracking branches work.

Learning to read refspec notation

The manual is pretty good here. Once you know how refspec notation works, you'll notice it's used all over and a lot of things will click into place.

Learning what `git reset` actually does

I'd been using git reset in stereotyped ways since the beginning: for instance, I knew that git reset --hard HEAD meant "throw away all uncommitted changes in my working directory" and git reset --hard [commit hash obtained from reflog] meant "throw away a broken attempt at history-rewriting". But it turns out that this is not the core of reset. Again, MJD has written a great explanation, but here's the tl;dr:

It points the HEAD ref (aka the current branch - remember, branches are just pointers) at a new 'target' commit, if you specified one.
Then it copies the tree of the HEAD commit to the index, unless you said --soft.
Finally, it copies the contents of the index to the working tree, if you said --hard.

Once you understand this, git reset becomes part of your toolkit, something you can apply to new problems. Is a branch pointing to the wrong commit? There's a command that does exactly what you need, and it's git reset. Here's yet another MJD post, in which he explains a nonobvious usage for git reset which makes perfect sense in light of the above. Here's another one: suppose someone has rewritten history in a remote branch. If you do git pull you'll create a merge commit between your idea of the current commit and upstream's idea; if you later git push it you'll have created a messy history and people will be annoyed with you. No problem! git fetch upstream $branch; git checkout $branch; git reset --hard upstream/$branch.

`git add -p` and friends

Late Entry! The git add -p command, which interactively adds changes to the index (the -p is short for --patch), wasn't much of a surprise to me because I was used to darcs record; however, it seems to be a surprise to many people, so (at Joe Halliwell's suggestion) it deserves a mention here. And only tonight Aaron informed me of the existence of its cousins git reset -p, git commit -p and git checkout -p, which are totally going in my toolkit.

Whew!

I hope that helped someone! Now, over to those more expert than me - what should be my next enlightenment experience?

2015-05-13

Git Levelling-Up Experiences

Learning to use a history viewer

Understanding hash-based identifiers

Learning to use git rebase

Giving up on git mergetool

Learning to use git rebase --interactive

Learning about git reflog

Reading the code of git-rebase--interactive