pozorvlak: (Default)
Monday, April 20th, 2015 08:29 pm

Calculating the dependency graph of your build ahead-of-time can be fiendishly difficult, or even impossible. Redo brings two, I think brilliant, insights to bear on this problem.
  1. If you have to build the target in the course of calculating its dependencies, that's totally OK, because that's what you really wanted to do in the first place.
  2. You don't actually have to know the entire build graph when you start building; you only need to know enough dependencies such that
    • if a given target T needs to be rebuilt, at least one of the dependencies you know about for T will have been affected;
    • in the course of building T, you will discover the remaining dependencies of T and rebuild any stale ones.
Let's try to write do-files for building OCaml modules.

In default.cmi.do:
redo-ifchange $2.mli
ocamlc -c $2.mli

In default.cmo.do:
redo-ifchange $2.cmi $2.ml
redo-ifchange `ocamldep $2.mli $2.ml`
ocamlc -c $2.ml
Note: these do-files will not actually work, because redo insists that you write your output to a temporary file called $3 so it can atomically rename the newly-built file into place, and ocamlc is equally insistent that it knows better than you what its output files should be called. However, this annoying interaction of their limitations is irrelevant to the dependency-checking algorithm, so I'll pretend that they do work :-) I'll try to construct a workaround and post it on GitHub. Update: I have now done so!
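For a flavour of what such a workaround can look like, here's a sketch (my own, not the code from the GitHub link above). Redo calls each do-file with $1 = the target name, $2 = the target minus its extension, and $3 = the temporary output file. Since ocamlc insists on choosing its own output name, one option is to let it write there and then move the result to $3 ourselves:

```shell
# default.cmi.do, sketched. Let ocamlc write $2.cmi where it insists,
# then hand the result to redo as $3 so it can do its atomic rename.
redo-ifchange "$2.mli"
ocamlc -c "$2.mli"
mv "$2.cmi" "$3"
```

This sidesteps rather than solves the atomicity problem: $2.cmi briefly exists under its final name before redo renames $3 into place, so a concurrent reader could still catch it mid-build.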

The redo-ifchange command says "if you know how to build my arguments, then build them; if any of them changes in the future, then the target built by this script will be out-of-date". So to build X.cmi, we observe that it depends on X.mli (which probably won't need building), and then build it. To build X.cmo, we observe that it depends on both X.ml and X.cmi (which will be rebuilt if need be). Then we invoke ocamldep to get a list of other files imported by X.ml, build those if any are out of date, and finally invoke ocamlc on X.ml to produce X.cmo.

Let's see how this plays out in the following scenario:
  1. We build X.cmo.
  2. We try to build it again.
  3. We edit X.ml, adding a new dependency Y.
  4. We rebuild X.cmo.
First, redo runs default.cmo.do, discovers that X.cmo depends on X.ml and X.cmi, recursively builds X.cmi (determining that it depends on X.mli), and finally compiles X.ml, producing X.cmo.

When we run redo-ifchange X.cmo again, redo will check its database of dependencies and observe that X.cmo depends transitively on X.cmi, X.ml and X.mli but that none of them have changed; hence, it will correctly do nothing.

Then we add the dependency, and run redo-ifchange X.cmo. Redo will again check its database of dependencies and note that X.ml has changed, so it must re-run default.cmo.do. First it notes that X.cmo depends on X.cmi and X.ml: it checks its database and sees that X.cmi depends on X.mli, which hasn't changed, so it leaves X.cmi alone. Next it re-runs ocamldep X.mli X.ml and hands the output to redo-ifchange: this tells redo that X.cmo now depends on Y.cmi. Y.cmi doesn't exist yet, so it builds it using the rules in default.cmi.do. Finally it compiles X.ml into X.cmo.

This system should work provided that all your dependencies live within the filesystem, or can be brought within it; however, if this is not the case then you probably have bigger problems :-)
pozorvlak: (Hal)
Thursday, December 6th, 2012 09:45 pm
Inspired by Falsehoods Programmers Believe About Names, Falsehoods Programmers Believe About Time, and far, far too much time spent fighting autotools. Thanks to Aaron Crane, totherme and zeecat for their comments on earlier versions.

It is accepted by all decent people that Make sucks and needs to die, and that autotools needs to be shot, decapitated, staked through the heart and finally buried at a crossroads at midnight in a coffin full of millet. Hence, there are approximately a million and seven tools that aim to replace Make and/or autotools. Unfortunately, all of the Make-replacements I am aware of copy one or more of Make's mistakes, and many of them make new and exciting mistakes of their own.

I want to see an end to Make in my lifetime. As a service to the Make-replacement community, therefore, I present the following list of tempting but incorrect assumptions various build tools make about building software.

All of the following are wrong:
  • Build graphs are trees.
  • Build graphs are acyclic.
  • Every build step updates at most one file.
  • Every build step updates at least one file.
  • Compilers will always modify the timestamps on every file they are expected to output.
  • It's possible to tell the compiler which file to write its output to.
  • It's possible to tell the compiler which directory to write its output to.
  • It's possible to predict in advance which files the compiler will update.
  • It's possible to narrow down the set of possibly-updated files to a small hand-enumerated set.
  • It's possible to determine the dependencies of a target without building it.
  • Targets do not depend on the rules used to build them.
  • Targets depend on every rule in the whole build system.
  • Detecting changes via file hashes is always the right thing.
  • Detecting changes via file hashes is never the right thing.
  • Nobody will ever want to rebuild a subset of the available dirty targets.
  • People will only want to build software on Linux.
  • People will only want to build software on a Unix derivative.
  • Nobody will want to build software on Windows.
  • People will only want to build software on Windows.
    (Thanks to David MacIver for spotting this omission.)
  • Nobody will want to build on a system without strace or some equivalent.
  • stat is slow on modern filesystems.
  • Non-experts can reliably write portable shell script.
  • Your build tool is a great opportunity to invent a whole new language.
  • Said language does not need to be a full-featured programming language.
  • In particular, said language does not need a module system more sophisticated than #include.
  • Said language should be based on textual expansion.
  • Adding an Nth layer of textual expansion will fix the problems of the preceding N-1 layers.
  • Single-character magic variables are a good idea in a language that most programmers will rarely use.
  • System libraries and globally-installed tools never change.
  • Version numbers of system libraries and globally-installed tools only ever increase.
  • It's totally OK to spend over four hours calculating how much of a 25-minute build you should do.
  • All the code you will ever need to compile is written in precisely one language.
  • Everything lives in a single repository.
  • Files only ever get updated with timestamps by a single machine.
  • Version control systems will always update the timestamp on a file.
  • Version control systems will never update the timestamp on a file.
  • Version control systems will never change the time to one earlier than the previous timestamp.
  • Programmers don't want a system for writing build scripts; they want a system for writing systems that write build scripts.

[Exercise for the reader: which build tools make which assumptions, and which compilers violate them?]

pozorvlak: (Default)
Saturday, July 2nd, 2011 06:37 pm
I'm currently running a lot of benchmarks in my day job, in the hope of perhaps collecting some useful data in time for an upcoming paper submission deadline - this is the "science" part of "computer science". Since getting a given benchmark suite built and running is often needlessly complex and tedious, one of my colleagues has written an abstraction layer in the form of a load of Makefiles. By issuing commands like "make build-eembc2", "make run-utdsp" or "make distclean-dspstone" you can issue the correct command (build/run/distclean) to whichever benchmark suite you care about. The lists of individual benchmarks are contained in .mk files, so you can strip out any particular benchmark you're not interested in.

I want to use benchmark runs as part of the fitness function for a genetic algorithm, so it's important that it run fast, and simulating another processor (as we're doing) is inherently a slow business. Fortunately, benchmark suites consist of lots of small programs, which can be run in parallel if you don't care about measuring wallclock seconds. And make already has support for parallel builds, using the -j option.

But it's always worth measuring these things, so I copied the benchmark code up onto our multi-core number crunching machine, and did two runs-from-clean with and without the -j flag. No speedup. Checking top, I found that only one copy of the simulator or compiler was ever running at a time. What the hell? Time to look at the code:
TARGETS=build run collect clean distclean

%-eembc2: eembc-2.0
        @for dir in $(BMARKS_EEMBC2) ; do \
          if test -d eembc-2.0/$$dir ; then \
            ${MAKE} -C eembc-2.0/$$dir $* ; \
          fi; \
        done
Oh God. Dear colleague, you appear to have taken a DSL explicitly designed to provide parallel tracking of dependencies, and then deliberately thrown that parallelism away. What were you thinking?¹ But it turns out that Dominus' Razor applies here, because getting the desired effect without sacrificing parallelism is actually remarkably hard...
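To see why the recipe serialises everything: the shell loop runs its body strictly in sequence, no matter what -j flag make was given. Here's a toy version (directory names made up, echo standing in for the recursive make) in which each iteration is pushed into the background instead:

```shell
#!/bin/sh
# The loop body runs concurrently only if we background each iteration
# ourselves; make -j cannot see inside a single recipe. Note that make's
# jobserver knows nothing about these background jobs, which is part of
# why doing this properly is remarkably hard.
for dir in bench1 bench2 bench3; do    # stand-ins for $(BMARKS_EEMBC2)
    echo "building $dir" &             # stand-in for ${MAKE} -C eembc-2.0/$dir
done
wait                                   # don't let the recipe exit early
```

The output order is nondeterministic, which is exactly the point: the three "builds" are now racing each other instead of queueing.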

Doing it in redo instead...

Time to start teaching my colleagues about redo? I think it might be...

¹ He's also using recursive make, which means we're doing too much work if there's much code shared between different benchmarks. But since the time taken to run a benchmark is utterly dominated by simulator time, I'm not too worried about that.
pozorvlak: (Default)
Friday, January 14th, 2011 03:01 pm
[This is a cleaned-up version of the notes from my Glasgow.pm tech talk last night. The central example is lifted wholesale from apenwarr's excellent README for redo; any errors are of course my own.]
Read more...
pozorvlak: (pozorvlak)
Wednesday, January 12th, 2011 08:28 pm
I think that djb redo will turn out to be the Git of build systems.

Read more...