A subtle issue when using Nix and Cabal for CI
Philip and I recently migrated the CI
infrastructure for Futhark to GitHub
Actions. We have many different CI jobs, in particular we compile
Futhark in many different ways (both stack
, cabal-install
, and
Nix
) and on various operating systems.
It has mostly been a good experience, but one of our jobs was being
flaky. The job in question was responsible for compiling Futhark in
order to run the unit tests (with cabal test
), as well as run
hlint on the source code.
To make it easy to use the same environment in CI and locally, we
decided to use Nix to obtain the various
necessary tools, such as GHC itself,
cabal-install,
and hlint
. Specifically, we used a
shell.nix
file, and then commands such as nix-shell --pure --run 'cabal test'
to run the tests. Yet sometimes cabal test
would fail with a
mysterious error:
/home/runner/.cabal/store/ghc-8.10.1/happy-1.19.12-77f44f4e1b397ecd8847f7694a29d33efa016984155f1eb70f21d8ed5fbf3069/bin/happy:
createProcess: runInteractiveProcess: exec: does not exist (No such file or directory)
happy
is a parser generator that
cabal
will automatically build if necessary, like all other build
tools. It took me a while to realise that the error message is
misleading: the happy
binary does exist (or else cabal
would
rebuild it), but it is dynamically linked against libraries that fail
to be found (incidentally, who decided it was a good idea for the
dynamic linker to report the error like that? Grumble.)
So how does this happen? Well, to cut down on build times, our CI
uses caching. Specifically, it caches the ~/.cabal/packages
and
~/.cabal/store
directories. This means that the happy
binary we
use is likely one that was built during a previous CI run. However,
because we run cabal
under Nix, that happy
binary will be
dynamically linked against libraries in the Nix store (located under
/nix
), which we do not cache, because they are normally fetched or
rebuilt by Nix on demand. The cached happy
binary therefore has a
dependency on very specific libraries in the Nix store, which cabal
doesn’t know about! When the Nix store eventually changes, which it
does semi-frequently, the path to the C library embedded in the
happy
binary will no longer be valid, and happy
will fail to run.
How did we fix this? Not elegantly, I’m afraid: we moved cabal test
to another job, specifically one built without Nix, instead using
normal Debian packages. I’d be quite curious to see if anyone has a
nice solution to this problem.
Update: Philip came up with a better
solution
that pins the Nixpkgs revision, and ties the cabal
cache to the
specific revision.