During repro build discussions the other day, I think I got a somewhat
better understanding of "build environment". Feel free to share these
notes in any other forum where they're relevant.
My high-level context is what to consider when installing a
"transparent binary". For that, I or my install agent will rely on
source claims and build claims:
SC: A source claim says "A source package with name, version, hash
(maybe other metadata) is an officially released package". The claim
is made by a source authority, e.g., the Debian release team deciding
what goes into a release, or any software maintainer deciding what
the official releases of their package are. (Note there's no claim
about package quality, only about the intention of "officially"
releasing it.)
BC: A build claim says "Building this source package (identified by hash)
produced these binaries (identified by hash)".
Both types of claims need to be transparently logged, to get a consensus
on what's being released, and what the corresponding binaries are.
By getting a source claim and several build claims from independent
builders, I can get some confidence that a binary corresponds to the
official source package, and decide to install it.
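
A minimal sketch of that install decision, under stated assumptions: the
class names, field names and the min_builders threshold are all
hypothetical, and signature checking and log-inclusion proofs are left
out entirely (the claims are assumed to be already verified).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceClaim:
    # Hypothetical fields; a real claim would also carry a signature.
    name: str
    version: str
    source_hash: str
    authority: str


@dataclass(frozen=True)
class BuildClaim:
    source_hash: str
    binary_hashes: frozenset  # hashes of the binaries this build produced
    builder: str


def confident_enough(sc, build_claims, binary_hash, min_builders=2):
    """True if at least min_builders distinct builders claim that building
    the source named in sc produced binary_hash."""
    builders = {
        bc.builder
        for bc in build_claims
        if bc.source_hash == sc.source_hash and binary_hash in bc.binary_hashes
    }
    return len(builders) >= min_builders
```

Counting distinct builder names is only a stand-in for "independent
builders"; what counts as independent is a policy question that the
sketch does not try to answer.
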
Now, I'd like a build claim to be falsifiable: Given a build claim, I
should be able to get the corresponding source package, build it, and
see if I get the same binaries as output. A source package typically
includes information on build dependencies, but not sufficient to pin
everything the build depends on. I see some different kinds of
dependencies:
1. Traditional build deps, like, "needs a C11-capable compiler, and the
Foo library of at least version 1.7." These specify what's necessary
to get a successful build.
2. Build environment, by which I mean details where variations are
expected to produce different but acceptable build outputs. E.g.,
the version of the compiler and other code-generation tools used, and
the precise versions of the libraries linked to. It's not clear to me
which of these things should be identified by a cryptographic hash,
and which could use some other kind of id.
3. Auxiliary environment (for lack of a better term): Circumstances that
are not expected to affect the resulting build outputs. E.g., the kernel
version should not matter, even though it's *possible* that a buggy
or malicious kernel could make the build outputs different.
I'm also thinking that the build environment specifies things that the
builder will configure separately for each of its builds, while the
auxiliary environment is fixed, modified only during general
hardware/software updates of the builder.
My thinking is that (1) should be part of the source package (and
indirectly, of the SC), and (2) should be part of the build claim (BC).
It makes sense to record some information of type (3) for each build,
for troubleshooting when builders get different results for the same
source package and build environment, but I don't think of it as part
of the build claim.
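
One way to picture this split, as a sketch only: the record layout, the
field names and the use of SHA-256 over canonical JSON are assumptions
made for illustration, not an existing format. The claim identity covers
the source, the build environment and the outputs; the auxiliary
environment is stored alongside but deliberately left out of it.

```python
import hashlib
import json
from dataclasses import dataclass, field


def digest(obj) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of obj."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class BuildRecord:
    source_hash: str
    build_env: dict       # kind (2): e.g. exact compiler and library versions
    binary_hashes: list   # hashes of the build outputs
    aux_env: dict = field(default_factory=dict)  # kind (3): troubleshooting only

    def claim_id(self) -> str:
        # Only what the build claim asserts; aux_env is excluded on purpose.
        return digest({
            "source": self.source_hash,
            "build_env": self.build_env,
            "binaries": sorted(self.binary_hashes),
        })


def aux_differences(a: BuildRecord, b: BuildRecord) -> dict:
    """Auxiliary settings that differ between two records, as a starting
    point when the same source and build environment gave different outputs."""
    keys = sorted(set(a.aux_env) | set(b.aux_env))
    return {
        k: (a.aux_env.get(k), b.aux_env.get(k))
        for k in keys
        if a.aux_env.get(k) != b.aux_env.get(k)
    }
```

Two builders that agree on source, build environment and outputs then
produce the same claim_id even if their kernels differ, and
aux_differences gives a first hint when they disagree.
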
Responsibility is different for the build environment and the auxiliary
environment. The party operating a builder can take responsibility for
the auxiliary environment. E.g., depending on the BCs signed by a
particular builder implies some level of trust that the builder doesn't
run a malicious kernel, and I think that's reasonable.
Question then: Who's responsible for the build environment? Since we
want builds to be reproducible, the build environment is *not* under the
control of the party operating the builder: For the builder to be
useful, it has to use the same build environment (for each particular
source package) as has somehow been decided by others. Therefore, the
builder is not in a position to claim that the build environment is
appropriate.
A build environment specifying a malicious version of a compiler or
library is obviously a threat to the user about to install the resulting
binaries. So we need someone to take responsibility for that, in a
transparent manner. Which brings us to the build environment claim:
BEC: A build environment claim says "The build environment BE is
appropriate for building source package X". (One could maybe
consider more general claims, "It's appropriate to base the
build environment, for a large set of source packages, on the latest
versions of Debian stable as of time T", but that will be somewhat
difficult for the user to verify, so let's stick to a per source
package build environment claim for now.)
In case we have a structure with a primary builder and rebuilders, e.g.,
the primary builder being a Debian buildd machine, it seems easiest to
make the primary builder the build environment authority. It needs to
communicate its choice of build environment to the rebuilders (to enable
reproducibility). It also needs to communicate a signed and
transparently logged build environment claim to the end user.
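
From the end user's side, the check could look roughly like the sketch
below. Everything named here (BuildEnvClaim, the simplified BuildClaim
with a single binary hash, may_install, trusted_authority, min_builders)
is hypothetical, and signatures and log-inclusion proofs are again
assumed to have been verified elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildEnvClaim:
    source_hash: str      # the source package X
    build_env_hash: str   # identifies the build environment BE
    authority: str        # who vouches that BE is appropriate for X


@dataclass(frozen=True)
class BuildClaim:
    source_hash: str
    build_env_hash: str   # the build environment the builder actually used
    binary_hash: str      # simplified: one output per claim
    builder: str


def may_install(binary_hash, source_hash, bec, build_claims,
                trusted_authority, min_builders=2):
    """Accept binary_hash only if a trusted authority vouches for the build
    environment and enough distinct builders reproduced the binary in it."""
    if bec.authority != trusted_authority or bec.source_hash != source_hash:
        return False
    builders = {
        bc.builder
        for bc in build_claims
        if bc.source_hash == source_hash
        and bc.build_env_hash == bec.build_env_hash
        and bc.binary_hash == binary_hash
    }
    return len(builders) >= min_builders
```

Build claims made in some other build environment are simply ignored
here, since no one has taken responsibility for that environment.
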
It may be possible to treat the BC from a primary builder specially: If
a build claim is signed not by just any builder but by the primary
builder, then it implicitly carries the claim that the build environment
is appropriate. But since no other builder can make that claim, I think
it's clearer to keep it out of the BC.
A final note on build environment vs auxiliary environment: Whenever
possible, it's desirable to move information from the build environment
(and hence, from the BC) to the auxiliary environment. This needs a
case-by-case analysis. E.g., when building gcc, the build dependencies
must say that it needs a C++ compiler. But which one shouldn't matter,
since it affects only the stage1 binaries, while the final stage3
binaries are independent of which compiler was used during stage1. So
the initial compiler for stage1 belongs with the auxiliary environment,
and it improves confidence in the resulting gcc binaries if the builders
use different compilers during stage1 of the build.
Regards,
/Niels