During the reproducible-build discussions the other day, I think I got a somewhat better understanding of what "build environment" means. Feel free to share these notes in any other forum where they're relevant.
My high-level context is what to consider when installing a "transparent binary". For that, I or my install agent will rely on source claims and build claims:
SC: A source claim says "A source package with name, version, hash (maybe other metadata) is an officially released package". The claim is made by a source authority, e.g., the Debian release team deciding what goes into a release, or any software maintainer deciding what the official releases of their package are. (Note there's no claim about package quality, only the intention of "officially" releasing it.)
BC: A build claim says "Building this source package (identified by hash) produced these binaries (identified by hash)".
Both types of claims need to be transparently logged, to get a consensus on what's being released, and what corresponding binaries are.
By getting a source claim and several build claims from independent builders, I can get some confidence that a binary corresponds to the official source package, and decide to install it.
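A minimal sketch of how an install agent might combine these claims, assuming signatures and log inclusion have already been verified elsewhere. All class, field, and function names here are hypothetical, and the threshold of two builders is only an illustrative default:

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SourceClaim:
    """SC: this source package is an official release."""
    name: str
    version: str
    source_hash: str   # hex digest identifying the source package
    authority: str     # who made the claim, e.g. a release team


@dataclass(frozen=True)
class BuildClaim:
    """BC: building this source produced these binaries."""
    source_hash: str
    binary_hashes: FrozenSet[str]
    builder: str       # identity of the builder that signed the claim


def independent_builders(binary_hash: str, sc: SourceClaim,
                         build_claims: Iterable[BuildClaim]) -> set:
    """Builders claiming that the official source produced this binary."""
    return {bc.builder for bc in build_claims
            if bc.source_hash == sc.source_hash
            and binary_hash in bc.binary_hashes}


def ok_to_install(binary_hash: str, sc: SourceClaim,
                  build_claims: Iterable[BuildClaim],
                  min_builders: int = 2) -> bool:
    # Counting distinct builder names is only a stand-in for "independent";
    # real independence is an operational question, not a string comparison.
    return len(independent_builders(binary_hash, sc, build_claims)) >= min_builders
```

A binary vouched for by only one builder would be rejected under this default, while one confirmed by two or more distinct builders would be accepted.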
Now, I'd like a build claim to be falsifiable: Given a build claim, I should be able to get the corresponding source package, build it, and see if I get the same binaries as output. A source package typically includes information on build dependencies, but not enough to pin everything the build depends on. I see some different kinds of dependencies:
1. Traditional build deps, like, "needs a C11-capable compiler, and the Foo library of at least version 1.7." These specify what's necessary to get a successful build.
2. Build environment, by which I mean details where variations are expected to produce different but acceptable build outputs. E.g., version of the compiler and other code-generation tools used, and precise version of the libraries linked to. It's not clear to me which of these things should be identified by a cryptographic hash, and which could use some other kind of ids.
3. Auxiliary environment (for lack of a better term): Circumstances that are not expected to affect the resulting build outputs. E.g., kernel version should not matter, even though it's *possible* that a buggy or malicious kernel could make the build outputs different.
I'm also thinking that the build environment specifies things that the builder will configure separately for each of its builds, while the auxiliary environment is fixed, only modified at general hardware/software updates of the builder.
My thinking is that (1) should be part of the source package (and indirectly, in the SC), and (2) should be part of the build claim (BC). It makes sense to record some information of type (3) for each build, for troubleshooting when builders get different results for the same source package and build environment, but I don't think of it as part of the build claim.
Responsibility differs between the build environment and the auxiliary environment. The party operating a builder can take responsibility for the auxiliary environment: e.g., depending on the BCs signed by a particular builder implies some level of trust that the builder doesn't run a malicious kernel, and I think that's reasonable.
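As a rough sketch of that split, the build environment could be reduced to an identifier carried inside the BC, with auxiliary details kept in a separate record outside the claim. Everything below is an assumption made for illustration: the names, the choice of SHA-256 over a canonical JSON encoding, and the idea of identifying tools by plain strings (which kind of identifier to use is exactly the open question in item 2):

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


def environment_id(tools: Dict[str, str]) -> str:
    """Digest of a build environment given as {tool name: identifier}."""
    # Sorted keys make the encoding independent of insertion order.
    canonical = json.dumps(tools, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class BuildClaim:
    source_hash: str
    build_env_id: str              # type (2): part of the claim
    binary_hashes: FrozenSet[str]
    builder: str


@dataclass
class BuildRecord:
    claim: BuildClaim
    # Type (3): kept for troubleshooting only, not part of the claim.
    auxiliary: Dict[str, str] = field(default_factory=dict)


def falsifies(claim: BuildClaim, rebuilt_outputs: Dict[str, bytes],
              build_env_id: str) -> bool:
    """True if a rebuild disagrees with the claim.

    Assumes the caller rebuilt the same source package; the actual build
    step is out of scope here and only its output bytes are passed in.
    """
    if build_env_id != claim.build_env_id:
        raise ValueError("rebuild used a different build environment")
    rebuilt = frozenset(hashlib.sha256(data).hexdigest()
                        for data in rebuilt_outputs.values())
    return rebuilt != claim.binary_hashes
```

Refusing to compare across different build environments reflects the point that differing outputs are expected and acceptable there; only a mismatch under the same build environment counts against the claim.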
Question then: Who's responsible for the build environment? Since we want builds to be reproducible, the build environment is *not* under the control of the party operating the builder: For the builder to be useful, it has to use the same build environment (for each particular source package) as has somehow been decided by others. Therefore, the builder is not in a position to claim that the build environment is appropriate.
A build environment specifying a malicious version of a compiler or library is obviously a threat to the user about to install the resulting binaries. So we need someone to take responsibility for that, in a transparent manner. Which brings us to the build environment claim:
BEC: A build environment claim says "The build environment BE is appropriate for building source package X". (One could maybe consider more general claims, "It's appropriate to base the build environment, for a large set of source packages, on the latest versions of Debian stable as of time T", but that will be somewhat difficult for the user to verify, so let's stick to a per-source-package build environment claim for now.)
In case we have a structure with a primary builder and rebuilders, e.g., the primary builder being a Debian buildd machine, it seems easiest to make the primary builder the build environment authority. It needs to communicate its choice of build environment to the rebuilders (to enable reproducibility). It also needs to communicate a signed and transparently logged build environment claim to the end user.
It may be possible to treat the BC from a primary builder specially: if a build claim is signed not by just any builder but by the primary builder, then it implicitly carries the claim that the build environment is appropriate. But since no other builder can make that claim, I think it's clearer to keep it out of the BC.
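Keeping the BEC separate suggests a separate check on the user's side. A minimal sketch, assuming a per-source-package BEC and assuming the build environment is referred to by some identifier; the names are hypothetical, and signature and log verification are again left out:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BuildEnvironmentClaim:
    """BEC: this build environment is appropriate for this source package."""
    source_hash: str
    build_env_id: str
    authority: str     # e.g. the primary builder


def environment_approved(source_hash: str, build_env_id: str,
                         becs: Iterable[BuildEnvironmentClaim],
                         trusted_authority: str) -> bool:
    """True if the trusted authority vouched for this environment and source."""
    return any(bec.source_hash == source_hash
               and bec.build_env_id == build_env_id
               and bec.authority == trusted_authority
               for bec in becs)
```

With this shape, a rebuilder's BC is only useful to the user if the build environment it names passes `environment_approved` for the same source package.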
A final note on build environment vs auxiliary environment: Whenever possible, it's desirable to move information from the build environment (and hence, from the BC) to the auxiliary environment. This needs a case-by-case analysis. E.g., when building gcc, the build dependencies must say that it needs a C++ compiler. But which one shouldn't matter, since it affects only the stage1 binaries, while the final stage3 binaries are independent of which compiler was used during stage1. So the initial compiler for stage1 belongs with the auxiliary environment, and it improves confidence in the resulting gcc binaries if the builders use different compilers during stage1 of the build.
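One way to make the gcc example measurable, as a small sketch: group the builders' auxiliary records by the final binary hash and look at how many different stage1 compilers led to the same result. The function name and the `(binary_hash, stage1_compiler)` record shape are assumptions made for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def stage1_diversity(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each final binary hash to the set of stage1 compilers behind it."""
    by_binary = defaultdict(set)
    for binary_hash, stage1_compiler in records:
        by_binary[binary_hash].add(stage1_compiler)
    return dict(by_binary)
```

A binary hash reached from several distinct stage1 compilers is the situation described above as improving confidence; a hash reached from only one says nothing either way.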
Regards, /Niels
st-discuss@lists.system-transparency.org