Debugging reproducible rust builds
This is a bit long as it’s a summary of my day and a half debugging adventure. I wanted to throw as much info out there to communicate that debugging isn’t often a 5 minute find it and fix it process.
The background
We provide clients with signed versions of our rust binaries. The builds must be reproducible so that anyone can verify the binary contents are from the specified version of the source. A reproducible binary is one that results in the exact same bytes. This requires using a consistent build environment and tools.
The Failure
I was making some optimizations to our build and I wanted to make sure that my changes didn’t affect the resultant binaries. When testing my optimizations I could not reproduce the same binary.
The steps were more or less:
> cargo clean
...
> cargo build --release
...
> md5sum target/release/my_binary
<SOME_MD5_VALUE>
> cargo clean
...
> cargo build --release
...
> md5sum target/release/my_binary
<A_DIFFERENT_MD5_VALUE>
Note: I used
md5sum
as it was quick and at hand. We do not use MD5 in our binary signing process.
I decided to jump back to the tip of our mainline to make sure I was doing the build process correctly. To my surprise our mainline could not provide a reproducible binary.
I wanted to see if I could tell what was different between the binaries. I ran across elf_diff. As the name implies it allows one to diff elf binaries. Using it I could see that some serde implementations appeared to be different. For many of the serde implementations the only change was around function address values. I assumed this was due to surrounding functions increasing or decreasing in size changing the function addresses. A couple of functions I found had some different assembly instructions, but I’m not that fluent in assembly to quickly understand what it might mean. I decided to shelve that for now.
I thought for a bit, and was pretty sure our latest release had reproducible binaries. Running the above steps on the latest released version did indeed provide a reproducible binary.
So somewhere between our latest release and our current mainline we lost reproducible rust builds.
I recalled we had bumped rust versions recently in the mainline, so I decided to look there first. I tried to revert the rust version bump at the tip of mainline. I spent probably 30 minutes to an hour trying to revert the rust version. Then I realized I should just check out the commit prior to the rust version bump and test build reproducibility there.
The commit at the old rust version also failed to provide reproducible builds. I tried to think of what other changes could be the likely culprit. As I was thinking this over a thought finally came into my head.
I should use git bisect!
git bisect does a binary search through your commit history to find
the point when the breaking change happened. For this to be effective the commit
history has to be one that most, if not all, commits are building commits. I’ve
seen some code bases where people push non working changes into the mainline and
every week or month someone comes along and gets it building again. These types
of code bases probably aren’t a good candidate for git bisect
.
It took eight iterations of git bisect
to locate the commit that first broke
reproducible builds. This commit was pulling in functionality from a new crate
we had developed elsewhere. This new crate provided a rust wrapper around a
third party C library. Fortunately this commit was only integrating a small
piece of the new crate. Subsequent commits hooked up the rest of this new crate.
Working on the broken commit, I tried to remove small pieces of the integration
and rebuild looking for reproducibility. For instance pulling in the new crate
resulted in cargo updating serde, and since I saw those implementations had
changed in elf_diff
I tried to pin the serde version back to see what the
outcome might be. Much like my attempt to revert the rust versions, this wasn’t
a good approach. I realized it would be better to start from the commit prior,
which worked, and then slowly add small pieces of the change until I lose
reproducible builds. The reason to start from a working commit is that it’s
often easier to identify the one thing that will break the build. If one tries
to work from a broken commit and get it to the working state, there may be
multiple things that need to be tweaked and it’s easy to miss this combination.
As mentioned this new crate was a wrapper around a C library. We followed the
sys-packages recommendation. Which means we actually have two crates; a
foo
crate which provides the idiomatic rust interface and a foo-sys
crate
which does some very basic mapping of C to rust. In the foo-sys
crate we use
bindgen to generate the rust interface from the C headers. We also
derived Serialize
and Deserialize
from serde on some of the foo-sys
types.
Again latching on to the changed serde
implementations in the elf_diff
output, I decided to use cargo-expand on foo-sys
.
cargo-expand
will take rust files and expand all of the macros in those files.
This is similar to looking at C’s pre-processed output. I was hoping to see a
giant on/off switch in the wrong position in the output, but nothing stood out
to me. I ran it twice and compared the output, it was the same each time.
I realized that the foo
crate is really the one used by the clients and it
hides the foo-sys
crate from them. So I also ran cargo-expand
on it. Again
nothing grabbed my attention. Running it twice however resulted in different
output. Some of the derived functions moved around. I don’t know how well
cargo-expand
integrates with cargo and rustc so figured it might be expanding
in parallel and piecing together itself.
At this point I decided to use the last known good commit and start manually
pulling over the pieces of foo
into the problem binary. The idea was to leave
all of foo
out as a dependency copying over small pieces of code directly into
the problem crate. In doing this, I was skipping bindgen
and manually
implementing the C interfaces. Working this way for a bit, I was unable to get
the non reproducible build to occur.
After a bit of this, my battery decided it had enough, and I decided it was time for a walk.
The Cause
As often happens, I only had to start getting ready for a walk when the pieces started to come together in my head. Sort of like the ending of “The Usual Suspects”.
As I had mentioned in foo-sys
we derived Serialize
and Deserialize
on some
of the C types. To do this we used bindgen
’s add_derives()
method. The implementation looked something like.
fn add_derives(&self, info: &DeriveInfo<'_>) -> Vec<String> {
let mut derives = HashSet::new();
if self.some_case(info) {
derives.insert("Clone");
}
if self.other_case(info) {
derives.insert("Serialize");
}
...
derives.into_iter().map(String::from).collect::<Vec<_>>()
}
We used a set to make it easier to have two cases that might want Clone
.
If you tried to use a vector directly and you passed two Clone
s then the
compilation would bark at the bindgen
produced:
#[derive(Clone, Clone)]
pub MyStruct {
a: u32,
b: float
}
I had mentioned how elf_diff
had shown diffs on serde
implementations and
that we happened to derive Serialize
and Deserialize
on some types. I also
mentioned how the derives were reordered in one of the runs of cargo-expand
.
Unfortunately one has to look at the docs for HashMap
to
find the clue, but rust’s HashSet
uses the same default hasher.
By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks. The algorithm is randomly seeded, and a reasonable best-effort is made to generate this seed from a high quality, secure source of randomness provided by the host without blocking the program.
This means that the hashing algorithm is randomly seeded on each run and thus
the hash values are not consistent between runs. The result was that the order
of the derives on the generated output of bindgen
would vary between builds.
To see this inconsistent hashing behavior in action one can run the following code and half the time the assert will cause a panic. You may need to run this 4 or more times to see a change in output.
To fix this inconsistent order one can:
sort
the vector before returning it.- use
BTreeSet
which will provide a consistent order across runs. - use a stable hasher instead of the default one.
Summary
The use of HashSet
, with the default hasher, during the code generation
resulted in the order of the derives changing between builds. The compiler
most likely expands the derive macros in the order they are written. This
results in different generated code based on the order of the derive macros.
This first snippet has Debug
before Clone
.
#[derive(Debug, Clone)]
pub MyStruct {
a: u32,
b: float
}
This next snippet has Clone
before Debug
, it will result in a different
binary than the previous snippet.
#[derive(Clone, Debug)]
pub MyStruct {
a: u32,
b: float
}
To ensure reproducible builds avoid direct use of the default hasher in generated code.