Debugging reproducible rust builds

This is a bit long as it’s a summary of my day and a half debugging adventure. I wanted to throw as much info out there to communicate that debugging isn’t often a 5 minute find it and fix it process.

The background

We provide clients with signed versions of our rust binaries. The builds must be reproducible so that anyone can verify the binary contents are from the specified version of the source. A reproducible binary is one that results in the exact same bytes. This requires using a consistent build environment and tools.

The Failure

I was making some optimizations to our build and I wanted to make sure that my changes didn’t affect the resultant binaries. When testing my optimizations I could not reproduce the same binary.

The steps were more or less:

> cargo clean
...
> cargo build --release
...
> md5sum target/release/my_binary
<SOME_MD5_VALUE>
> cargo clean
...
> cargo build --release
...
> md5sum target/release/my_binary
<A_DIFFERENT_MD5_VALUE>

Note: I used md5sum as it was quick and at hand. We do not use MD5 in our binary signing process.

I decided to jump back to the tip of our mainline to make sure I was doing the build process correctly. To my surprise our mainline could not provide a reproducible binary.

I wanted to see if I could tell what was different between the binaries. I ran across elf_diff. As the name implies it allows one to diff elf binaries. Using it I could see that some serde implementations appeared to be different. For many of the serde implementations the only change was around function address values. I assumed this was due to surrounding functions increasing or decreasing in size changing the function addresses. A couple of functions I found had some different assembly instructions, but I’m not that fluent in assembly to quickly understand what it might mean. I decided to shelve that for now.

I thought for a bit, and was pretty sure our latest release had reproducible binaries. Running the above steps on the latest released version did indeed provide a reproducible binary.

So somewhere between our latest release and our current mainline we lost reproducible rust builds.

I recalled we had bumped rust versions recently in the mainline, so I decided to look there first. I tried to revert the rust version bump at the tip of mainline. I spent probably 30 minutes to an hour trying to revert the rust version. Then I realized I should just check out the commit prior to the rust version bump and test build reproducibility there.

The commit at the old rust version also failed to provide reproducible builds. I tried to think of what other changes could be the likely culprit. As I was thinking this over a thought finally came into my head.

I should use git bisect!

git bisect does a binary search through your commit history to find the point when the breaking change happened. For this to be effective the commit history has to be one that most, if not all, commits are building commits. I’ve seen some code bases where people push non working changes into the mainline and every week or month someone comes along and gets it building again. These types of code bases probably aren’t a good candidate for git bisect.

It took eight iterations of git bisect to locate the commit that first broke reproducible builds. This commit was pulling in functionality from a new crate we had developed elsewhere. This new crate provided a rust wrapper around a third party C library. Fortunately this commit was only integrating a small piece of the new crate. Subsequent commits hooked up the rest of this new crate.

Working on the broken commit, I tried to remove small pieces of the integration and rebuild looking for reproducibility. For instance pulling in the new crate resulted in cargo updating serde, and since I saw those implementations had changed in elf_diff I tried to pin the serde version back to see what the outcome might be. Much like my attempt to revert the rust versions, this wasn’t a good approach. I realized it would be better to start from the commit prior, which worked, and then slowly add small pieces of the change until I lose reproducible builds. The reason to start from a working commit is that it’s often easier to identify the one thing that will break the build. If one tries to work from a broken commit and get it to the working state, there may be multiple things that need to be tweaked and it’s easy to miss this combination.

As mentioned this new crate was a wrapper around a C library. We followed the sys-packages recommendation. Which means we actually have two crates; a foo crate which provides the idiomatic rust interface and a foo-sys crate which does some very basic mapping of C to rust. In the foo-sys crate we use bindgen to generate the rust interface from the C headers. We also derived Serialize and Deserialize from serde on some of the foo-sys types.

Again latching on to the changed serde implementations in the elf_diff output, I decided to use cargo-expand on foo-sys. cargo-expand will take rust files and expand all of the macros in those files. This is similar to looking at C’s pre-processed output. I was hoping to see a giant on/off switch in the wrong position in the output, but nothing stood out to me. I ran it twice and compared the output, it was the same each time.

I realized that the foo crate is really the one used by the clients and it hides the foo-sys crate from them. So I also ran cargo-expand on it. Again nothing grabbed my attention. Running it twice however resulted in different output. Some of the derived functions moved around. I don’t know how well cargo-expand integrates with cargo and rustc so figured it might be expanding in parallel and piecing together itself.

At this point I decided to use the last known good commit and start manually pulling over the pieces of foo into the problem binary. The idea was to leave all of foo out as a dependency copying over small pieces of code directly into the problem crate. In doing this, I was skipping bindgen and manually implementing the C interfaces. Working this way for a bit, I was unable to get the non reproducible build to occur.

After a bit of this, my battery decided it had enough, and I decided it was time for a walk.

The Cause

As often happens, I only had to start getting ready for a walk when the pieces started to come together in my head. Sort of like the ending of “The Usual Suspects”.

As I had mentioned in foo-sys we derived Serialize and Deserialize on some of the C types. To do this we used bindgen’s add_derives() method. The implementation looked something like.

    fn add_derives(&self, info: &DeriveInfo<'_>) -> Vec<String> {
        let mut derives = HashSet::new();
        if self.some_case(info) {
            derives.insert("Clone");
        }
        if self.other_case(info) {
            derives.insert("Serialize");
        }
        ...
        derives.into_iter().map(String::from).collect::<Vec<_>>()
    }

We used a set to make it easier to have two cases that might want Clone. If you tried to use a vector directly and you passed two Clones then the compilation would bark at the bindgen produced:

#[derive(Clone, Clone)]
pub MyStruct {
    a: u32,
    b: float
}

I had mentioned how elf_diff had shown diffs on serde implementations and that we happened to derive Serialize and Deserialize on some types. I also mentioned how the derives were reordered in one of the runs of cargo-expand.

Unfortunately one has to look at the docs for HashMap to find the clue, but rust’s HashSet uses the same default hasher.

By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks. The algorithm is randomly seeded, and a reasonable best-effort is made to generate this seed from a high quality, secure source of randomness provided by the host without blocking the program.

This means that the hashing algorithm is randomly seeded on each run and thus the hash values are not consistent between runs. The result was that the order of the derives on the generated output of bindgen would vary between builds.

To see this inconsistent hashing behavior in action one can run the following code and half the time the assert will cause a panic. You may need to run this 4 or more times to see a change in output.

To fix this inconsistent order one can:

Summary

The use of HashSet, with the default hasher, during the code generation resulted in the order of the derives changing between builds. The compiler most likely expands the derive macros in the order they are written. This results in different generated code based on the order of the derive macros.

This first snippet has Debug before Clone.

#[derive(Debug, Clone)]
pub MyStruct {
    a: u32,
    b: float
}

This next snippet has Clone before Debug, it will result in a different binary than the previous snippet.

#[derive(Clone, Debug)]
pub MyStruct {
    a: u32,
    b: float
}

To ensure reproducible builds avoid direct use of the default hasher in generated code.