All your nightlies are belong to Taskcluster
In January I announced that we
had just migrated Linux nightly builds to Taskcluster.
We completed a huge milestone in July: starting in Firefox 56, we've been
doing all our nightly Firefox builds in Taskcluster.
This includes all Windows, macOS, Linux, and Android builds. You can
see all the builds and repacks on Treeherder.
In August, after 56 merged to Beta, we also started doing our Firefox
Beta builds using Taskcluster. We're on track to ship Firefox 56, built in Taskcluster, to release users at the end of September.
Windows and macOS each had their own challenges to get them ready to
build and ship to our nightly users.
We've had Windows builds running in Taskcluster for quite a while now.
The biggest missing piece stopping us from shipping these builds was
signing. Windows builds end up being a bit complicated to sign.
First, each compiled .exe and .dll binary needs to be signed.
Signing binaries on Windows changes their contents, and so we need to
regenerate some files that depend on the exact contents of binaries.
Next, we need to create packages in various formats: a "setup.exe" for
installing Firefox, and also MAR files for updates.
Each of these package formats in turn needs to be signed.
In buildbot, this process was monolithic. All of the binary
generation and signing happened as part of the same build process. The
same process would also publish symbols to the symbol server and
publish updates to Balrog. The downside of this monolithic process
is that it adds additional dependencies to the build, which is already
a really long process. If something goes wrong with signing, or
publishing updates, you don't want to have to restart a 2 hour build!
As part of our migration to Taskcluster, we decided that builds should
minimize their external dependencies. This means that the build task
produces only unsigned binaries, and it is the responsibility of
downstream tasks to sign them. We also wanted discrete tasks for
symbol and update submission.
One wrinkle in this approach is that the logic that defines how to
create a setup.exe package or a MAR file lives in tree. We didn't
want to run that code in the same context as the code that generates
signatures. Our solution to this was to create a sequence of build ->
signing -> repackage -> signing tasks. The signing tasks run in a
restricted environment while the build and repackage tasks have access to
the build system in order to produce the required artifacts. Using the
chain of trust, we can demonstrate that the artifacts weren't
tampered with between intermediate tasks.
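To make the ordering concrete, here is a minimal Python sketch of that four-step sequence as a dependency graph. The task labels are hypothetical, not the names used in the tree, and the sketch uses the standard library's `graphlib` (Python 3.9+) rather than the real in-tree scheduling code.

```python
from graphlib import TopologicalSorter

# Hypothetical task labels; the real in-tree names differ.
# Each key maps to the set of tasks it depends on.
DEPENDENCIES = {
    "build": set(),                      # produces unsigned binaries
    "build-signing": {"build"},          # signs .exe and .dll files
    "repackage": {"build-signing"},      # creates setup.exe and MAR files
    "repackage-signing": {"repackage"},  # signs the packages
}


def run_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
print(" -> ".join(order))
```

Because each task names only its direct upstream, a failure in a signing task means rerunning that task and whatever follows it, not the build itself.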
Finally, we need to consider l10n repacks. We ship Firefox in over 90
locales. The repacking process downloads the en-US build and replaces
the English strings with localized strings. Each of these repacks
needs to be based on the signed en-US build. Each will also generate
its own setup.exe and complete MAR for updates.
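The fan-out this implies can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the locale list is a tiny subset, the task labels are invented, and the per-locale chain is simplified to repack, repackage, and package signing. The real graph may contain more steps.

```python
# Hypothetical locale subset and task labels, for illustration only.
LOCALES = ["de", "fr", "ja"]


def l10n_tasks(locales, signed_build="build-signing"):
    """Build a dependency map with one simplified repack chain per locale."""
    tasks = {}
    for locale in locales:
        repack = f"l10n-repack-{locale}"
        repackage = f"l10n-repackage-{locale}"
        signing = f"l10n-repackage-signing-{locale}"
        tasks[repack] = {signed_build}  # starts from the signed en-US build
        tasks[repackage] = {repack}     # produces setup.exe and a complete MAR
        tasks[signing] = {repackage}    # signs those packages
    return tasks


graph = l10n_tasks(LOCALES)
print(len(graph), "tasks for", len(LOCALES), "locales")
```

With over 90 locales, even this simplified chain produces several hundred tasks per platform, all depending on the signed en-US build.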
macOS performance (and why your build directory matters)
Like Windows, we've had macOS builds running on Taskcluster for a long
time. Also like Windows, we had to solve signing for macOS.
However, the biggest blocker for the macOS build migration was a
performance bug. Builds
produced on Taskcluster showed some serious performance regressions as
compared to the builds produced on buildbot.
Many very smart people looked at this bug since it was first
discovered in February. They compared library versions being used.
They compared compiler versions and compiler flags. They even
inspected the generated assembly code from both systems.
Mike Shal stumbled across the first clue to what was going on:
if he stripped the Taskcluster binaries, then the performance problems
disappeared! At this point we decided that we could go ahead and ship
these builds to nightly users, knowing that the performance regression
would disappear on beta and release.
Later on, Mike realized that it's not the presence or absence of
symbols in the binary that causes the performance hit; it's the
directory the builds are done in. On buildbot we build under
/builds/..., and on Taskcluster we build under /home/...
Read the bug for more gory details. This is definitely one of the
strangest bugs I've seen.
We learned quite a bit in the process of migrating Windows and macOS
nightly builds to Taskcluster.
First, we gained a huge amount of experience with the in-tree scheduling system.
There's a bit of a learning curve to climb, but it's an
extremely powerful and flexible system. Many kudos to Dustin for his work creating the foundation of
this system here. His blog post, "What's So Special About "In-Tree"?",
is a great explanation of why having this code as part of Firefox's
repository is so important.
One of the killer features of having all the scheduling logic live
in-tree is that you can do quite a bit of work locally, without
requiring any build infrastructure. This is extremely useful when
working on the complex build / signing / repackage sequence of tasks
described above. You can make your changes, generate a new task graph,
and inspect the results.
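As one example of what inspecting the results can look like, the Python sketch below walks a task graph and lists everything a given task depends on. It assumes the graph has been dumped to JSON as a mapping from task label to an entry whose `dependencies` field maps edge names to upstream labels. That shape, the sample data, and the labels are all assumptions for this sketch.

```python
import json


def upstream(graph, label):
    """Return every task that `label` depends on, directly or indirectly."""
    seen = set()
    stack = [label]
    while stack:
        current = stack.pop()
        for dep in graph[current].get("dependencies", {}).values():
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# A tiny hand-written graph in the assumed shape; labels are hypothetical.
SAMPLE = json.loads("""
{
  "build": {"dependencies": {}},
  "build-signing": {"dependencies": {"build": "build"}},
  "repackage": {"dependencies": {"build-signing": "build-signing"}},
  "repackage-signing": {"dependencies": {"repackage": "repackage"}}
}
""")

print(sorted(upstream(SAMPLE, "repackage")))
```

A check like this makes it easy to confirm, before pushing anything, that a repackage task really does sit downstream of signing.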
Once you're happy with your local changes, you can push them to try to
validate your local testing, get your patch reviewed, and then finally
land it in gecko. Your scheduling changes will take effect as soon as
they land in the repo. This made it possible for us to do a lot of
testing on another project branch, and then merge the code to central
once we were ready.
We're on track to ship builds produced in Taskcluster as part of the
56.0 release scheduled for late September. After that the only Firefox
builds being produced by buildbot will be for ESR52.
Meanwhile, we've started tackling the remaining parts of release
automation. We prioritized getting nightly and CI builds migrated to
Taskcluster, but parts of the release process are still
implemented in Buildbot.
We're aiming to have release automation completely migrated
off of buildbot by the end of the year. We've already seen many
benefits from migrating CI to Taskcluster, and migrating the release
process will realize many of those same benefits.
Thank you for reading this far!
Members from the Release Engineering, Release Operations, Taskcluster,
Build, and Product Integrity teams were all involved in finishing up
this migration. Thanks to everyone involved (there are a lot of you!)
for getting us across the finish line here.
In particular, if you come across one of these fine individuals at the
office, or maybe on IRC, I'm sure they would appreciate a quick