<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>chris' random ramblings (Posts about firefox)</title><link>https://atlee.ca/</link><description></description><atom:link href="https://atlee.ca/categories/firefox.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Sat, 22 Feb 2025 20:04:32 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Firefox release speed wins</title><link>https://atlee.ca/posts/faster-releases/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;Sylvestre
&lt;a href="https://hacks.mozilla.org/2018/03/shipping-a-security-update-of-firefox-in-less-than-a-day/"&gt;wrote&lt;/a&gt;
about how we were able to ship new releases for Nightly, Beta, Release and ESR versions of Firefox for Desktop and Android in less than a day in response to the
&lt;a href="https://en.wikipedia.org/wiki/Pwn2Own"&gt;pwn2own&lt;/a&gt; contest.&lt;/p&gt;
&lt;p&gt;People commented on how much faster the Beta and Release releases were
compared to the ESR release, so I wanted to dive into the releases on the
different branches to understand if this really was the case, and if so,
why?&lt;/p&gt;
&lt;h2 id="chemspill-timings"&gt;Chemspill timings&lt;/h2&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;                    | Firefox ESR 52.7.2 | Firefox 59.0.1  | Firefox 60.0b4
 ------------------ | ------------------ | --------------- | --------------
 Fix landed in HG   | 23:33:06           | 23:31:28        | 23:29:54
 en-US builds ready | 03:19:03 +3h45m    | 01:16:41 +1h45m | 01:16:47 +1h46m
 Updates ready      | 08:43:03 +5h24m    | 04:21:17 +3h04m | 04:41:02 +3h25m
 Total              | 9h09m              | 4h49m           | 5h11m
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(All times UTC from 2018-03-15 -&amp;gt; 2018-03-16)&lt;/p&gt;
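The per-stage deltas are just wall-clock differences between UTC timestamps. A quick sketch of the arithmetic, using the Firefox 59.0.1 milestones from the table above:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def delta(start, end):
    """Wall-clock difference between two UTC timestamps, truncated to
    whole minutes as in the table above."""
    d = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    hours, rem = divmod(int(d.total_seconds()), 3600)
    return "+%dh%02dm" % (hours, rem // 60)

landed  = "2018-03-15 23:31:28"  # fix landed in HG
builds  = "2018-03-16 01:16:41"  # en-US builds ready
updates = "2018-03-16 04:21:17"  # updates ready

print(delta(landed, builds))    # +1h45m
print(delta(builds, updates))   # +3h04m
print(delta(landed, updates))   # +4h49m total
```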
&lt;!---
## Firefox ESR 52.7.2
[hg](https://hg.mozilla.org/releases/mozilla-esr52/rev/494e5d5278ba6f5fdda9a2bb9ac7ca772653ee4a)
[treeherder](https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr52&amp;revision=494e5d5278ba6f5fdda9a2bb9ac7ca772653ee4a)
[taskgroup](https://tools.taskcluster.net/groups/JF41dvVxTy2q4RjgpBlU4A)

    Fix landed in HG            2018-03-15 23:33:06Z
    en-US builds ready          2018-03-16 03:19:03Z  +3h45m
    Updates ready for testing   2018-03-16 08:43:03Z  +5h24m
    Total time                                         9h09m

## Firefox 59.0.1
[hg](https://hg.mozilla.org/releases/mozilla-release/rev/3db9e3d52b17563efca181ccbb50deb8660c59ae)
[treeherder](https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&amp;revision=3db9e3d52b17563efca181ccbb50deb8660c59ae)
[taskgroup](https://tools.taskcluster.net/push-inspector/#/eQGHNp4jT2yM_G_uP7A3og)

    Fix landed in HG            2018-03-15 23:31:28Z
    en-US builds ready          2018-03-16 01:16:41Z  +1h45m
    Updates ready for testing   2018-03-16 04:21:17Z  +3h04m
    Total time                                         4h49m

## Firefox Beta 60.0b4
[hg](https://hg.mozilla.org/releases/mozilla-beta/rev/1dfbedb54c39abae38da9329f4a79571fee74661)
[treeherder](https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&amp;revision=1dfbedb54c39abae38da9329f4a79571fee74661)
[taskgroup](https://tools.taskcluster.net/groups/MIUq4oRVRXGMfyduHB3W3Q)

    Fix landed in HG            2018-03-15 23:29:54Z
    en-US builds ready          2018-03-16 01:16:47Z  +1h46m
    Updates ready for testing   2018-03-16 04:41:02Z  +3h25m
    Total time                                         5h11m
--&gt;

&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;iframe src="https://giphy.com/embed/3oriNYQX2lC6dfW2Ji" width="480" height="270" frameborder="0" class="giphy-embed" allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;a href="https://giphy.com/gifs/foxhomeent-xmen-quicksilver-3oriNYQX2lC6dfW2Ji"&gt;via
GIPHY&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that the Firefox 59 and 60.0b4 releases ran significantly faster
than the ESR 52 release! What's behind this speedup?&lt;/p&gt;
&lt;p&gt;Release Engineering have been busy migrating release automation from
&lt;a href="https://atlee.ca/posts/migration-status-3/"&gt;buildbot to taskcluster&lt;/a&gt; . Much of ESR52 still runs on buildbot, while
Firefox 59 is mostly done in Taskcluster, and Firefox 60 is entirely done
in Taskcluster.&lt;/p&gt;
&lt;p&gt;In ESR52 the initial builds are still done in buildbot, which has been
missing out on many performance gains from the build system and AWS side.
Update testing is done via buildbot on slower mac minis or windows hardware.&lt;/p&gt;
&lt;p&gt;The Firefox 59 release had much faster builds, and update verification is
done in Taskcluster on fast linux machines instead of the old mac minis or
windows hardware.&lt;/p&gt;
&lt;p&gt;The Firefox 60.0b4 release also had much faster builds, and ended up
running in about the same time as Firefox 59. It turns out that we hit
several intermittent infrastructure failures in 60.0b4 that caused this
release to be slower than it could have been. Also, because we had
multiple releases running simultaneously, we did see some resource
contention for tasks like signing.&lt;/p&gt;
&lt;p&gt;For comparison, here's what 60.0b11 looks like:&lt;/p&gt;
&lt;!---
## Firefox Beta 60.0b11
[hg](https://hg.mozilla.org/releases/mozilla-beta/rev/5b9ee6a707a068cd1ee79ea2fff7ff4147c694eb)
[treeherder](https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&amp;revision=5b9ee6a707a068cd1ee79ea2fff7ff4147c694eb)
[taskgroup](https://tools.taskcluster.net/groups/XP_KLzLiSSq8ApFDzZrtFA)

    Fix landed in HG            2018-04-09 18:45:45Z
    en-US builds ready          2018-04-09 20:41:53Z  +1h56m
    Updates ready for testing   2018-04-09 22:19:30Z  +1h37m
    Total time                                         3h33m
--&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;                    | Firefox 60.0b11
 ------------------ | --------------- 
 Fix landed in HG   | 18:45:45
 en-US builds ready | 20:41:53 +1h56m
 Updates ready      | 22:19:30 +1h37m
 Total              | 3h33m
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Wow, down to 3.5 hours!&lt;/p&gt;
&lt;p&gt;In addition to the faster builds and faster update tests, we're seeing a
lot of wins from the increased parallelization that taskcluster's much more
flexible scheduling engine makes possible.  There's still more we
can do to speed up certain types of tasks, fix up intermittent failures,
and increase parallelization. I'm curious just how fast this pipeline can
be :)&lt;/p&gt;</description><category>firefox</category><category>mozilla</category><category>releng</category><guid>https://atlee.ca/posts/faster-releases/</guid><pubDate>Wed, 25 Apr 2018 17:20:00 GMT</pubDate></item><item><title>Taskcluster migration update: we're finished!</title><link>https://atlee.ca/posts/migration-status-3/</link><dc:creator>chris</dc:creator><description>&lt;h2 id="were-done"&gt;We're done!&lt;/h2&gt;
&lt;p style="text-align:center"&gt;
&lt;img src="https://media.giphy.com/media/26tPo1I4XyWzIBjFe/giphy.gif"&gt;
&lt;/p&gt;

&lt;p&gt;Over the past few weeks we've hit a few major milestones in our project to
migrate all of Firefox's CI and release automation to
&lt;a href="https://docs.taskcluster.net/"&gt;taskcluster&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Firefox 60 and higher are now &lt;strong&gt;100% on taskcluster!&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="tests"&gt;Tests&lt;/h3&gt;
&lt;p&gt;At the end of March, our Release Operations and Project Integrity teams &lt;a href="https://hg.mozilla.org/mozilla-central/rev/08c54405586b"&gt;finished
migrating&lt;/a&gt; Windows tests onto new hardware machines, all running
taskcluster. That work was later &lt;a href="https://hg.mozilla.org/releases/mozilla-beta/rev/cfe7adda153d"&gt;uplifted to
beta&lt;/a&gt; so
that CI automation on beta would also be completely done using taskcluster.&lt;/p&gt;
&lt;p&gt;This marked the last usage of buildbot for Firefox CI.&lt;/p&gt;
&lt;h3 id="periodic-updates-of-blocklist-and-pinning-data"&gt;Periodic updates of blocklist and pinning data&lt;/h3&gt;
&lt;p&gt;Last week we &lt;a href="https://hg.mozilla.org/mozilla-central/rev/6d0dcc642e1a"&gt;switched
off&lt;/a&gt; the buildbot versions of the periodic update
jobs. These jobs keep the in-tree versions of blocklist, HSTS and HPKP
lists up to date.&lt;/p&gt;
&lt;p&gt;These were the last buildbot jobs running on trunk branches.&lt;/p&gt;
&lt;h3 id="partner-repacks"&gt;Partner repacks&lt;/h3&gt;
&lt;p&gt;And to wrap things up, yesterday the &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1398803"&gt;final patches
landed&lt;/a&gt; to migrate
partner repacks to taskcluster. Firefox 60.0b14 was built yesterday and shipped
today 100% using taskcluster.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;massive&lt;/strong&gt; amount of work went into migrating partner repacks from
buildbot to taskcluster, and I'm really proud of the whole team for pulling
this off.&lt;/p&gt;
&lt;p&gt;So, starting today, Firefox 60 and higher will be built completely on
taskcluster and will no longer rely on buildbot.&lt;/p&gt;
&lt;p&gt;It feels really good to write that :)&lt;/p&gt;
&lt;p&gt;We've been working on migrating Firefox to taskcluster for over three
years! Code archaeology is hard, but I think the first Firefox jobs to start
running in Taskcluster were the Linux64 builds, done by Morgan in &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1155749"&gt;bug
1155749&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="into-the-glorious-future"&gt;Into the glorious future&lt;/h2&gt;
&lt;p&gt;It's great to have migrated everything off of buildbot and onto
taskcluster, and we have endless ideas for how to improve things now that
we're there. First we need to spend some time cleaning up after ourselves
and paying down some technical debt we've accumulated. It's a good time to
start ripping out buildbot code from the tree as well.&lt;/p&gt;
&lt;p&gt;We've got other plans to make release automation easier for other people to
work with, including doing staging releases on try(!!), making the nightly
release process more similar to the beta/release process, and exposing
different parts of the release process to release management so that releng
doesn't have to be directly involved with the day-to-day release mechanics.&lt;/p&gt;</description><category>firefox</category><category>mozilla</category><category>releng</category><category>taskcluster</category><guid>https://atlee.ca/posts/migration-status-3/</guid><pubDate>Fri, 20 Apr 2018 16:50:59 GMT</pubDate></item><item><title>RelEng Retrospective - Q1 2015</title><link>https://atlee.ca/posts/releng-retrospective-q1-2015/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;&lt;a href="https://wiki.mozilla.org/ReleaseEngineering"&gt;RelEng&lt;/a&gt; had a great start to
2015. We hit some major milestones on projects like Balrog and were able to turn
off some old legacy systems, which is always an extremely satisfying thing to do!&lt;/p&gt;
&lt;p&gt;We also made some exciting new changes to the underlying infrastructure, got
some projects off the drawing board and into production, and drastically
reduced our test load!&lt;/p&gt;
&lt;h2 id="firefox-updates"&gt;Firefox updates&lt;/h2&gt;
&lt;h3 id="balrog"&gt;&lt;a href="https://wiki.mozilla.org/Balrog"&gt;Balrog&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://werewolfnightmare101.deviantart.com/art/Balrog-Drawing-254795366"&gt;&lt;img alt="balrog" src="https://atlee.ca/posts/releng-retrospective-q1-2015/balrog.png"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All Firefox update queries are now being served by Balrog!  Earlier this year,
we switched all Firefox update queries off of the old update server,
aus3.mozilla.org, to the new update server, codenamed
&lt;a href="https://wiki.mozilla.org/Balrog"&gt;Balrog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Already, Balrog has enabled us to be much more flexible in handling updates
than the previous system. As an example, in &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1150021"&gt;bug
1150021&lt;/a&gt;, the About
Firefox dialog was broken in the Beta version of Firefox 38 for users with RTL
locales. Once the problem was discovered, we were able to quickly disable
updates just for those users until a fix was ready. With the previous system it
would have taken many hours of specialized manual work to disable the updates
for just these locales, and to make sure they didn't get updates for subsequent
Betas.&lt;/p&gt;
&lt;p&gt;Once we were confident that Balrog was able to handle all previous traffic, we
&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1117962"&gt;shut down the old update server (aus3)&lt;/a&gt;.
aus3 was also one of the last systems relying on CVS (!! I know, rite?). It's a
great feeling to be one step closer to axing one more old system!&lt;/p&gt;
&lt;h3 id="funsize"&gt;&lt;a href="https://wiki.mozilla.org/ReleaseEngineering/Funsize"&gt;Funsize&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;When we started the quarter, we had an exciting new plan for generating partial
updates for Firefox in a scalable way.&lt;/p&gt;
&lt;p&gt;Then we threw out that plan and came up with an EVEN MOAR BETTER plan!&lt;/p&gt;
&lt;p&gt;The &lt;a href="http://rail.merail.ca/posts/taskcluster-first-impression.html"&gt;new architecture&lt;/a&gt;
for funsize relies on &lt;a href="https://pulse.mozilla.org/"&gt;Pulse&lt;/a&gt; for notifications
about new nightly builds
that need partial updates, and uses &lt;a href="http://docs.taskcluster.net/"&gt;TaskCluster&lt;/a&gt;
for doing the generation of the partials and publishing to Balrog.&lt;/p&gt;
&lt;p&gt;The current status of funsize is that we're using it to &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1118015"&gt;generate partial
updates for nightly builds&lt;/a&gt;,
but not published to the regular nightly update channel yet.&lt;/p&gt;
&lt;p&gt;There's lots more to say here...stay tuned!&lt;/p&gt;
&lt;h2 id="ftp-s3"&gt;FTP &amp;amp; S3&lt;/h2&gt;
&lt;p&gt;Brace yourselves... &lt;a href="http://ftp.mozilla.org/pub/mozilla.org/"&gt;ftp.mozilla.org&lt;/a&gt;
is going away...&lt;/p&gt;
&lt;p&gt;&lt;img alt="brace yourselves...ftp is going away" src="https://atlee.ca/posts/releng-retrospective-q1-2015/61319299.jpg"&gt;&lt;/p&gt;
&lt;p&gt;...in its current incarnation at least.&lt;/p&gt;
&lt;p&gt;Expect to hear MUCH more about this in the coming months.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; is that we're migrating as much of the Firefox build/test/release
automation to S3 as possible.&lt;/p&gt;
&lt;p&gt;The existing machinery behind ftp.mozilla.org will be going away near the end of Q3. We
have some ideas of how we're going to handle migrating existing content, as
well as handling new content. You should expect that you'll still be able to
access nightly and CI Firefox builds, but you may need to adjust your scripts
or links to do so.&lt;/p&gt;
&lt;p&gt;Currently we have most &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1100624"&gt;builds&lt;/a&gt;
and &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1117960"&gt;tests&lt;/a&gt;
doing their transfers to/from S3 via the &lt;a href="https://tools.taskcluster.net/index/artifacts/#/"&gt;taskcluster index&lt;/a&gt; in
addition to doing parallel uploads to ftp.mozilla.org. We're aiming to shut off
most uploads to ftp this quarter.&lt;/p&gt;
&lt;p&gt;Please let us know if you have particular systems or use cases that rely on the
current host or directory structure!&lt;/p&gt;
&lt;h2 id="release-build-promotion"&gt;Release build promotion&lt;/h2&gt;
&lt;p&gt;Our &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1118794"&gt;new Firefox release
pipeline&lt;/a&gt; got off the
drawing board, and the initial proof-of-concept work is done.&lt;/p&gt;
&lt;p&gt;The main idea here is to take an existing build based on a push to
mozilla-beta, and to "promote" it to a release build. So we need to generate
all the l10n repacks, partner repacks, generate partial updates, publish files
to CDNs, etc.&lt;/p&gt;
&lt;p&gt;The big win here is that it cuts our time-to-release nearly in half, and also
simplifies our codebase quite a bit!&lt;/p&gt;
&lt;p&gt;Again, expect to hear more about this in the coming months.&lt;/p&gt;
&lt;h2 id="infrastructure"&gt;Infrastructure&lt;/h2&gt;
&lt;p&gt;In addition to all those projects in development, we also tackled quite a few
important infrastructure projects.&lt;/p&gt;
&lt;h3 id="osx-test-platform"&gt;OSX test platform&lt;/h3&gt;
&lt;p&gt;10.10 is now the most widely used Mac platform for Firefox, and it's important
to test what our users are running. We &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1118183"&gt;performed a rolling upgrade&lt;/a&gt;
of our OS X testing environment, migrating from 10.8 to 10.10 while spending
nearly zero capital, and with no downtime. We worked jointly with the Sheriffs
and A-Team to green up all the tests, and shut coverage off on the old platform
as we brought it up on the new one. We have a few 10.8 machines left riding the
trains that will join our 10.10 pool with the release of ESR 38.1.&lt;/p&gt;
&lt;h3 id="got-windows-builds-in-aws"&gt;Got Windows builds in AWS&lt;/h3&gt;
&lt;p&gt;We saw the first successful builds of Firefox for &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1124303"&gt;Windows in
AWS&lt;/a&gt;
this quarter as well! This paves the way for greater flexibility, on-demand
burst capacity, faster developer prototyping, and disaster recovery and
resiliency for Windows Firefox builds. We'll be working on making these
virtualized instances more performant and being able to do large-scale
automation before we roll them out into production.&lt;/p&gt;
&lt;h3 id="puppet-on-windows"&gt;Puppet on windows&lt;/h3&gt;
&lt;p&gt;RelEng uses &lt;a href="https://puppetlabs.com/"&gt;puppet&lt;/a&gt; to manage our Linux and OS X
infrastructure. Presently, we use a very different tool chain, Active Directory
and Group Policy Objects, to manage our Windows infrastructure. This quarter we
deployed a prototype Windows build machine which is &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1121023"&gt;managed with puppet&lt;/a&gt;
instead. Our goal here is to increase visibility and hackability of our Windows
infrastructure. A common deployment tool will also make it easier for RelEng
and community to deploy new tools to our Windows machines.&lt;/p&gt;
&lt;h3 id="new-tooltool-features"&gt;New Tooltool Features&lt;/h3&gt;
&lt;p&gt;We've &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1133842"&gt;redesigned and
deployed&lt;/a&gt; a new version
of &lt;a href="http://code.v.igoro.us/posts/2015/04/tooltool-uploads.html"&gt;tooltool&lt;/a&gt;, the
content-addressable store for large binary files used in build and test jobs.
Tooltool is now integrated with RelengAPI and uses S3 as a backing store. This
gives us scalability and a more flexible permissioning model that, in addition
to serving public files, will allow the same access outside the releng network
as inside.  That means that developers as well as external automation like
TaskCluster can use the service just like Buildbot jobs.  The new
implementation also boasts a much simpler HTTP-based upload mechanism that will
enable easier use of the service.&lt;/p&gt;
&lt;h3 id="centralized-posix-system-logging"&gt;Centralized POSIX System Logging&lt;/h3&gt;
&lt;p&gt;Using syslogd/rsyslogd and &lt;a href="https://papertrailapp.com/"&gt;Papertrail&lt;/a&gt;, we've set
up centralized system logging for all our POSIX infrastructure. Now that all
our system logs are going to one location and we can see trends across multiple
machines, we've been able to quickly identify and fix a number of previously
hard-to-discover bugs. We're planning on adding additional logs (like Windows
system logs) so we can do even greater correlation. We're also in the process
of adding more automated detection and notification of some easily recognizable
problems.&lt;/p&gt;
&lt;h3 id="security-work"&gt;Security work&lt;/h3&gt;
&lt;p&gt;Q1 included some significant effort to avoid serious security exploits like
GHOST, escalation of privilege bugs in the Linux kernel, etc. We manage 14
different operating systems, some of which are fairly esoteric and/or no longer
supported by the vendor, and we worked to backport some code and patches to
some platforms while upgrading others entirely. Because of the way our
infrastructure is architected, we were able to do this with minimal downtime or
impact to developers.&lt;/p&gt;
&lt;h3 id="api-to-manage-aws-workers"&gt;API to manage AWS workers&lt;/h3&gt;
&lt;p&gt;As part of our ongoing effort to &lt;a href="https://bugzil.la/965691"&gt;automate the loaning of releng
machines&lt;/a&gt; when required, we created an API layer to
facilitate the creation and loan of AWS resources, which was previously, and
perhaps ironically, one of the bigger time-sinks for buildduty when loaning
machines.&lt;/p&gt;
&lt;h3 id="cross-platform-worker-for-task-cluster"&gt;Cross-platform worker for task cluster&lt;/h3&gt;
&lt;p&gt;Release engineering is in the process of migrating from our stalwart,
buildbot-driven infrastructure, to a newer, more purpose-built solution in 
&lt;a href="http://docs.taskcluster.net/"&gt;taskcluster&lt;/a&gt;. Many FirefoxOS jobs have
already migrated, but those all conveniently run on Linux. In order to support
the entire range of release engineering jobs, we need support for Mac and
Windows as well. In Q1, we created what we call a "generic worker," essentially
a base class that allows us to extend taskcluster job support to non-Linux
operating systems.&lt;/p&gt;
&lt;h2 id="testing"&gt;Testing&lt;/h2&gt;
&lt;p&gt;Last, but not least, we &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1131269"&gt;deployed initial support&lt;/a&gt; for
&lt;a href="https://elvis314.wordpress.com/2015/02/06/seta-search-for-extraneous-test-automation/"&gt;SETA&lt;/a&gt;,
the search for extraneous test automation!&lt;/p&gt;
&lt;p&gt;This means we've stopped running all tests on all builds. Instead, we use
historical data to determine which tests to run that have been catching the
most regressions. Other tests are run less frequently.&lt;/p&gt;</description><category>aws</category><category>balrog</category><category>firefox</category><category>ftp</category><category>funsize</category><category>mozilla</category><category>s3</category><category>taskcluster</category><category>updates</category><guid>https://atlee.ca/posts/releng-retrospective-q1-2015/</guid><pubDate>Mon, 20 Apr 2015 11:00:00 GMT</pubDate></item><item><title>Behind the clouds: how RelEng do Firefox builds on AWS</title><link>https://atlee.ca/posts/blog20121214behind-the-clouds/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;RelEng have been expanding our usage of Amazon's AWS over the past few months as the development pace of the B2G project increases. In October we began moving builds off of Mozilla-only infrastructure and into a hybrid model where some jobs are done in Mozilla's infra, and others are done in Amazon. Since October we've &lt;a href="http://oduinn.com/blog/2012/11/27/releng-production-systems-now-in-3-aws-regions/"&gt;expanded into 3 amazon regions&lt;/a&gt;, and now have nearly 300 build machines in Amazon. Within each AWS region we've distributed our load across 3 &lt;a href="http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html"&gt;availability zones&lt;/a&gt;.


&lt;/p&gt;&lt;h3&gt;That's great! But how does it work?&lt;/h3&gt;

Behind the scenes, we've written quite a bit of code to manage our new AWS infrastructure. This code is in our cloud-tools repo (&lt;a href="https://github.com/mozilla/build-cloud-tools"&gt;github&lt;/a&gt;|&lt;a href="http://hg.mozilla.org/build/cloud-tools/"&gt;hg.m.o&lt;/a&gt;) and uses the excellent &lt;a href="https://github.com/boto/boto"&gt;boto&lt;/a&gt; library extensively.



The two workhorses in there are &lt;a href="https://github.com/mozilla/build-cloud-tools/blob/master/aws/aws_watch_pending.py"&gt;aws_watch_pending&lt;/a&gt; and &lt;a href="https://github.com/mozilla/build-cloud-tools/blob/master/aws/aws_stop_idle.py"&gt;aws_stop_idle&lt;/a&gt;. &lt;a href="https://github.com/mozilla/build-cloud-tools/blob/master/aws/aws_stop_idle.py"&gt;aws_stop_idle&lt;/a&gt;'s job is pretty easy: it goes around looking at EC2 instances that are idle and shuts them off safely. If an EC2 slave hasn't done any work in more than 10 minutes, it is shut down.
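The idle check boils down to a few lines. The slave-state shape below is made up for illustration; the real aws_stop_idle inspects buildbot and EC2 state via boto:

```python
import time

IDLE_TIMEOUT = 10 * 60  # shut down slaves idle for more than 10 minutes

def pick_idle_slaves(slaves, now=None):
    """Return the slaves that are safe to stop: running, not busy with
    a job, and idle for longer than IDLE_TIMEOUT.  Each slave is a
    dict with 'state', 'busy', and 'last_activity' (epoch seconds)
    keys -- a hypothetical shape, not cloud-tools' actual one."""
    if now is None:
        now = time.time()
    return [s for s in slaves
            if s["state"] == "running"
            and not s["busy"]
            and now - s["last_activity"] > IDLE_TIMEOUT]
```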



&lt;a href="https://github.com/mozilla/build-cloud-tools/blob/master/aws/aws_watch_pending.py"&gt;aws_watch_pending&lt;/a&gt; is a little more involved. Its job is to notice when there are pending jobs (like your build waiting to start!) and to resume EC2 instances. We take a few factors into account when starting up instances:

&lt;ul&gt;&lt;li&gt;We wait until a pending job is more than a minute old before starting anything. This allows in-house capacity to grab the job if possible, and other EC2 slaves that are online but idle also have a chance to take it.&lt;/li&gt;
    &lt;li&gt;Use any &lt;a href="http://aws.amazon.com/ec2/reserved-instances/"&gt;reserved instances&lt;/a&gt; first. As our AWS load stabilizes, we've been able to purchase some reserved instances to reduce our cost. Obviously, to reduce our cost, we have to use those reservations wherever possible! The code to do this is a bit more complicated than I'd like it to be since AWS reservations are specific to individual availability zones rather than whole regions.&lt;/li&gt;
    &lt;li&gt;Some regions are cheaper than others, so we prefer to start instances in the cheaper regions first.&lt;/li&gt;
    &lt;li&gt;Start instances that were most recently running. This should give both better depend-build time, and also helps with billing slightly. Amazon bills for complete hours. So if you start one instance twice in an hour, you're charged for a single hour. If you start two instances once in the hour, you're charged for two hours.&lt;/li&gt;
&lt;/ul&gt;
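Those preferences amount to a sort over the candidate instances. A minimal sketch, with made-up dict shapes for the instance and pricing data:

```python
def rank_candidates(instances, region_cost):
    """Order stopped instances by the start-up preferences above:
    reserved instances first, then cheaper regions, then the most
    recently running (better depend-build times, and since Amazon
    bills by the complete hour, restarting a recent instance can land
    inside an hour we've already paid for).  The dict shapes here are
    hypothetical, for illustration only."""
    return sorted(instances, key=lambda i: (
        not i["reserved"],          # False sorts first: use reservations
        region_cost[i["region"]],   # cheaper regions next
        -i["last_stopped"],         # most recently running first
    ))
```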



Overall we're really happy with Amazon's services. Having APIs for nearly everything has made development &lt;em&gt;really&lt;/em&gt; easy.



&lt;h3&gt;What's next?&lt;/h3&gt;

Seeing as how test capacity is always woefully behind, we're hoping to be able to run a large number of our linux-based unittests on EC2, particularly those that don't require an accelerated graphics context.



After that? Maybe windows builds? Maybe automated regression hunting? What do you want to see?</description><category>aws</category><category>cloud</category><category>firefox</category><category>mozilla</category><guid>https://atlee.ca/posts/blog20121214behind-the-clouds/</guid><pubDate>Fri, 14 Dec 2012 23:15:49 GMT</pubDate></item><item><title>self-serve builds!</title><link>https://atlee.ca/posts/blog20110217self-serve-builds/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;Do you want to be able to cancel your own try server builds?


Do you want to be able to re-trigger a failed nightly build before the RelEng sheriff wakes up?



Do you want to be able to get additional test runs on your build?



If you answered an enthusiastic &lt;strong&gt;YES&lt;/strong&gt; to any or all of these questions, then &lt;a href="https://build.mozilla.org/buildapi/self-serve"&gt;self-serve&lt;/a&gt; is for you.



self-serve was created to provide an API to allow developers to interact with our build infrastructure, with the goal being that others would then create tools against it. It's still early days for this self-serve API, so just a few caveats:



&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;This is very much pre-alpha&lt;/strong&gt; and may cause your computer to explode, your keg to run dry, or may simply hang.&lt;/li&gt;



&lt;li&gt;It's slower than I want. I've spent a bit of time optimizing and caching, but I think it can be much better. Just look at shaver's &lt;a href="http://shaver.off.net/diary/2011/01/22/i-made-a-thing/"&gt;bugzilla search&lt;/a&gt; to see what's possible for speed. Part of the problem here is that it's currently running on a VM that's doing a few dozen other things. We're working on getting faster hardware, but didn't want to block this pre-alpha-rollout on that.&lt;/li&gt;



&lt;li&gt;You need to log in with your LDAP credentials to work with it.&lt;/li&gt;



&lt;li&gt;The HTML interface is teh suck. Good thing I'm not paid to be a front-end webdev! Really, the goal here wasn't to create a fully functional web interface, but rather to provide a functional &lt;em&gt;programmatic&lt;/em&gt; interface.&lt;/li&gt;



&lt;li&gt;Changing build priorities may run afoul of &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=555664"&gt;bug 555664&lt;/a&gt;...haven't had a chance to test out exactly what happens right now if a high priority job gets merged with a lower priority one.&lt;/li&gt;



&lt;/ul&gt;



That being said, I'm proud to be able to finally make this public. Documentation for the REST API is available as part of the web interface itself, and the code is available as part of the &lt;a href="http://hg.mozilla.org/build/buildapi"&gt;buildapi&lt;/a&gt; repository on hg.mozilla.org



&lt;a href="https://build.mozilla.org/buildapi/self-serve"&gt;https://build.mozilla.org/buildapi/self-serve&lt;/a&gt;



Please be gentle!



Any questions, problems or feedback can be left here, or filed in &lt;a href="https://bugzilla.mozilla.org/enter_bug.cgi?product=mozilla.org&amp;amp;component=Release%20Engineering"&gt;bugzilla.&lt;/a&gt;</description><category>buildbot</category><category>firefox</category><category>mozilla</category><category>python</category><category>utilities</category><category>work</category><guid>https://atlee.ca/posts/blog20110217self-serve-builds/</guid><pubDate>Thu, 17 Feb 2011 23:14:26 GMT</pubDate></item><item><title>Pooling the Talos slaves</title><link>https://atlee.ca/posts/blog20090603pooling-the-talos-slaves/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;One of the big projects for me this quarter was getting our Talos slaves configured as a pool of machines shared across branches.  The details are being tracked in &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=488367"&gt;bug 488367&lt;/a&gt; for those interested in the details.


This is a continuation of our work on pooling our slaves, like we've done over the past year with our &lt;a href="http://oduinn.com/2008/05/14/we-have-how-many-machines-dedicated-specialised-slaves-vs-pool-of-identical-slaves/"&gt;build&lt;/a&gt;, &lt;a href="http://oduinn.com/2009/03/16/unittest-and-l10n-moved-from-dedicated-specialized-slaves-to-pool-of-identical-slaves"&gt;unittest, and l10n&lt;/a&gt; slaves.



Up until now each branch has had a dedicated set of Mac Minis to run performance tests for just that branch, on five different operating systems.  For example, the Firefox 3.0 branch used to have 19 Mac Minis doing regular Talos tests: 4 of each platform (except for Leopard, which had 3).  Across our 4 active branches (Firefox 3.0, 3.5, 3.next, and TraceMonkey), we have around 80 minis in total!  That's &lt;a href="http://blog.mozilla.com/mrz/2008/06/12/i-want-to-rack-80-mac-minis/"&gt;a lot of minis&lt;/a&gt;!



What we've been working towards is to put all the Talos slaves into one pool that is shared between all our active branches.  Slaves will be given builds to test in FIFO order, regardless of which branch the build is produced on.



This new pool will be....



&lt;/p&gt;&lt;h4&gt;Faster&lt;/h4&gt;

With more slaves available to all branches, the time to wait for a free slave will go down, so testing can start more quickly...which means you get your results sooner!



&lt;h4&gt;Smarter&lt;/h4&gt;

It will be able to handle varying load between branches.  If there's a lot of activity on one branch, like on the Firefox 3.5 branch before a release, then more slaves will be available to test those builds and won't be sitting idle waiting for builds from low-activity branches.



&lt;h4&gt;Scalable&lt;/h4&gt;

We will be able to scale our infrastructure much better using a pooled system.  Similar to how moving to pooled build and unittest slaves has allowed us to scale based on number of checkins rather than number of branches, having pooled Talos slaves will allow us to scale our capacity based on number of builds produced rather than the number of branches.



In the current setup, each new release or project branch required an allocation of at least 15 minis to dedicate to the branch.



Once all our Talos slaves are pooled, we will be able to add Talos support for new project or release branches with a few configuration changes instead of waiting for new minis to be provisioned.



This means we can get up and running with new project branches much more quickly!



&lt;h4&gt;More Robust&lt;/h4&gt;

We'll also be in a much better position in terms of maintenance of the machines.  When a slave goes offline, the test coverage for any one branch won't be jeopardized since we'll still have the rest of the slaves that can test builds from that branch.



In the current setup, if one or two machines of the same platform need maintenance on one branch, then our performance test coverage of that branch is significantly impacted.  With only one or two machines remaining to run tests on that platform, it can be difficult to determine if a performance regression is caused by a code change, or by some machine issue.  Losing two or three machines in this scenario is enough to close the tree, since we no longer have reliable performance data.



With pooled slaves we would see a much more gradual decrease in coverage when machines go offline.  It's the difference between losing one third of the machines on your branch, and losing one tenth.



&lt;h4&gt;When is all this going to happen?&lt;/h4&gt;

Some of it has started already!  We have a &lt;strong&gt;small&lt;/strong&gt; pool of slaves testing builds from our four branches right now.  If you know how to coerce Tinderbox to show you hidden columns, you can take a look for yourself.  They're also reporting to the &lt;a href="http://graphs-new.mozilla.org"&gt;new graph server&lt;/a&gt; using machine names starting with 'talos-rev2'.



We have some new minis waiting to be added to the pool.  Together with existing slaves, this will give us around 25 machines in total to start off the new pool.  This isn't enough yet to be able to test every build from each branch without skipping any, so for the moment the pool will be skipping to the most recent build per branch if there's any backlog.



It's worth pointing out that our current Talos system also skips builds if there's any backlog.  However, our goal is to turn off skipping once we have enough slaves in the pool to handle our peak loads comfortably.
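Skipping on backlog amounts to coalescing each branch's queue down to its newest build. A minimal sketch of that idea (illustrative only, not the actual scheduler code; branch names and build ids are made up):

```python
def coalesce_backlog(pending):
    """Keep only the most recent queued build per branch.
    `pending` is a list of (branch, build_id) in submission order."""
    latest = {}
    for branch, build_id in pending:
        latest[branch] = build_id  # later entries overwrite earlier ones
    return list(latest.items())

pending = [("1.9.1", 101), ("central", 55), ("1.9.1", 102)]
print(coalesce_backlog(pending))  # build 101 is skipped; only 102 survives for 1.9.1
```

Once the pool has enough slaves to keep up with peak load, the backlog stays empty and this step never discards anything.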



After this initial batch is up and running, we'll be waiting for a suitable time to start moving the existing Talos slaves into the pool.



All in all, this should be a big win for everyone!</description><category>buildbot</category><category>end-to-end</category><category>firefox</category><category>make-stuff-fast</category><category>mozilla</category><category>performance</category><category>talos</category><category>technology</category><category>work</category><guid>https://atlee.ca/posts/blog20090603pooling-the-talos-slaves/</guid><pubDate>Wed, 03 Jun 2009 04:00:42 GMT</pubDate></item><item><title>Parallelizing Unit Tests</title><link>https://atlee.ca/posts/blog20090515parallelizing-unittests/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;Last week we flipped the switch and turned on running unit tests on packaged builds for our &lt;a href="http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox3.5-Unittest"&gt;mozilla-1.9.1&lt;/a&gt;, &lt;a href="http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox-Unittest"&gt;mozilla-central&lt;/a&gt;, and &lt;a href="http://tinderbox.mozilla.org/showbuilds.cgi?tree=TraceMonkey-Unittest"&gt;tracemonkey&lt;/a&gt; branches.


What this means is that our current unit test builds are uploaded to a web server along with all their unit tests. Another machine will then download the build and tests, and run various test suites on them.



Splitting up the tests this way allows us to run the test suites in parallel, so the mochitest suite will run on one machine, and all the other suites will be run on another machine (this group of tests is creatively named 'everythingelse' on Tinderbox).



&lt;img src="https://atlee.ca/blog/wp-content/uploads/paralleltests.png" alt="paralleltests" title="paralleltests" width="462" height="627" class="size-full wp-image-397"&gt;



Splitting up the tests is a critical step towards &lt;strong&gt;reducing our end-to-end time&lt;/strong&gt;, which is the total time elapsed between when a change is pushed into one of the source repositories, and when all of the results from that build are available. Up until now, you had to wait for all the test suites to be completed in sequence, which could take over an hour in total. Now that we can split the tests up, the wait time is determined by the longest test suite.  The mochitest suite is currently the biggest chunk here, taking somewhere around 35 minutes to complete, and all of the other tests combined take around 20 minutes. One of the next steps for us to do is to look at splitting up the mochitests into smaller pieces.
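The wall-time win above is just the difference between summing the suite times and taking their maximum. Using the rough figures from this post (mochitest around 35 minutes, everything else around 20):

```python
# Approximate suite durations in minutes, taken from the post.
suite_minutes = {"mochitest": 35, "everythingelse": 20}

sequential = sum(suite_minutes.values())  # all suites run back to back
parallel = max(suite_minutes.values())    # each suite on its own machine

print(f"sequential: {sequential}m, parallel: {parallel}m")  # 55m vs 35m
```

This also shows why splitting mochitest is the natural next step: once suites run in parallel, only shrinking the longest one reduces the wait further.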



For the time being, we will continue to run the existing unit tests on the same machine that creates the build. This is so that we can make sure that running tests on the packaged builds gives us the same results (there are already some known differences: &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=491675"&gt;bug 491675&lt;/a&gt;, &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=475383"&gt;bug 475383&lt;/a&gt;).



Parallelizing the unit tests, and the infrastructure required to run them, is the first step towards achieving a few important goals.



- Reducing end-to-end time.



- Running unit tests on debug builds as well as on optimized builds. Once we've got both of these going, we can turn off the builds that are currently done solely to be able to run tests on them.



- Running unit tests on the same build multiple times, to help isolate intermittent test failures.



All of the gory details can be found in &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=383136"&gt;bug 383136&lt;/a&gt;.&lt;/p&gt;</description><category>end-to-end</category><category>firefox</category><category>mozilla</category><category>parallel</category><category>performance</category><category>technology</category><category>unittests</category><category>work</category><guid>https://atlee.ca/posts/blog20090515parallelizing-unittests/</guid><pubDate>Fri, 15 May 2009 21:59:20 GMT</pubDate></item><item><title>Give Firefox 3.1 Beta 3 a try!</title><link>https://atlee.ca/posts/blog20090313give-firefox-31-beta-3-a-try/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;We just released the third beta of Firefox 3.1.  Read more about it over at &lt;a href="https://developer.mozilla.org/devnews/index.php/2009/03/12/firefox-31-beta-3-now-available-for-download/"&gt;Mozilla Developer Center&lt;/a&gt;, or if you're impatient, go &lt;a href="http://www.mozilla.com/en-US/firefox/all-beta.html"&gt;download it now&lt;/a&gt;!&lt;/p&gt;</description><category>beta</category><category>firefox</category><category>mozilla</category><category>technology</category><guid>https://atlee.ca/posts/blog20090313give-firefox-31-beta-3-a-try/</guid><pubDate>Fri, 13 Mar 2009 12:58:01 GMT</pubDate></item><item><title>I've been vimperated!</title><link>https://atlee.ca/posts/blog20080616ive-been-vimperated/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;Thanks to &lt;a href="http://vimperator.mozdev.org/"&gt;vimperator&lt;/a&gt;, I've been liberated from non-vi keybindings in firefox!


In Debian, it's just a quick

&lt;code&gt;apt-get install iceweasel-vimperator&lt;/code&gt;

away!&lt;/p&gt;</description><category>debian</category><category>firefox</category><category>linux</category><category>technology</category><category>vim</category><guid>https://atlee.ca/posts/blog20080616ive-been-vimperated/</guid><pubDate>Mon, 16 Jun 2008 18:00:46 GMT</pubDate></item></channel></rss>