<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>chris' random ramblings (Posts about aws)</title><link>https://atlee.ca/</link><description></description><atom:link href="https://atlee.ca/categories/aws.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Sat, 22 Feb 2025 20:04:32 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>RelEng Retrospective - Q1 2015</title><link>https://atlee.ca/posts/releng-retrospective-q1-2015/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;&lt;a href="https://wiki.mozilla.org/ReleaseEngineering"&gt;RelEng&lt;/a&gt; had a great start to
2015. We hit some major milestones on projects like Balrog and were able to turn
off some old legacy systems, which is always an extremely satisfying thing to do!&lt;/p&gt;
&lt;p&gt;We also made some exciting new changes to the underlying infrastructure, got
some projects off the drawing board and into production, and drastically
reduced our test load!&lt;/p&gt;
&lt;h2 id="firefox-updates"&gt;Firefox updates&lt;/h2&gt;
&lt;h3 id="balrog"&gt;&lt;a href="https://wiki.mozilla.org/Balrog"&gt;Balrog&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://werewolfnightmare101.deviantart.com/art/Balrog-Drawing-254795366"&gt;&lt;img alt="balrog" src="https://atlee.ca/posts/releng-retrospective-q1-2015/balrog.png"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All Firefox update queries are now being served by Balrog!  Earlier this year,
we switched all Firefox update queries off of the old update server,
aus3.mozilla.org, to the new update server, codenamed
&lt;a href="https://wiki.mozilla.org/Balrog"&gt;Balrog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Already, Balrog has enabled us to be much more flexible in handling updates
than the previous system. As an example, in &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1150021"&gt;bug
1150021&lt;/a&gt;, the About
Firefox dialog was broken in the Beta version of Firefox 38 for users with RTL
locales. Once the problem was discovered, we were able to quickly disable
updates just for those users until a fix was ready. With the previous system it
would have taken many hours of specialized manual work to disable the updates
for just these locales, and to make sure they didn't get updates for subsequent
Betas.&lt;/p&gt;
&lt;p&gt;Once we were confident that Balrog was able to handle all previous traffic, we
&lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1117962"&gt;shut down the old update server (aus3)&lt;/a&gt;.
aus3 was also one of the last systems relying on CVS (!! I know, rite?). It's a
great feeling to be one step closer to axing one more old system!&lt;/p&gt;
&lt;h3 id="funsize"&gt;&lt;a href="https://wiki.mozilla.org/ReleaseEngineering/Funsize"&gt;Funsize&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;When we started the quarter, we had an exciting new plan for generating partial
updates for Firefox in a scalable way.&lt;/p&gt;
&lt;p&gt;Then we threw out that plan and came up with an EVEN MOAR BETTER plan!&lt;/p&gt;
&lt;p&gt;The &lt;a href="http://rail.merail.ca/posts/taskcluster-first-impression.html"&gt;new architecture&lt;/a&gt;
for funsize relies on &lt;a href="https://pulse.mozilla.org/"&gt;Pulse&lt;/a&gt; for notifications
about new nightly builds
that need partial updates, and uses &lt;a href="http://docs.taskcluster.net/"&gt;TaskCluster&lt;/a&gt;
for doing the generation of the partials and publishing to Balrog.&lt;/p&gt;
&lt;p&gt;The current status of funsize is that we're using it to &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1118015"&gt;generate partial
updates for nightly builds&lt;/a&gt;,
but these aren't published to the regular nightly update channel yet.&lt;/p&gt;
&lt;p&gt;There's lots more to say here...stay tuned!&lt;/p&gt;
&lt;h2 id="ftp-s3"&gt;FTP &amp;amp; S3&lt;/h2&gt;
&lt;p&gt;Brace yourselves... &lt;a href="http://ftp.mozilla.org/pub/mozilla.org/"&gt;ftp.mozilla.org&lt;/a&gt;
is going away...&lt;/p&gt;
&lt;p&gt;&lt;img alt="brace yourselves...ftp is going away" src="https://atlee.ca/posts/releng-retrospective-q1-2015/61319299.jpg"&gt;&lt;/p&gt;
&lt;p&gt;...in its current incarnation at least.&lt;/p&gt;
&lt;p&gt;Expect to hear MUCH more about this in the coming months.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt; is that we're migrating as much of the Firefox build/test/release
automation to S3 as possible.&lt;/p&gt;
&lt;p&gt;The existing machinery behind ftp.mozilla.org will be going away near the end of Q3. We
have some ideas of how we're going to handle migrating existing content, as
well as handling new content. You should expect that you'll still be able to
access nightly and CI Firefox builds, but you may need to adjust your scripts
or links to do so.&lt;/p&gt;
&lt;p&gt;Currently we have most &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1100624"&gt;builds&lt;/a&gt;
and &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1117960"&gt;tests&lt;/a&gt;
doing their transfers to/from S3 via the &lt;a href="https://tools.taskcluster.net/index/artifacts/#/"&gt;TaskCluster index&lt;/a&gt; in
addition to doing parallel uploads to ftp.mozilla.org. We're aiming to shut off
most uploads to ftp this quarter.&lt;/p&gt;
&lt;p&gt;Please let us know if you have particular systems or use cases that rely on the
current host or directory structure!&lt;/p&gt;
&lt;h2 id="release-build-promotion"&gt;Release build promotion&lt;/h2&gt;
&lt;p&gt;Our &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1118794"&gt;new Firefox release
pipeline&lt;/a&gt; got off the
drawing board, and the initial proof-of-concept work is done.&lt;/p&gt;
&lt;p&gt;The main idea here is to take an existing build based on a push to
mozilla-beta, and to "promote" it to a release build. To do that, we need to
generate all the l10n repacks and partner repacks, generate partial updates,
publish files to CDNs, etc.&lt;/p&gt;
&lt;p&gt;The big win here is that it cuts our time-to-release nearly in half, and also
simplifies our codebase quite a bit!&lt;/p&gt;
&lt;p&gt;Again, expect to hear more about this in the coming months.&lt;/p&gt;
&lt;h2 id="infrastructure"&gt;Infrastructure&lt;/h2&gt;
&lt;p&gt;In addition to all those projects in development, we also tackled quite a few
important infrastructure projects.&lt;/p&gt;
&lt;h3 id="osx-test-platform"&gt;OSX test platform&lt;/h3&gt;
&lt;p&gt;10.10 is now the most widely used Mac platform for Firefox, and it's important
to test what our users are running. We &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1118183"&gt;performed a rolling upgrade&lt;/a&gt;
of our OS X testing environment, migrating from 10.8 to 10.10 while spending
nearly zero capital, and with no downtime. We worked jointly with the Sheriffs
and A-Team to green up all the tests, and shut coverage off on the old platform
as we brought it up on the new one. We have a few 10.8 machines left riding the
trains that will join our 10.10 pool with the release of ESR 38.1.&lt;/p&gt;
&lt;h3 id="got-windows-builds-in-aws"&gt;Got Windows builds in AWS&lt;/h3&gt;
&lt;p&gt;We saw the first successful builds of Firefox for &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1124303"&gt;Windows in
AWS&lt;/a&gt;
this quarter as well! This paves the way for greater flexibility, on-demand
burst capacity, faster developer prototyping, and disaster recovery and
resiliency for Windows Firefox builds. We'll be working on making these
virtualized instances more performant and on supporting large-scale
automation before we roll them out into production.&lt;/p&gt;
&lt;h3 id="puppet-on-windows"&gt;Puppet on windows&lt;/h3&gt;
&lt;p&gt;RelEng uses &lt;a href="https://puppetlabs.com/"&gt;puppet&lt;/a&gt; to manage our Linux and OS X
infrastructure. Presently, we use a very different tool chain, Active Directory
and Group Policy Object, to manage our Windows infrastructure. This quarter we
deployed a prototype Windows build machine which is &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1121023"&gt;managed with puppet&lt;/a&gt;
instead. Our goal here is to increase visibility and hackability of our Windows
infrastructure. A common deployment tool will also make it easier for RelEng
and community to deploy new tools to our Windows machines.&lt;/p&gt;
&lt;h3 id="new-tooltool-features"&gt;New Tooltool Features&lt;/h3&gt;
&lt;p&gt;We've &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1133842"&gt;redesigned and
deployed&lt;/a&gt; a new version
of &lt;a href="http://code.v.igoro.us/posts/2015/04/tooltool-uploads.html"&gt;tooltool&lt;/a&gt;, the
content-addressable store for large binary files used in build and test jobs.
Tooltool is now integrated with RelengAPI and uses S3 as a backing store. This
gives us scalability and a more flexible permissioning model that, in addition
to serving public files, will allow the same access outside the releng network
as inside.  That means that developers as well as external automation like
TaskCluster can use the service just like Buildbot jobs.  The new
implementation also boasts a much simpler HTTP-based upload mechanism that will
enable easier use of the service.&lt;/p&gt;
&lt;h3 id="centralized-posix-system-logging"&gt;Centralized POSIX System Logging&lt;/h3&gt;
&lt;p&gt;Using syslogd/rsyslogd and &lt;a href="https://papertrailapp.com/"&gt;Papertrail&lt;/a&gt;, we've set
up centralized system logging for all our POSIX infrastructure. Now that all
our system logs are going to one location and we can see trends across multiple
machines, we've been able to quickly identify and fix a number of previously
hard-to-discover bugs. We're planning on adding additional logs (like Windows
system logs) so we can do even greater correlation. We're also in the process
of adding more automated detection and notification of some easily recognizable
problems.&lt;/p&gt;
&lt;h3 id="security-work"&gt;Security work&lt;/h3&gt;
&lt;p&gt;Q1 included some significant effort to avoid serious security exploits like
GHOST, privilege escalation bugs in the Linux kernel, etc. We manage 14
different operating systems, some of which are fairly esoteric and/or no longer
supported by the vendor, and we worked to backport some code and patches to
some platforms while upgrading others entirely. Because of the way our
infrastructure is architected, we were able to do this with minimal downtime or
impact to developers.&lt;/p&gt;
&lt;h3 id="api-to-manage-aws-workers"&gt;API to manage AWS workers&lt;/h3&gt;
&lt;p&gt;As part of our ongoing effort to &lt;a href="https://bugzil.la/965691"&gt;automate the loaning of releng
machines&lt;/a&gt; when required, we created an API layer to
facilitate the creation and loan of AWS resources, which was previously, and
perhaps ironically, one of the bigger time-sinks for buildduty when loaning
machines.&lt;/p&gt;
&lt;h3 id="cross-platform-worker-for-task-cluster"&gt;Cross-platform worker for task cluster&lt;/h3&gt;
&lt;p&gt;Release engineering is in the process of migrating from our stalwart,
buildbot-driven infrastructure, to a newer, more purpose-built solution in 
&lt;a href="http://docs.taskcluster.net/"&gt;taskcluster&lt;/a&gt;. Many FirefoxOS jobs have
already migrated, but those all conveniently run on Linux. In order to support
the entire range of release engineering jobs, we need support for Mac and
Windows as well. In Q1, we created what we call a "generic worker," essentially
a base class that allows us to extend taskcluster job support to non-Linux
operating systems.&lt;/p&gt;
&lt;h2 id="testing"&gt;Testing&lt;/h2&gt;
&lt;p&gt;Last, but not least, we &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1131269"&gt;deployed initial support&lt;/a&gt; for
&lt;a href="https://elvis314.wordpress.com/2015/02/06/seta-search-for-extraneous-test-automation/"&gt;SETA&lt;/a&gt;,
the search for extraneous test automation!&lt;/p&gt;
&lt;p&gt;This means we've stopped running all tests on all builds. Instead, we use
historical data to pick, for every build, the tests that have been catching the
most regressions. Other tests are run less frequently.&lt;/p&gt;</description><category>aws</category><category>balrog</category><category>firefox</category><category>ftp</category><category>funsize</category><category>mozilla</category><category>s3</category><category>taskcluster</category><category>updates</category><guid>https://atlee.ca/posts/releng-retrospective-q1-2015/</guid><pubDate>Mon, 20 Apr 2015 11:00:00 GMT</pubDate></item><item><title>Gotta Cache 'Em All</title><link>https://atlee.ca/posts/cache-em-all/</link><dc:creator>chris</dc:creator><description>&lt;section id="too-much-traffic"&gt;
&lt;h2&gt;TOO MUCH TRAFFIC!!!!&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://atlee.ca/blog/posts/aws-networks-and-burning-trees.html"&gt;Waaaaaaay back in February&lt;/a&gt; we identified overall network bandwidth as a
cause of job failures on &lt;a class="reference external" href="https://tbpl.mozilla.org"&gt;TBPL&lt;/a&gt;. We were pushing too much traffic over our
VPN link between Mozilla's datacentre and AWS.  Since then we've been
working on a few approaches to cope with the increased traffic while at the
same time reducing our overall network load.  Most recently we've deployed
HTTP caches inside each AWS region.&lt;/p&gt;
&lt;img alt="Network traffic from January to August 2014" class="align-center" src="https://atlee.ca/posts/cache-em-all/releng-traffic-2014.png"&gt;
&lt;/section&gt;
&lt;section id="the-answer-cache-all-the-things"&gt;
&lt;h2&gt;The answer - cache all the things!&lt;/h2&gt;
&lt;a class="reference external image-reference" href="http://xkcd.com/908/"&gt;&lt;img alt="Obligatory XKCD" class="align-center" src="http://imgs.xkcd.com/comics/the_cloud.png"&gt;&lt;/a&gt;
&lt;section id="caching-build-artifacts"&gt;
&lt;h3&gt;Caching build artifacts&lt;/h3&gt;
&lt;p&gt;The primary target for caching was downloads of build/test/symbol packages
by test machines from file servers. These packages are generated by the
build machines and uploaded to various file servers. The same packages are
then downloaded many times by different machines running tests. This was a
perfect candidate for caching, since the same files were being requested by
many different hosts in a relatively short timespan.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="caching-tooltool-downloads"&gt;
&lt;h3&gt;Caching tooltool downloads&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://wiki.mozilla.org/ReleaseEngineering/Applications/Tooltool"&gt;Tooltool&lt;/a&gt; is a simple system RelEng uses to distribute static assets to
build/test machines. While the machines do maintain a local cache of files,
the caches are often empty because the machines are newly created in AWS.
Having the files in local HTTP caches speeds up transfer times and
decreases network load.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="results-so-far-50-decrease-in-bandwidth"&gt;
&lt;h2&gt;Results so far - 50% decrease in bandwidth&lt;/h2&gt;
&lt;p&gt;Initial deployment was completed on August 8th (end of week 32 of 2014).
You can see from the graph above that we've cut our bandwidth by about 50%!&lt;/p&gt;
&lt;/section&gt;
&lt;section id="what-s-next"&gt;
&lt;h2&gt;What's next?&lt;/h2&gt;
&lt;p&gt;There is still some low-hanging fruit for caching. We have internal PyPI
repositories that could benefit from caches. There's a long tail of other
miscellaneous downloads that could be cached as well.&lt;/p&gt;
&lt;p&gt;There are other improvements we can make to reduce bandwidth as well, such
as moving uploads from build machines to be outside the VPN tunnel, or
perhaps to S3 directly. Additionally, a big source of network traffic is
doing signing of various packages (gpg signatures, MAR files, etc.). We're
looking at ways to do that more efficiently. I'd love to investigate more
efficient ways of compressing or transferring build artifacts overall;
there is a ton of duplication between the build and test packages, across
different platforms and even across different pushes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="i-want-to-know-moar"&gt;
&lt;h2&gt;I want to know MOAR!&lt;/h2&gt;
&lt;p&gt;Great! As always, all our work has been tracked in a bug, and worked out in
the open. The bug for this project is &lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1017759"&gt;1017759&lt;/a&gt;. The source code lives in
&lt;a class="reference external" href="https://github.com/mozilla/build-proxxy/"&gt;https://github.com/mozilla/build-proxxy/&lt;/a&gt;, and we have some basic
documentation available on our &lt;a class="reference external" href="https://wiki.mozilla.org/ReleaseEngineering/Applications/Proxxy"&gt;wiki&lt;/a&gt;. If this kind of work excites you,
&lt;a class="reference external" href="https://careers.mozilla.org/en-US/position/ohz2YfwA"&gt;we're hiring!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Big thanks to &lt;a class="reference external" href="https://github.com/laggyluke"&gt;George Miroshnykov&lt;/a&gt; for his work on developing proxxy.&lt;/p&gt;
&lt;/section&gt;</description><category>aws</category><category>build</category><category>cloud</category><category>graph</category><category>make-stuff-fast</category><category>mozilla</category><category>performance</category><guid>https://atlee.ca/posts/cache-em-all/</guid><pubDate>Tue, 26 Aug 2014 14:21:00 GMT</pubDate></item><item><title>Blobber is live - upload ALL the things!</title><link>https://atlee.ca/posts/blobber-is-live/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;Last week without any fanfare, we closed &lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=749421"&gt;bug 749421&lt;/a&gt; - Allow test slaves to
save and upload files somewhere. This has actually been working well for a few
months now; it's just taken a while to close it out properly, and I completely
failed to announce it anywhere. Mea culpa!&lt;/p&gt;
&lt;p&gt;This was a really important project, and deserves some fanfare! &lt;em&gt;cue trumpets, parades and skywriters&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The goal of this project was to make it easier for developers to get important
data out of the test machines reporting to &lt;a class="reference external" href="http://tbpl.mozilla.org"&gt;TBPL&lt;/a&gt;. Previously the only real
output from a test job was the textual log.  That meant if you wanted a
screenshot from a failing test, or the dump from a crashing process, you needed
to format it somehow into the log. For screenshots we would base64-encode a PNG
image and print it to the log as a data URL!&lt;/p&gt;
&lt;p&gt;With blobber up and running, it's now possible to upload extra files
from your test runs on &lt;a class="reference external" href="http://tbpl.mozilla.org"&gt;TBPL&lt;/a&gt;. Things like screenshots, minidump logs and zip
files are already supported.&lt;/p&gt;
&lt;p&gt;Getting new files uploaded is pretty straightforward. If the environment
variable &lt;cite&gt;MOZ_UPLOAD_DIR&lt;/cite&gt; is set in your test's environment, you can simply
copy files there and they will be uploaded after the test run is complete.
Links to the files are printed in the log, for example:&lt;/p&gt;
&lt;pre class="literal-block"&gt;15:21:18     INFO -  (blobuploader) - INFO - TinderboxPrint: Uploaded 70485077-b08a-4530-8d4b-c85b0d6f9bc7.dmp to http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/5778e0be8288fe8c91ab69dd9c2b4fbcc00d0ccad4d3a8bd78d3abe681af13c664bd7c57705822a5585655e96ebd999b0649d7b5049fee1bd75a410ae6ee55af&lt;/pre&gt;
&lt;p&gt;Your thanks and praise should go to our awesome intern, &lt;a class="reference external" href="https://mozillians.org/u/mtabara"&gt;Mihai Tabara&lt;/a&gt;, who
did most of the work here.&lt;/p&gt;
&lt;p&gt;Most test jobs are already supported; if you're unsure whether the job type you're
interested in is supported, just search for &lt;cite&gt;MOZ_UPLOAD_DIR&lt;/cite&gt; in the log on TBPL.
If it's not there and you need it, please &lt;a class="reference external" href="https://bugzilla.mozilla.org/enter_bug.cgi?product=Release%20Engineering&amp;amp;component=General%20Automation"&gt;file a bug!&lt;/a&gt;&lt;/p&gt;</description><category>aws</category><category>mozilla</category><guid>https://atlee.ca/posts/blobber-is-live/</guid><pubDate>Mon, 20 Jan 2014 22:06:35 GMT</pubDate></item><item><title>Behind the clouds: how RelEng do Firefox builds on AWS</title><link>https://atlee.ca/posts/blog20121214behind-the-clouds/</link><dc:creator>chris</dc:creator><description>&lt;p&gt;RelEng have been expanding our usage of Amazon's AWS over the past few months as the development pace of the B2G project increases. In October we began moving builds off of Mozilla-only infrastructure and into a hybrid model where some jobs are done in Mozilla's infra, and others are done in Amazon. Since October we've &lt;a href="http://oduinn.com/blog/2012/11/27/releng-production-systems-now-in-3-aws-regions/"&gt;expanded into 3 amazon regions&lt;/a&gt;, and now have nearly 300 build machines in Amazon. Within each AWS region we've distributed our load across 3 &lt;a href="http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html"&gt;availability zones&lt;/a&gt;.


&lt;/p&gt;&lt;h3&gt;That's great! But how does it work?&lt;/h3&gt;

Behind the scenes, we've written quite a bit of code to manage our new AWS infrastructure. This code is in our cloud-tools repo (&lt;a href="https://github.com/mozilla/build-cloud-tools"&gt;github&lt;/a&gt;|&lt;a href="http://hg.mozilla.org/build/cloud-tools/"&gt;hg.m.o&lt;/a&gt;) and uses the excellent &lt;a href="https://github.com/boto/boto"&gt;boto&lt;/a&gt; library extensively.



The two workhorses in there are &lt;a href="https://github.com/mozilla/build-cloud-tools/blob/master/aws/aws_watch_pending.py"&gt;aws_watch_pending&lt;/a&gt; and &lt;a href="https://github.com/mozilla/build-cloud-tools/blob/master/aws/aws_stop_idle.py"&gt;aws_stop_idle&lt;/a&gt;. &lt;a href="https://github.com/mozilla/build-cloud-tools/blob/master/aws/aws_stop_idle.py"&gt;aws_stop_idle&lt;/a&gt;'s job is pretty easy: it goes around looking for idle EC2 instances and shuts them off safely. If an EC2 slave hasn't done any work in more than 10 minutes, it is shut down.



&lt;a href="https://github.com/mozilla/build-cloud-tools/blob/master/aws/aws_watch_pending.py"&gt;aws_watch_pending&lt;/a&gt; is a little more involved. Its job is to notice when there are pending jobs (like your build waiting to start!) and to resume EC2 instances. We take a few factors into account when starting up instances:

&lt;ul&gt;&lt;li&gt;We wait until a pending job is more than a minute old before starting anything. This allows in-house capacity to grab the job if possible, and other EC2 slaves that are online but idle also have a chance to take it.&lt;/li&gt;
    &lt;li&gt;Use any &lt;a href="http://aws.amazon.com/ec2/reserved-instances/"&gt;reserved instances&lt;/a&gt; first. As our AWS load stabilizes, we've been able to purchase some reserved instances to reduce our cost. Obviously, to reduce our cost, we have to use those reservations wherever possible! The code to do this is a bit more complicated than I'd like it to be since AWS reservations are specific to individual availability zones rather than whole regions.&lt;/li&gt;
    &lt;li&gt;Some regions are cheaper than others, so we prefer to start instances in the cheaper regions first.&lt;/li&gt;
    &lt;li&gt;Start instances that were most recently running. This should give better depend-build times, and it also helps slightly with billing. Amazon bills for complete hours. So if you start one instance twice in an hour, you're charged for a single hour. If you start two instances once in the hour, you're charged for two hours.&lt;/li&gt;
&lt;/ul&gt;
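&lt;p&gt;The last three factors above amount to a sort order over stopped instances. Here is a minimal Python sketch of that ordering. It is not the actual aws_watch_pending code: the &lt;code&gt;Instance&lt;/code&gt; record, its field names and the idea of a single hourly price per instance are all invented for illustration, and it glosses over the per-availability-zone bookkeeping that reservations need.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import namedtuple

# Hypothetical record; the field names are illustrative, not taken from cloud-tools.
# last_stopped is a Unix timestamp of when the instance last stopped running.
Instance = namedtuple("Instance", "name reserved hourly_price last_stopped")


def start_order(instances):
    """Return stopped instances in the order we would prefer to start them."""
    return sorted(
        instances,
        key=lambda inst: (
            not inst.reserved,    # reserved instances first (False sorts before True)
            inst.hourly_price,    # then instances in cheaper regions
            -inst.last_stopped,   # then whichever was running most recently
        ),
    )
```
&lt;/pre&gt;
&lt;p&gt;Tuples compare element by element, so each later factor only breaks ties left by the earlier ones. Whether the real code weighs the factors in exactly this order is an assumption of the sketch.&lt;/p&gt;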



Overall we're really happy with Amazon's services. Having APIs for nearly everything has made development &lt;em&gt;really&lt;/em&gt; easy.



&lt;h3&gt;What's next?&lt;/h3&gt;

Seeing as how test capacity is always woefully behind, we're hoping to be able to run a large number of our Linux-based unittests on EC2, particularly those that don't require an accelerated graphics context.



After that? Maybe Windows builds? Maybe automated regression hunting? What do you want to see?</description><category>aws</category><category>cloud</category><category>firefox</category><category>mozilla</category><guid>https://atlee.ca/posts/blog20121214behind-the-clouds/</guid><pubDate>Fri, 14 Dec 2012 23:15:49 GMT</pubDate></item></channel></rss>