A few weeks ago I finished reading The Brendan Voyage by Tim Severin. I stayed up late into the night to finish it; I really couldn't put the book down.
It's an amazing story of Tim Severin's recreation of the 6th century voyage of St. Brendan from Ireland to North America. Yes, the 6th century! The tale of the original voyage is told in the Voyage of Saint Brendan, which dates back to around 900 AD.
The book begins with the author's quest to re-create the original boat as closely as possible to what would have been available to a 6th century Irish monk, following the details recorded in the original text. It turns out that this means a wooden-framed boat with a leather hull. This is followed by much discussion and research into how well a leather boat could possibly survive years at sea, if at all! I found it fascinating how Severin discovers, over and over again, that traditional methods and materials perform remarkably well in the harsh environment of the North Atlantic. In fact, over the course of his voyage much of his more modern equipment breaks down under the constant exposure to salt water, while the traditional materials hold up.
Severin's planned voyage begins from the coast of Ireland, then continues north past Scotland, passing the Faroe Islands on the way to Iceland. I found the passages describing the lives of people on these remote islands fascinating. It seems almost impossible that 6th century Irish monks sought out and established monasteries on many of these lonely pillars of stone in the sea.
Tim Severin makes a really convincing case that not only was Saint Brendan's voyage possible, but that he wasn't the first Irish sailor to visit Iceland, Greenland, and possibly even North America.
I gave it 5 stars on Goodreads. Highly, highly recommended book.
I was getting tired of staying on the WordPress update treadmill, and the latest attack made me worry that my site was vulnerable.
Static sites have the advantage of a much smaller attack surface, and they're also much easier to understand and control, so I thought I'd give one a try. This blog is now generated with Nikola!
jcranmer noticed a funny new colour on TBPL last week - hot pink!
This colour indicates jobs that have been cancelled manually. It's an important distinction to make between these jobs and jobs that have failed due to build or test or infrastructure problems.
It took a long time, but I finally finished deploying bug 704006 - add a new status for "user cancelled" last week.
With the help of fabric, I was able to script the upgrade so that we didn't require a downtime. Each master was first disabled in slavealloc so that slaves would connect to different masters after a reboot. Next I gracefully shut down the master. Once the master was shut down cleanly, buildbot was updated to pick up the new change, and the master started back up again. Each master could take several hours to shut down depending on what jobs were running, and we have quite a few masters now, so this whole process took most of the week to complete. It's awesome to be able to do this kind of thing without a downtime though... it makes me feel a bit like a ninja :P
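The per-master sequence can be sketched as a simple loop. This is a hypothetical outline, not the actual fabric script; `run` stands in for a remote-execution helper, and the command strings are illustrative only:

```python
# Hypothetical sketch of the rolling upgrade described above. `run` is a
# stand-in for something like fabric's remote-execution API, and the
# command strings are made up, not the real ones.
def upgrade_master(master, run):
    run(master, "disable in slavealloc")      # slaves reconnect elsewhere after reboot
    run(master, "buildbot graceful-shutdown") # wait for running jobs to finish
    run(master, "update buildbot code")
    run(master, "buildbot start")

def rolling_upgrade(masters, run):
    # Masters are upgraded one at a time, so the pool as a whole stays up
    # and no downtime window is needed.
    for master in masters:
        upgrade_master(master, run)
```

Since each graceful shutdown could take hours, scripting the sequence end-to-end is what made running it across all the masters practical.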
Big thanks to ewong for the initial patch, and making sure it got accepted upstream as well!
I've been working on git support for Mozilla's build infrastructure over the past while.
RelEng's current model for creating our build machines is to create a homogeneous set of machines, each one capable of doing any type of build. This means that build machines can be doing a debug mozilla-central linux64 build one minute, and then an opt mozilla-b2g18 panda build the next. This model has worked well for us for scalability and load balancing: machines run on branches where there's currently activity, and we can focus our efforts on overall capacity. It's a huge improvement over the Firefox 2.x and 3.0 days where we had only one or two dedicated machines per branch!
However, there are challenges when using this model. We need to be reasonably efficient with disk space since the same disk is shared between all types of builds across all branches. We also need to make build setup time as quick as possible.
For mercurial based builds we've developed a utility called hgtool that manages shared local copies of repositories that are then locally cloned into each specific build directory. By using hg's share extension, we are able to save quite a bit of disk space and also have efficient pulling/cloning operations. However, because of the way Mozilla treats branches in hg (where each branch is in a separate repo), we need to keep each local clone of gecko branches separate from each other. We have a separate local clone of mozilla-central and another of mozilla-inbound. (although, maybe we could use bookmarks to keep them distinct in the same local repo...hmmm...)
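The shared-clone scheme works roughly like this. Below is a toy sketch, not hgtool itself; the paths and command strings are made up, and the real tool handles locking, specific revisions, and much more:

```python
import os

def hgtool_commands(repo_url, repo_name, share_base, build_dir):
    """Return the hg commands (as strings) that maintain a shared local
    clone and then share it into a build directory. A toy model of the
    approach described above, not the real hgtool."""
    shared = os.path.join(share_base, repo_name)
    cmds = []
    if not os.path.exists(shared):
        cmds.append(f"hg clone {repo_url} {shared}")
    else:
        cmds.append(f"hg pull -R {shared} {repo_url}")
    # hg's share extension links the build dir to the shared repo's store,
    # so each build directory costs only a working copy, not a full clone.
    cmds.append(f"hg share {shared} {build_dir}")
    return cmds
```

The disk savings come from the last step: every build directory on the machine points back at the same shared store.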
What about git?
git's branches are quite different from mercurial's. Everything depends on "refs": names that refer to specific commit ids. It's actually possible to pull completely unrelated git repositories into the same local repository. As long as you keep the refs from different repositories separate, all the commits and metadata coexist happily. In contrast, mercurial uses metadata on each commit to distinguish branches from each other. Most commits in the various gecko repositories are on the "default" branch, which makes it harder to combine mozilla-central and mozilla-inbound into the same repository without some other means of keeping each repository's "default" branch separate.
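The refs-as-names idea can be modeled in a few lines of Python. This is purely a toy model for illustration - real git stores far more than this - but it shows why unrelated repositories coexist as long as their refs live in separate namespaces:

```python
# Toy model: a repo is (objects, refs), where refs map names to commit ids
# and objects map commit ids to commit data. Fetching another repo under
# its own ref namespace never collides with existing refs.
def fetch(shared, other, namespace):
    shared["objects"].update(other["objects"])
    for name, commit_id in other["refs"].items():
        # e.g. refs/heads/master -> refs/remotes/<namespace>/master
        branch = name.split("/")[-1]
        shared["refs"][f"refs/remotes/{namespace}/{branch}"] = commit_id

central = {"objects": {"c1": "commit 1"}, "refs": {"refs/heads/master": "c1"}}
inbound = {"objects": {"c2": "commit 2"}, "refs": {"refs/heads/master": "c2"}}

shared = {"objects": {}, "refs": {}}
fetch(shared, central, "mozilla-central")
fetch(shared, inbound, "mozilla-inbound")
# Both "master" branches now coexist in one object store, under
# separate namespaces.
```

This is exactly what you can't easily do with hg's "default" branches, since there the branch name lives in commit metadata rather than in a separate namespace of refs.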
This gave me a crazy idea: what if we kept one giant shared git repository on the build machines that contained ALL THE REPOS.
We have a script called gittool that is the analog of hgtool. I modified it to support b2g-style xml manifests, and pointed it at a local shared repository and let it loose. It works!
I noticed that the more repositories were cloned into the shared repository, the longer it took to clone new ones. Repositories that normally take 5 seconds to clone were now taking 5 minutes to fetch the first time into the shared repository. What was going on?
One clue was that there are over 144,000 refs across all the b2g repositories. The vast majority of these are tags...there are only 12k non-tag refs.
I created a test script that creates new repositories and measures how long they take to fetch into a shared repository. The most significant factor affecting fetch speed was the number of refs! Each new repository had 40 commits, and I created 100 of them. With one tag per commit, fetching a new repository into the shared repo added 0.0028s per repo. With 10 tags per commit, it added 0.0113s per repo.
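The benchmark loop looked roughly like this. It's a sketch of the methodology, not the actual script; `make_repo` and `fetch_into_shared` stand in for the real git commands:

```python
import time

def time_fetches(n_repos, make_repo, fetch_into_shared):
    """Create n_repos repositories and record how long each one takes to
    fetch into the shared repository. The interesting signal is how the
    per-fetch time grows as the shared repo accumulates refs."""
    timings = []
    for i in range(n_repos):
        repo = make_repo(i)           # e.g. 40 commits, N tags per commit
        start = time.monotonic()
        fetch_into_shared(repo)
        timings.append(time.monotonic() - start)
    return timings
```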
For now I've given up on the idea to have giant local shared repositories, and have moved to a model that keeps each repository separate, like we do with hg. Hopefully we can revisit this later to avoid having multiple copies of related repositories cloned on the same machine.
You may have noticed some new tests appearing on TBPL this week:
The Ubuntu32/64 test platforms have been enabled on most branches this week. Wherever we've had consistent green results on the new test platforms, we've disabled running those tests on the older Fedora32/64 platforms. Currently these are crashtests, jsreftests and marionette tests. We're working closely with the awesome A*Team to migrate over any remaining test suites that make sense. As always, if you notice anything that looks like it's not working right, please let us know - filing a bug is the best way. We expect there to be differences between the test platforms and therefore in some test results. We're committed to tracking down what those differences are so we can make sure the new test machines continue to give us good test results.
A lot of what we do in RelEng flies under the radar. When we're doing our jobs well, most of the time you shouldn't notice changes we make!
I wanted to highlight this change in particular, because it's a HUGE win for test scalability. If needed, we're able to add more Ubuntu test machines in a matter of minutes. And the more tests we can move over to this new pool of test machines, the more we can improve turnaround time on the overloaded Fedora slaves.
Rail deserves most of the credit for this awesome work, so send kudos and/or beer his way :)
RelEng have been expanding our usage of Amazon's AWS over the past few months as the development pace of the B2G project increases. In October we began moving builds off of Mozilla-only infrastructure and into a hybrid model where some jobs are done in Mozilla's infra and others in Amazon. Since October we've expanded into 3 Amazon regions, and now have nearly 300 build machines there. Within each AWS region we've distributed our load across 3 availability zones.
The two workhorses in there are aws_watch_pending and aws_stop_idle. aws_stop_idle's job is pretty simple: it goes around looking at EC2 instances that are idle and shuts them off safely. If an EC2 slave hasn't done any work in more than 10 minutes, it is shut down.
aws_watch_pending is a little more involved. Its job is to notice when there are pending jobs (like your build waiting to start!) and to resume EC2 instances. We take a few factors into account when deciding which instances to start up.
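The aws_stop_idle side boils down to a threshold check. Here's a toy sketch of that rule - the function and variable names are invented for illustration, not taken from the real tool:

```python
IDLE_LIMIT = 10 * 60  # seconds; the 10-minute rule mentioned above

def should_stop(last_job_finished, now):
    """An EC2 slave that hasn't done any work for more than 10 minutes
    is a candidate for a safe shutdown."""
    return (now - last_job_finished) > IDLE_LIMIT

def instances_to_stop(last_activity, now):
    # last_activity maps instance name -> timestamp of its last job
    return [name for name, ts in last_activity.items()
            if should_stop(ts, now)]
```

The "safely" part is the real work in practice: the actual tool has to make sure an instance isn't mid-job before pulling the plug.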
Overall we're really happy with Amazon's services. Having APIs for nearly everything has made development really easy.
Since test capacity is always woefully behind demand, we're hoping to be able to run a large number of our Linux-based unit tests on EC2, particularly those that don't require an accelerated graphics context.
After that? Maybe windows builds? Maybe automated regression hunting? What do you want to see?
Mozilla's Release Engineering team is responsible for making sure that our products get built, released and updated properly.
If you're working on a project which changes how or what we ship, or requires changes in how updates work, please get in touch with us as early in your project as possible. We can work with you to find the best solutions to your release/update problems, and save everybody a lot of last minute panic.
Did you know that since last Thursday we've been doing the majority of our Linux Firefox builds in Amazon?
John Hopkins posted about this last week, but I wanted to highlight how important this is for Release Engineering. We can now scale up the number of Linux build machines in proportion to load. If there are no builds happening, great! Shut off the machines! If there are builds pending, we can start up more machines within minutes.
Migrating to these new build systems means that we can now convert excess in-house Linux build capacity into additional Windows build capacity.
Very shortly we'll be looking at running certain unit test suites in this environment as well.
For all the gory details, follow along in bug 772446.
In order to remedy this, I present to you the Try High Scores list!
This is a report of how much time people have racked up on the try server over the last 7 days. It isn't meant to be a wall of shame (or fame...). I hope that by publishing it, people realize the costs their try pushes impose on the infrastructure - costs that affect not only you, but other people waiting for results on other branches.
If you really need those results from try, then by all means use it, it's there for you!
Please keep in mind though, if you have the results you need from try already, cancelling the remaining jobs is just a click away on tbpl.
Call me strange, but I actually enjoy spending time reading up on the programming tools I use regularly. I think of programming tools as tools in the same way that a hammer or a saw is a tool. They help you get a job done. You need to learn how to use them properly. You need to keep them well maintained. Sometimes you need to throw a tool away and get a new one.
Stuff I learned about vim:
How I boosted my vim - lots of great vim tips (how did I not know about :set visualbell until now???) and plugins, which introduced me to...
surround - gives you the ability to work with the surroundings of text objects. Ever wanted to easily add quotes around a word, or change the double quotes surrounding a string to single quotes? I know you have - so go install this plugin now!
snipmate - lets you define lots of predefined snippets for various languages. Now in python I can type "def<tab>" and bam! I get a basic function definition.
I wasn't able to get to PyCon US 2012 this year, so I'm very happy that the sessions were all recorded.
The art of subclassing - great tips on how to do subclassing well in python.
why classes aren't always what you want - I liked how he emphasized that you should always be open to refactoring your code. Usually making your own exception classes is a bad idea... however, one great nugget buried in there: if you can't decide whether you should raise a KeyError, AttributeError or TypeError (for example), make a class that inherits from all 3 and raise that. Then consumers can catch whichever makes sense to them instead of guessing.
introduction to metaclasses - metaclasses aren't so scary after all!
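That multiple-inheritance exception trick is tiny in practice. Here's a minimal sketch; the class name is made up for illustration:

```python
# An exception inheriting from all three candidate types, so callers can
# catch whichever one makes sense to them. "SettingError" is a made-up name.
class SettingError(KeyError, AttributeError, TypeError):
    """Raised when a lookup failure could reasonably be any of the three."""

def caught_as(exc_type):
    """Show that a SettingError is caught by a handler for exc_type."""
    try:
        raise SettingError("missing setting")
    except exc_type:
        return True
```

Consumers who think of your object as a dict catch KeyError, those who think of it as an object catch AttributeError, and both get the same exception.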