
Going faster!

Thanks to lots of hard work from people like John Ford and Rafael Ávila de Espíndola, we've made some great progress getting our total build + test time down. Average test times have taken a dip due to optimizations on debug builds [test times chart], and average build times have also taken a dip due to disabling PGO [build times chart]. These awesome visualizations are brought to you by the Go Faster! dashboard. If you haven't seen it before, go check it out now!

mozconfigs and branches

Firefox's mozconfigs have lived in RelEng's buildbot-configs repo since the dawn of time...well, at least as long as I've been here, so that's a few years at least! With our new rapid release schedule and the explosion of project branches, it's become much more difficult to keep the mozconfigs in sync with code changes across merges between branches. It's always been difficult to coordinate landing code changes with mozconfig changes, and since mozconfig changes happened out of band from regular source code changes, they were invisible on tbpl. We've recently changed the build automation so that it first checks for a mozconfig in e.g. $topsrcdir/browser/config/mozconfigs/$platform/{nightly,debug}. If this file exists, it is copied into $topsrcdir/.mozconfig; otherwise the original mozconfig is fetched from hg. There are two major benefits to this change:

  • mozconfig changes are now a distinct point in the development history of the product itself. This means that changes to compiler versions, optimization flags, etc. are tracked along with regular code changes. Changes to the mozconfigs will trigger builds and tests like any other code changes.
  • Changes to the mozconfigs will automatically be carried over between branches during a merge. If you change the compiler used in mozilla-central, then the next time we merge mozilla-central into mozilla-aurora the change to compiler will be merged as well.
Of course this means that if your branch has special settings in its mozconfig, then you need to be careful when merging back to mozilla-central or mozilla-inbound. We'll also have to ensure certain changes like --enable-js-diagnostics are turned off when we move from mozilla-central to mozilla-aurora. In particular, on your first merge from mozilla-central, any custom mozconfig settings you have will be lost. I'm planning on landing the mozconfigs into mozilla-central early next week. As always, if you have any questions, comments, or concerns, please let me or anybody else in RelEng know! For all the gory details, see bug 558180. UPDATE: Branches currently using the generic configs (e.g. try, places, twigs) will lose support for mozconfig-extra[-<platform>].
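
To make the new lookup a bit more concrete, here's a rough sketch of the logic in Python. This is illustrative only: the function and variable names are mine, not the actual automation code, and the hg fallback is left out.

import os
import shutil


def setup_mozconfig(topsrcdir, platform, build_type):
    # Sketch of the lookup described above. build_type is e.g.
    # "nightly" or "debug"; platform is e.g. "linux64".
    in_tree = os.path.join(topsrcdir, "browser", "config", "mozconfigs",
                           platform, build_type)
    dest = os.path.join(topsrcdir, ".mozconfig")
    if os.path.exists(in_tree):
        # The in-tree mozconfig wins: copy it into place.
        shutil.copyfile(in_tree, dest)
        return True
    # Otherwise the automation falls back to fetching the original
    # mozconfig from hg (not shown here).
    return False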

cURL and paste

cURL and paste...two great tastes that apparently don't go well together at all! I've been writing a bunch of simple WSGI apps lately, some of which handle file uploads. Take this tiny application:

import webob
import paste.httpserver


def app(environ, start_response):
    # Read the entire request body, then reply with a trivial "OK!" response.
    req = webob.Request(environ)
    req.body_file.read()
    return webob.Response("OK!")(environ, start_response)


paste.httpserver.serve(app, port=8090)

Then throw some files at it with cURL:

[catlee] % for f in $(find -type f); do time curl -s -o /dev/null --data-binary @$f http://localhost:8090; done
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 0% cpu 1.013 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 63% cpu 0.013 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 64% cpu 0.012 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 81% cpu 0.015 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 0% cpu 1.014 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 0% cpu 1.009 total

Huh? Some files take a second to upload? I discovered after much digging, and rewriting my (more complicated) app several times, that the problem is that cURL sends an extra "Expect: 100-continue" header. This is supposed to let a web server respond with "100 Continue" immediately, or reject an upload based on the request headers. The problem is that paste's httpserver doesn't send the interim "100 Continue" response by default, so cURL waits for a second before giving up and sending the rest of the request. The magic to turn this off is the '-0' flag to cURL, which forces HTTP/1.0 mode:

[catlee] % for f in $(find -type f); do time curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090; done
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 66% cpu 0.012 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 64% cpu 0.012 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.01s system 58% cpu 0.014 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 66% cpu 0.012 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 59% cpu 0.013 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 65% cpu 0.012 total
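
As a small aside (this is a variation of mine, not from the original app): the Expect header shows up in the WSGI environ as HTTP_EXPECT, so you can log which requests would have stalled. Something like this, assuming the same webob/paste setup as above:

import webob
import paste.httpserver


def app(environ, start_response):
    # For sufficiently large uploads, curl's default behaviour is to send
    # "Expect: 100-continue" and wait (up to about a second) for the interim
    # "100 Continue" response that paste's httpserver never sends.
    if environ.get("HTTP_EXPECT", "").lower() == "100-continue":
        print "client sent Expect: 100-continue; this request will stall"
    req = webob.Request(environ)
    req.body_file.read()
    return webob.Response("OK!")(environ, start_response)


paste.httpserver.serve(app, port=8090)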

self-serve nightly builds

Last week I landed some changes to self-serve that let sheriffs trigger nightly builds on our various branches. To use it, head to the branch's self-serve page, e.g. mozilla-central, and at the bottom of the page there are two text inputs: the first triggers a new set of regular builds on the given revision, and the second triggers a new set of nightly builds on the given revision, complete with l10n repacks and updates.

Christmas in Europe - Christmas & Skiing!

Finally, the Alps! We celebrated Christmas with a delicious dinner prepared by Melissa's mom. All told, we spent nine days in Courchevel skiing. Well, intending to ski at least. Pretty much everybody was sick at some point or other. We'd come back from skiing to various folks passed out in the chalet. I took Thomas skiing a few times, and we put him in lessons as well. He ended up having a fever for a few days, so missed out on most of his lessons :(

Despite the fatigue and various interbreeding viruses, the skiing was fantastic. Remember that rain we had driving on the way up from Geneva? That meant lots of fresh powder just for us! The scenery was beautiful! The chalet was quite nice too. Poor Martin was cooped up in there most of the time though :(

At the end of the world! Nat, Jeremy, Mel, Pam and I decided to ski all the way to the edge of the 3-valleys one day. Near the bottom left of the map is a place called "Courchevel 1300 Le Praz." That's where we were staying. If you wake up early, and ski really fast, you can make it all the way to the far right hand side of the map and back in one day. We made it up to the top of the lift called "Bouchet", which is 3230m above sea level! The picture above was taken going up the "Peyron" lift, looking onto the slope behind the "Cime Caron". Click through to the full sized image and you can see a bowl full of moguls near the top. That was a lot of fun to come down!

The "Cime Caron" lift itself is quite impressive. It's this bus-sized lift that carries about 50 people at once up the hill. We didn't quite make it back to the chalet before all the lifts closed though :( We got as far as Meribel, and then had to take a few buses to get back to Le Praz.

We didn't suffer from a lack of eating here either. One night we went out with Natalia and Jeremy and ordered a dish called Raclette. You get served a giant half wheel of cheese under a heating element, along with some meats, pickles, bread and boiled potatoes. The cheese melts and gets all bubbly and crispy, which you then scrape off and eat with all the other goodies! All in all, Courchevel was amazing, and I'd go back in a second!

self-serve builds!

Do you want to be able to cancel your own try server builds? Do you want to be able to re-trigger a failed nightly build before the RelEng sheriff wakes up? Do you want to be able to get additional test runs on your build? If you answered an enthusiastic YES to any or all of these questions, then self-serve is for you. self-serve was created to provide an API to allow developers to interact with our build infrastructure, with the goal being that others would then create tools against it. It's still early days for this self-serve API, so just a few caveats:

  • This is very much pre-alpha and may cause your computer to explode, your keg to run dry, or may simply hang.
  • It's slower than I want. I've spent a bit of time optimizing and caching, but I think it can be much better. Just look at shaver's bugzilla search to see what's possible for speed. Part of the problem here is that it's currently running on a VM that's doing a few dozen other things. We're working on getting faster hardware, but didn't want to block this pre-alpha-rollout on that.
  • You need to log in with your LDAP credentials to work with it.
  • The HTML interface is teh suck. Good thing I'm not paid to be a front-end webdev! Really, the goal here wasn't to create a fully functional web interface, but rather to provide a functional programmatic interface.
  • Changing build priorities may run afoul of bug 555664...haven't had a chance to test out exactly what happens right now if a high priority job gets merged with a lower priority one.
That being said, I'm proud to be able to finally make this public. Documentation for the REST API is available as part of the web interface itself, and the code is available as part of the buildapi repository on hg.mozilla.org. You can find the interface at https://build.mozilla.org/buildapi/self-serve. Please be gentle! Any questions, problems or feedback can be left here, or filed in bugzilla.
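
If you'd rather poke at the API from a script than from the HTML interface, here's a minimal sketch. The branch endpoint and JSON Accept header are assumptions on my part (check the REST API docs in the web interface for the real details), and whether plain HTTP basic auth works for your LDAP account is also an assumption:

import urllib2

base = "https://build.mozilla.org/buildapi/self-serve"

# Hypothetical example: authenticate with LDAP credentials and ask for
# the mozilla-central page as JSON.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, base, "you@example.com", "s3kr1t")
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))

req = urllib2.Request(base + "/mozilla-central",
                      headers={"Accept": "application/json"})
print opener.open(req).read()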

Nightly build times getting slower over time

Yesterday some folks in #developers mentioned they felt their builds were getting slower over time. I wondered if the same was true for our build machines.

Here's a chart of build times for the past year. This is just the compile + link step for nightly builds, restricted to a single class of hardware per OS. Same machines. Slower builds. Something isn't right here. Windows builds have gone from an average of 90 minutes last March to 150 minutes this January. The big jump for OSX builds at the end of September is when we turned on the universal x86/x86_64 builds. There's a pretty clear upward trend; some of this is to be expected given new features being added, but at the same time more complexity is creeping into the Makefiles. Each little bit costs developers extra time every day doing their own builds, and it also means slower builds in the build infrastructure. Which means you'll wait longer to get try results, our build pools will have longer wait times, dogs and cats living together, and mass hysteria! I'm sure there are places in our build process that can be sped up. Think you can help? Are you a build system rock star? Do you refactor Makefiles in your sleep? Great! We're hiring!

Just who am I talking to? (verifying https connections with python)

Did you know that python's urllib module supports connecting to web servers over HTTPS? It's easy!


import urllib

data = urllib.urlopen("https://www.google.com").read()

print data

Did you also know that it provides absolutely zero guarantees that your "secure" data isn't being observed by a man-in-the-middle? Run this:

from paste import httpserver

def app(environ, start_response):
    start_response("200 OK", [])
    return "Thanks for your secrets!"


httpserver.serve(app, host='127.0.0.1', port='8080', ssl_pem='*')

This little web app will generate a random SSL certificate for you each time it's run. A self-signed, completely untrustworthy certificate. Now modify your first script to look at https://localhost:8080 instead. Or, for more fun, keep it pointing at google and mess with your IP routing to redirect google.com:443 to localhost:8080.

iptables -t nat -A OUTPUT -d google.com -p tcp --dport 443 -j DNAT --to-destination 127.0.0.1:8080

Run your script again, and see what it says. Instead of the raw HTML of google.com, you now get "Thanks for your secrets!". That's right, python will happily accept, without complaint or warning, the random certificate generated by this little python app pretending to be google.com. Sometimes you want to know who you're talking to, you know?

import httplib, socket, ssl, urllib2

def buildValidatingOpener(ca_certs):
    class VerifiedHTTPSConnection(httplib.HTTPSConnection):
        def connect(self):
            # overrides the version in httplib so that we do
            #    certificate verification
            sock = socket.create_connection((self.host, self.port),
                                            self.timeout)
            if self._tunnel_host:
                self.sock = sock
                self._tunnel()

            # wrap the socket using verification with the root
            #    certs in trusted_root_certs
            self.sock = ssl.wrap_socket(sock,
                                        self.key_file,
                                        self.cert_file,
                                        cert_reqs=ssl.CERT_REQUIRED,
                                        ca_certs=ca_certs,
                                        )

    # wraps https connections with ssl certificate verification
    class VerifiedHTTPSHandler(urllib2.HTTPSHandler):
        def __init__(self, connection_class=VerifiedHTTPSConnection):
            self.specialized_conn_class = connection_class
            urllib2.HTTPSHandler.__init__(self)

        def https_open(self, req):
            return self.do_open(self.specialized_conn_class, req)

    https_handler = VerifiedHTTPSHandler()
    url_opener = urllib2.build_opener(https_handler)

    return url_opener


opener = buildValidatingOpener("/usr/lib/ssl/certs/ca-certificates.crt")

req = urllib2.Request("https://www.google.com")

print opener.open(req).read()

Using this new validating url opener, we can make sure we're talking to someone with a validly signed certificate. With our IP redirection in place, or pointing at localhost:8080 explicitly, we get a certificate invalid error. We still don't know for sure that it's google (it could be some other site with a valid ssl certificate), but maybe we'll tackle that in a future post!
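
For reference, here's a rough idea of what that extra hostname check could look like. This isn't part of the code above, and it relies on ssl.match_hostname, which only showed up in later Python versions (2.7.9+/3.2+):

import socket
import ssl


def connect_and_check(host, ca_certs, port=443):
    # Sketch only: verify the certificate chain as before, then also check
    # that the certificate was actually issued for the host we asked for.
    sock = socket.create_connection((host, port))
    sock = ssl.wrap_socket(sock, cert_reqs=ssl.CERT_REQUIRED,
                           ca_certs=ca_certs)
    ssl.match_hostname(sock.getpeercert(), host)
    return sock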

Christmas in Europe - Fontainebleau to Courchevel

After the graduation we had planned on spending the rest of our Christmas holidays in Courchevel to enjoy some skiing in the Alps! For some reason, instead of driving from Fontainebleau to Courchevel (a relatively easy, but long, 6 hour drive), we had booked flights to Geneva via Zurich. Pick up the rental vans in Geneva, a quick 2 hour drive to Courchevel, and we're there! Easy, right? What could possibly go wrong?

The same cab driver who picked us up from Charles de Gaulle the week before met us at 6 in the morning outside our hotel in the same van he had driven before...except now we had two more people (Nat & Jeremy) plus luggage. Seven adults plus driver and two kids with luggage in an 8 seater van is...a tight squeeze! The kids ended up on our laps, and everybody had luggage piled on top of them. It was a relief to get to the airport and be able to move again!

Remember that vicious snowstorm that crippled the Frankfurt airport? Yeah, most of Europe was still getting snow, airports were struggling to cope, and our flight from Paris to Zurich was a bit late taking off as a result. When we arrived in Zurich we were informed that our flight to Geneva had been cancelled, and all of the other flights that day were full. The very helpful Swiss Air agent offered to put us on a train to Geneva instead. No problem, we said, let's just get our luggage first. It's about 1pm at this point.

And so began the Great Waiting. We were instructed to head on down to the luggage pickup area and await our luggage at a special carousel reserved just for suckers, er, redirected luggage. An hour or so later with none of our luggage in sight (but plenty of other folks' luggage stacked up along the walls...not a good sign!), further inquiries to the luggage folks led us to believe that maybe our luggage got sent to Geneva without us. No wait, it's still here. Oh, now we don't know where it is at all...it must be lost along with the tens of thousands of other pieces we haven't dealt with in the back.

The good side to all this is that the two boys were having a blast. No, really. Something I never realized before was that the luggage claim area is mostly deserted. The time between when flights arrive offers two young boys a giant playground all to themselves: all kinds of interesting things to climb on and lots of room to run around and throw toys.

At around 5pm we finally gave up and decided to take the train to Geneva. I think the boys were kind of sad when we finally left! However, their mood quickly improved with the discovery of a playground on the train! Three hours later, we're in Geneva, at the airport, and our luggage is waiting for us! We wonder how long it's been there...probably all day :P

Picking up the rental vans was relatively painless, and the drive to Courchevel uneventful. It was pouring rain for much of the drive though, which didn't bode well for skiing. And at this point the frantic pace of the past few days had really caught up with us; several of us were now sick with colds or fever.

Faster try builds!

When we run a try build, we wipe out the build directory between each job; we want to make sure that every user's build has a fresh environment to build in. Unfortunately this means that we also wipe out the clone of the try repo, and so we have to re-clone try every time. On Linux and OSX we were spending an average of 30 minutes to re-clone try, and on Windows 40 minutes. The majority of that is simply 'hg clone' time, but a good portion is due to locks: we need to limit how many build slaves are cloning from try simultaneously, otherwise the hg server blows up.

Way back in September, Steve Fink suggested using hg's share extension to make cloning faster. Then in November, Ben Hearsum landed some changes that paved the way to actually turning this on. Today we've enabled the share extension for Linux (both 32 and 64-bit) and OSX 10.6 builds on try. Windows and OSX 10.5 are coming too; we need to upgrade hg on the build machines first.

Average times for the 'clone' step are down to less than 5 minutes now. This means you get your builds 25 minutes faster! It also means we're not hammering the try repo so badly, and so hopefully won't have to reset it for a long, long time. We're planning on rolling this out across the board, so nightly builds get faster, release builds get faster, clobber builds get faster, etc... Enjoy!