Spread Firefox Affiliate Button

Pages

Profiling Buildbot

Buildbot is a critical part of our build infrastructure at Mozilla. We use it to manage builds on 5 different platforms (Linux, Mac, Windows, Maemo and Windows Mobile), and 5 different branches (mozilla-1.9.1, mozilla-central, TraceMonkey, Electrolysis, and Places). All in all we have 80 machines doing builds across 150 different build types (not counting Talos; all the Talos test runs and slaves are managed by a different master).

And buildbot is at the center of it all.

The load on our machine running buildbot is normally fairly high, and occasionally spikes so that the buildbot process is unresponsive. It normally restores itself within a few minutes, but I’d really like to know why it’s doing this!

Running our staging buildbot master with python’s cProfile module for almost two hours yields the following profile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   416377 4771.188    0.011 4796.749    0.012 {select.select}
       52  526.891   10.133  651.043   12.520 /tools/buildbot/lib/python2.5/site-packages/buildbot-0.7.10p1-py2.5.egg/buildbot/status/web/waterfall.py:834(phase2)
     6518  355.370    0.055  355.370    0.055 {posix.fsync}
   232582  238.943    0.001 1112.039    0.005 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:150(dataReceived)
 10089681  104.395    0.000  130.089    0.000 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:36(b1282int)
36798140/36797962   83.536    0.000   83.537    0.000 {len}
 29913653   70.458    0.000   70.458    0.000 {method 'append' of 'list' objects}
      311   63.775    0.205   63.775    0.205 {bz2.compress}
 10088987   56.581    0.000  665.982    0.000 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:141(gotItem)
4010792/1014652   56.079    0.000  176.693    0.000 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/jelly.py:639(unjelly)
2343910/512709   47.954    0.000  112.446    0.000 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:281(_encode)

Interpreting the results

select shows up in the profile because we’re profiling wall clock time, not cpu time. So the more time we’re spending in select, the better, since that means we’re just waiting for data. The overall run time for this profile was 7,532 seconds, so select is taking around 63% of our total time. I believe the more time spent here, the better. Time spent inside select is idle time.

We already knew that the buildbot waterfall was slow (the second line in profile).

fsync isn’t too surprising either. buildbot calls fsync after writing log files to disk. We’ve considered removing this call, and this profile lends support to our original guess.

The next entries really surprised me, twisted’s dataReceived and a decoding function, b1282int. These are called when processing data received from the slaves. If I’m reading this correctly, this means that dataReceived and children account for around 40% of our total time after you remove the time spent in select. 1112 / (7532-4796) = 40%.

These results are from our staging buildbot master, which doesn’t have anywhere near the same load as the production buildbot master. I would expect that the time spent waiting in select would go down on the production master (there’s more data being received, more often), and that time spent in fsync and dataReceived would go up.

What to do about it?

A few ideas….

  • Use psyco to get some JIT goodness applied to some of the slower python functions.
  • Remove the fsync call after saving logs.
  • Use the cpu-time to profile rather than wallclock time. This will give a different perspective on the performance of buildbot, which should give better information about where we’re spending time processing data.
  • Implement slow pieces in C (or cython). Twisted’s Banana library looks do-able in C, and also is high up in the profile.
  • Send less data from the slaves. We’re currently logging all stdout/stderr produced by the slaves. All of this data is processed by the master process and then saved to disk.
  • Rearchitect buildbot to handle this kind of load.
  • Have more than one buildbot master, each one handling fewer slaves. We’re actively looking into this approach, since it also allows us to have some redundancy for this critical piece of our infrastructure.

Gardening update

The garden is continuing to grow. There’s plenty of lettuce ready to be eaten, and we’ve harvested enough swiss chard to make two quiches!

I also picked a few of the carrots because I was curious how big they had gotten. They look pretty tasty!

Fresh carrots!

I planted two strawberry plants as well. They sent out tons of runners, but only one has bloomed. I’m going to see what happens, and if I get any berries off of it. Next year I think I’ll set up a separate area for the strawberries, otherwise they’d easily take over my 4×4 block with their runners.

Strawberry blooms

There are lots of little tomatoes ripening on the vines as well. I can’t wait for them to be ready to eat, there’s nothing like fresh tomatoes from your own garden.

Baby tomato

I’m going to start a worm composting bin so I can return some of the waste products from the garden (like carrot and onion tops, tomato vines) back to the garden. I’m a bit nervous about this…but the investment is pretty small. There’s a local company, wormcomposting.ca, that sells the red wiggler worms, so I’ll probably be ordering a quarter pound of worms from them shortly.

Gardening Update

My square foot garden is coming along quite nicely now!

I put in the last transplants from a local nursery a few weeks ago: strawberries and some sweet red peppers. I’m still a bit undecided on if it’s worth it to start from seed and then transplant or not. The plants that managed to survive the transplanting process and the onslaught of various pests in the spring are much smaller than plants I’ve seen at the local farmer’s market. I’ve actually picked up some tomatoes and herbs at the farmer’s market to supplement the ones I started from seed. All the lettuce, swiss chard, cucumber and melons that are growing right now I seeded directly into the ground. I think I probably need to make sure the seedlings are getting enough light, and that there is enough nutrients in the soil for the seedlings and transplants.

Herbs and peppers

Onions, carrots and lettuce

I’ve managed to harvest quite a few radishes already, and even some cilantro and basil! I’m a bit worried about the garlic…it looks a bit sickly lately. The onions look great though! I haven’t had too many problems with pests lately. There are nibbles on some leaves of most plants, but nothing really major.

I love going out to check how the plants are growing every day. It looks like I’ll be able to start harvesting some lettuce and swiss chard soon!