One useful script, a linux version

Johnathan posted links to 3 scripts he finds useful. His sattap script looked handy, so I hacked it up for linux. Run it to take a screen capture and upload the image to a website you have ssh access to. The link is printed out and copied to the clipboard.

Hope you find this useful!

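#!/bin/sh
# sattap - Send a thing to a place
# Needs ImageMagick (for import), scp and xclip.
set -e

SCP_USER='catlee'
SCP_HOST='people.mozilla.org'
# Single quotes keep ~ literal, so the remote side expands it
SCP_PATH='~/public_html/sattap/'

HTTP_URL="http://people.mozilla.org/~catlee/sattap/"

# 8 hex characters from a hash of the current date make a short, unique-ish name
FILENAME=$(date | md5sum | head -c 8).png
FILEPATH="/tmp/$FILENAME"

echo "Capturing..."
import "$FILEPATH"    # click a window, or drag out a rectangle, to capture it
echo "Copying to $SCP_HOST"
scp "$FILEPATH" "${SCP_USER}@${SCP_HOST}:${SCP_PATH}"
echo "Deleting local copy"
rm "$FILEPATH"

# printf instead of echo so no trailing newline ends up in the clipboard
printf '%s' "$HTTP_URL$FILENAME" | xclip -selection clipboard
echo "Your file should be at $HTTP_URL$FILENAME, which is also in your paste buffer"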

poster 0.5 released

I’ve just released version 0.5 of poster, the streaming http upload library for python. It’s easy_installable or downloadable directly from the cheeseshop.

Thanks again to everybody who’s written in with bug fixes and suggestions!

Profiling Buildbot

Buildbot is a critical part of our build infrastructure at Mozilla. We use it to manage builds on 5 different platforms (Linux, Mac, Windows, Maemo and Windows Mobile), and 5 different branches (mozilla-1.9.1, mozilla-central, TraceMonkey, Electrolysis, and Places). All in all we have 80 machines doing builds across 150 different build types (not counting Talos; all the Talos test runs and slaves are managed by a different master).

And buildbot is at the center of it all.

The load on the machine running buildbot is normally fairly high, and it occasionally spikes to the point where the buildbot process becomes unresponsive. It usually recovers on its own within a few minutes, but I’d really like to know why it’s doing this!

Running our staging buildbot master with python’s cProfile module for almost two hours yields the following profile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   416377 4771.188    0.011 4796.749    0.012 {select.select}
       52  526.891   10.133  651.043   12.520 /tools/buildbot/lib/python2.5/site-packages/buildbot-0.7.10p1-py2.5.egg/buildbot/status/web/waterfall.py:834(phase2)
     6518  355.370    0.055  355.370    0.055 {posix.fsync}
   232582  238.943    0.001 1112.039    0.005 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:150(dataReceived)
 10089681  104.395    0.000  130.089    0.000 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:36(b1282int)
36798140/36797962   83.536    0.000   83.537    0.000 {len}
 29913653   70.458    0.000   70.458    0.000 {method 'append' of 'list' objects}
      311   63.775    0.205   63.775    0.205 {bz2.compress}
 10088987   56.581    0.000  665.982    0.000 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:141(gotItem)
4010792/1014652   56.079    0.000  176.693    0.000 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/jelly.py:639(unjelly)
2343910/512709   47.954    0.000  112.446    0.000 /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:281(_encode)
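A table in this format is what Python’s pstats module prints for a cProfile run. As a minimal sketch (Python 3, with a hypothetical busy() function standing in for the buildbot master, since this isn’t our actual setup), this is one way to collect a profile and print the top entries sorted by tottime:

```python
import cProfile
import io
import pstats


def busy(n):
    """Hypothetical stand-in workload so there is something to profile."""
    return sum(i * i for i in range(n))


def profile_report(func, *args, limit=10, sort="tottime"):
    """Run func(*args) under cProfile and return the top `limit` rows as text."""
    profiler = cProfile.Profile()          # default timer measures wall-clock time
    profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(sort).print_stats(limit)
    return out.getvalue()
```

Calling print(profile_report(busy, 1_000_000)) prints the same ncalls/tottime/percall/cumtime columns as above. Because the default timer is wall-clock, a call that mostly waits (like select) can dominate such a table.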

Interpreting the results

select shows up in the profile because we’re profiling wall-clock time, not CPU time. Time spent inside select is idle time: we’re just waiting for data to arrive, so I believe the more time spent here, the better. The overall run time for this profile was 7,532 seconds, so select is taking around 63% of our total time.

We already knew that the buildbot waterfall was slow (the second line in the profile).

fsync isn’t too surprising either. buildbot calls fsync after writing log files to disk, and here 6,518 calls at about 55 ms each add up to 355 seconds. We’ve considered removing this call, and this profile supports our suspicion that it’s expensive.
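For anyone who hasn’t run into it, here is the general write-then-fsync pattern in Python. This is a sketch, not buildbot’s code (write_log and its durable flag are hypothetical). Each fsync blocks until the OS reports the data is on disk, which is why thousands of them add up. Skipping it leaves the data in the OS cache to be written out later, at the risk of losing the tail of a log if the machine crashes.

```python
import os
import tempfile


def write_log(path, data, durable=True):
    """Append bytes to a log file; if durable, fsync so the data reaches the disk."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()                      # push Python's buffer into the OS page cache
        if durable:
            os.fsync(f.fileno())       # block until the OS reports the data is on disk


def demo():
    """Write one chunk with fsync and one without; both end up in the file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "build.log")
        write_log(path, b"step 1 done\n")
        write_log(path, b"step 2 done\n", durable=False)
        with open(path, "rb") as f:
            return f.read()
```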

The next entries really surprised me: Twisted’s dataReceived and a decoding function, b1282int. These are called when processing data received from the slaves. If I’m reading this correctly, dataReceived and its children account for around 40% of our total time once you remove the time spent in select: 1112 / (7532 - 4796) = 1112 / 2736 ≈ 41%.
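The call counts hint at why b1282int is so hot: it’s called almost exactly as often as gotItem (10,089,681 vs. 10,088,987 times). As far as I understand Banana, each item on the wire starts with an integer header written as base-128 digits, least-significant digit first, so every received item costs roughly one decode. Here’s a pure-Python sketch of that kind of encoding; the function names are my own and this is not Twisted’s source:

```python
def int_to_b128(n):
    """Encode a non-negative int as base-128 digits, least significant first."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    digits = bytearray()
    while True:
        digits.append(n & 0x7F)   # low 7 bits become the next digit
        n >>= 7
        if n == 0:
            return bytes(digits)


def b128_to_int(digits):
    """Decode base-128 digits (least significant first) back into an int."""
    value = 0
    multiplier = 1
    for digit in digits:          # iterating over bytes yields ints in Python 3
        value += digit * multiplier
        multiplier <<= 7
    return value
```

For example, 129 encodes as b"\x01\x01" (1 + 1 × 128). Each digit costs a Python-level loop iteration: cheap once, expensive ten million times, and exactly the kind of tight loop that would be easy to move into C.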

These results are from our staging buildbot master, which doesn’t have anywhere near the same load as the production buildbot master. I would expect that the time spent waiting in select would go down on the production master (there’s more data being received, more often), and that time spent in fsync and dataReceived would go up.

What to do about it?

A few ideas:

  • Use psyco to get some JIT goodness applied to some of the slower python functions.
  • Remove the fsync call after saving logs.
  • Profile CPU time rather than wall-clock time. This will give a different perspective on the performance of buildbot, and should give better information about where we’re spending time processing data.
  • Implement slow pieces in C (or Cython). Twisted’s Banana library looks doable in C, and it’s also high up in the profile.
  • Send less data from the slaves. We’re currently logging all stdout/stderr produced by the slaves. All of this data is processed by the master process and then saved to disk.
  • Rearchitect buildbot to handle this kind of load.
  • Have more than one buildbot master, each one handling fewer slaves. We’re actively looking into this approach, since it also allows us to have some redundancy for this critical piece of our infrastructure.
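Switching cProfile from wall-clock to CPU time is a matter of passing a different timer function, such as time.process_time. A sketch (Python 3.9+ for get_stats_profile; mostly_waiting() is a hypothetical stand-in for a select-heavy loop) that profiles the same function both ways:

```python
import cProfile
import pstats
import time


def mostly_waiting():
    """Hypothetical workload: mostly idle (like a select loop), plus a little work."""
    time.sleep(0.2)
    return sum(range(100_000))


def profiled_seconds(timer=None):
    """Profile mostly_waiting() and return the total time cProfile recorded."""
    # With no timer argument, cProfile uses its default wall-clock timer.
    profiler = cProfile.Profile(timer) if timer else cProfile.Profile()
    profiler.runcall(mostly_waiting)
    # get_stats_profile() was added in Python 3.9.
    return pstats.Stats(profiler).get_stats_profile().total_tt
```

With the default timer the 0.2 s sleep dominates the total, much as select does in the profile above; profiled_seconds(time.process_time) leaves out the sleep and reports only the time actually spent computing.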

poster 0.4 released

I’m happy to announce the release of poster version 0.4.

This is a bug fix release, which fixes problems when trying to use poster over a secure connection (with https).

I’ve also reworked some of the code so that it can hopefully work with python 2.4. It passes all the unit tests that I have under python 2.4 now, but since I don’t normally use python 2.4, I’d be interested to hear other people’s experience using it.

One of the things that I love about working on poster, and about open source software in general, is hearing from users all over the world who have found it helpful in some way. It’s always encouraging to hear about how poster is being used, so thank you to all who have e-mailed me!

poster can be downloaded from my website, or from the cheeseshop.

As always, bug reports, comments, and questions are welcome.

ssh on-the-fly port forwarding

Check out this great tip from nion’s blog:

ssh on-the-fly port forwarding.

I’ve often wanted to open up new port forwards, but haven’t wanted to shut down my existing session.

If you follow this [the ~ escape character] with a # character (and thus type ~#), you get a list of all forwarded connections.
Using ~C you can open an internal ssh shell that enables you to add and remove local/remote port forwardings.

ssh> help
Commands:
-L[bind_address:]port:host:hostport Request local forward
-R[bind_address:]port:host:hostport Request remote forward
-KR[bind_address:]port Cancel remote forward

ssh> -L 8080:localhost:8080

poster 0.2 is out

I’ve fixed a few bugs in poster and released the next version, 0.2. It’s available from the cheeseshop, or from my web page.

Documentation can also be found here.

python reload: danger, here be dragons

At Mozilla, we use buildbot to coordinate performing builds, unit tests, performance tests, and l10n repacks across all of our build slaves.

There is a lot of activity on a project the size of Firefox, which means that the build slaves are kept pretty busy most of the time.

Unfortunately, like most software out there, our buildbot code has bugs in it. buildbot provides two ways of picking up new changes to code and configuration: ‘buildbot restart’ and ‘buildbot reconfig’.

Restarting buildbot is the cleanest thing to do: it shuts down the existing buildbot process, and starts a new one once the original has shut down cleanly. The problem with restarting is that it interrupts any builds that are currently active.

The second option, ‘reconfig’, is usually a great way to pick up changes to buildbot code without interrupting existing builds. ‘reconfig’ is implemented by sending SIGHUP to the buildbot process, which triggers a python reload() of certain modules.

This is where the problem starts.

Reloading a module basically re-initializes the module, including redefining any classes that are in the module…which is what you want, right? The whole reason you’re reloading is to pick up changes to the code you have in the module!

So let’s say you have a module, foo.py, with these classes:

class Foo(object):
    def foo(self):
        print "Foo.foo"
 
class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        Foo.foo(self)

and you’re using it like this:

>>> import foo
>>> b = foo.Bar()
>>> b.foo()
Bar.foo
Foo.foo

Looks good! Now, let’s do a reload, which is what buildbot does on a ‘reconfig’:

>>> reload(foo)
<module 'foo' from 'foo.pyc'>
>>> b.foo()
Bar.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/catlee/test/foo.py", line 13, in foo
    Foo.foo(self)
TypeError: unbound method foo() must be called with Foo instance as first argument (got Bar instance instead)

Whoops! What happened? The TypeError exception is complaining that Foo.foo must be called with an instance of Foo as the first argument. (NB: we’re calling the unbound method on the class here, not a bound method on the instance, which is why we need to pass in ‘self’ as the first argument. This is typical when calling your parent class.)

But wait! Isn’t Bar a sub-class of Foo? And why did this work before? Let’s try this again, but let’s watch what happens to Foo and Bar this time, using the id() function:

>>> import foo
>>> b = foo.Bar()
>>> id(foo.Bar)
3217664
>>> reload(foo)
<module 'foo' from 'foo.pyc'>
>>> id(foo.Bar)
3218592

(The id() function returns a unique identifier for objects in python; if two objects have the same id, then they refer to the same object)

The ids are different, which means that we get a new Bar class after we reload…I guess that makes sense. Take a look at our b object, which was created before the reload:

>>> b.__class__
<class 'foo.Bar'>
>>> id(b.__class__)
3217664

So b is an instance of the old Bar class, not the new one. Let’s look deeper:

>>> b.__class__.__bases__
(<class 'foo.Foo'>,)
>>> id(b.__class__.__bases__[0])
3216336
>>> id(foo.Foo)
3218128

A ha! The old Bar’s base class (Foo) is different from the one currently defined in the module. After we reloaded the foo module, the Foo class was redefined, which is presumably what we want. The unfortunate side effect is that any reference to the class by the name ‘Foo’ picks up the new Foo class, including code in methods of subclasses, while objects created before the reload still inherit from the old one. There are probably other places where this has unexpected results, but for us, this is the biggest problem.
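Python 3 dropped unbound methods, so Foo.foo(self) no longer raises this TypeError there; the call quietly runs the new Foo.foo instead. The class split itself is unchanged, though. Here’s a sketch that reloads a throwaway module (reload_demo_foo and reload_demo are hypothetical names) and checks what an instance created before the reload looks like afterwards:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Python 3 version of foo.py (methods return strings instead of printing).
FOO_SOURCE = '''
class Foo:
    def foo(self):
        return "Foo.foo"

class Bar(Foo):
    def foo(self):
        return "Bar.foo / " + Foo.foo(self)
'''


def reload_demo(module_name="reload_demo_foo"):
    """Import a throwaway module, keep an instance across a reload, report what changed."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, module_name + ".py").write_text(FOO_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            mod = importlib.import_module(module_name)
            b = mod.Bar()                  # lives across the reload, like a running build
            old_bar = mod.Bar
            importlib.reload(mod)          # re-runs the module body: brand new Foo and Bar
            return {
                "same_bar_class": old_bar is mod.Bar,
                "isinstance_new_foo": isinstance(b, mod.Foo),
                "isinstance_old_foo": isinstance(b, old_bar.__bases__[0]),
                "call_result": b.foo(),
            }
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
```

reload_demo() reports that Bar is a new class, that b is no longer an instance of the module’s current Foo (only of the old one), and that b.foo() still runs, picking up the new Foo.foo by name.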

Reloading essentially breaks class inheritance for objects whose lifetime spans the reload. Using super() in the normal way doesn’t even work, since you usually refer to your instance’s class by name:

class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        super(Bar, self).foo()

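If you’re using new-style classes, it looks like you can get around this by looking at your __class__ attribute. One big caveat: if Bar is ever subclassed, self.__class__ is the subclass, and this call recurses forever: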

class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        super(self.__class__, self).foo()

Buildbot isn’t using new-style classes…yet…so we can’t use super(). Another workaround I’m playing around with is to use the inspect module to get at the class hierarchy (it has the same subclassing caveat, since it also starts from the instance’s actual class):

def get_parent(obj, n=1):
    import inspect
    return inspect.getmro(obj.__class__)[n]
 
class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        get_parent(self).foo(self)
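Python 3’s zero-argument super() avoids looking up the class by name: the compiler gives each method a __class__ cell pointing at the class it was defined in, so an old Bar instance keeps calling into the old Foo. That wasn’t an option for buildbot’s Python 2 code, and the trade-off is that the parent’s new code is not picked up. A sketch that simulates the reload by re-executing two hypothetical versions of foo.py in the same module namespace, which is essentially what reload() does:

```python
import types

# Two hypothetical versions of foo.py; version 2 only changes what Foo.foo returns.
FOO_V1 = '''
class Foo:
    def foo(self):
        return "Foo.foo v1"

class Bar(Foo):
    def by_name(self):
        # Looks up the global name Foo when called, so it sees the newest Foo.
        return "Bar -> " + Foo.foo(self)

    def by_super(self):
        # Zero-argument super() uses the __class__ cell fixed when this Bar was created.
        return "Bar -> " + super().foo()
'''
FOO_V2 = FOO_V1.replace("Foo.foo v1", "Foo.foo v2")


def simulate_reload():
    """Run v1, keep an instance, then re-run v2 in the same namespace (as reload() does)."""
    mod = types.ModuleType("foo_sim")
    exec(FOO_V1, mod.__dict__)         # the original import
    b = mod.Bar()                       # object whose lifetime spans the "reload"
    exec(FOO_V2, mod.__dict__)         # redefines Foo and Bar in the same module dict
    return b.by_name(), b.by_super(), isinstance(b, mod.Foo)
```

simulate_reload() returns ("Bar -> Foo.foo v2", "Bar -> Foo.foo v1", False): the by-name call reaches the new parent, super() stays consistent with the old hierarchy, and b is still not an instance of the new Foo.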

When craziness wraps around… : technovelty

From When craziness wraps around…:

A common trick years ago was to set up your routing tables and then have PID 1 exit so the kernel panicked, because the panicked kernel would continue to route packets with _no userspace running_. Darn hard to hack a system like that.

That’s awesome :) Better get your routing tables right the first time though!

Upgrading Wordpress with Mercurial

Since Mozilla has started using Mercurial for source control, I thought I should get some hands-on experience with it.

My Wordpress dashboard has been nagging me to upgrade to the latest version for quite a while now. Until today I was running 2.5.1, which was released back in April. I’ve been putting off upgrading because it’s always such a pain if you follow the recommended instructions, and I inevitably end up forgetting to migrate some customization I made to the old version.

So, to kill two birds with one stone, I decided to try my hand at upgrading Wordpress by using Mercurial to track my changes to the default install, as well as the changes between versions of Wordpress.

Preparation:

  • Start off with a copy of my blog’s code in a directory called ‘blog’.
  • Download Wordpress 2.5.1 and 2.6.3 (the version I want to upgrade to).

Import initial Wordpress code:

tar zxf wordpress-2.5.1.tar.gz # NB: unpacks into wordpress/
mv wordpress wordpress-2.5.1
cd wordpress-2.5.1
hg init
hg commit -A -m 'wordpress 2.5.1'
cd ..

Apply my changes:

hg clone wordpress-2.5.1 wordpress-mine
cd wordpress-mine
hg qnew -m 'my blog' my-blog.patch   # needs the mq extension enabled
hg locate -0 | xargs -0 rm
cp -a ../blog/. .                    # '/.' also copies dotfiles such as .htaccess
hg addremove
hg qrefresh
cd ..

The ‘hg locate -0’ line removes all the files currently tracked by Mercurial. This is needed so that any files I deleted from my copy of Wordpress are also deleted in my Mercurial repository.

The result of these two steps is that I have a repository that has the original Wordpress source code as one revision, with my changes applied as a Mercurial Queue patch.

Now I need to tell Mercurial what’s changed between versions 2.5.1 and 2.6.3. To do this, I’ll make a copy (or clone) of the 2.5.1 repository, and then put all the 2.6.3 files into it. Again, I use ‘hg locate -0 | xargs -0 rm’ to delete all the files from the old version before copying the new files in. Mercurial is smart enough to notice if files haven’t changed, and the subsequent commit with the ‘-A’ flag will add any new files or delete any files that were removed between 2.5.1 and 2.6.3.
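The “delete everything, then copy the new release in” step can also be sketched in Python (replace_tree is a hypothetical helper; Python 3.8+ for dirs_exist_ok). It’s blunter than ‘hg locate -0 | xargs -0 rm’: that only deletes files Mercurial tracks, while this removes every entry except .hg, tracked or not.

```python
import shutil
import tempfile
from pathlib import Path


def replace_tree(src, dst, keep=(".hg",)):
    """Make dst's contents match src, leaving entries named in `keep` (e.g. .hg) alone."""
    src, dst = Path(src), Path(dst)
    for entry in list(dst.iterdir()):      # snapshot the listing before deleting
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # symlinks=True copies links as links, roughly like cp -a.
    shutil.copytree(src, dst, symlinks=True, dirs_exist_ok=True)


def demo():
    """The old release has a file the new one dropped; afterwards only new files remain."""
    with tempfile.TemporaryDirectory() as tmp:
        old, new = Path(tmp, "old"), Path(tmp, "new")
        (old / ".hg").mkdir(parents=True)
        (old / "removed.php").write_text("old")
        (old / "index.php").write_text("v1")
        new.mkdir()
        (new / "index.php").write_text("v2")
        replace_tree(new, old)
        return sorted(p.name for p in old.iterdir()), (old / "index.php").read_text()
```

After this, ‘hg commit -A’ records the added and removed files just as in the shell version.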

Upgrade the pristine 2.5.1 to 2.6.3:

hg clone wordpress-2.5.1 wordpress-2.6.3
tar zxf wordpress-2.6.3.tar.gz # NB: Unpacks into wordpress/
cd wordpress-2.6.3
hg locate -0 | xargs -0 rm
cp -a ../wordpress/. .         # '/.' also copies dotfiles
hg commit -A -m 'wordpress-2.6.3'
cd ..

Now I need to perform the actual upgrade to my blog. First I save the state of the current modifications, then pull in the 2.5.1 -> 2.6.3 changes from the wordpress-2.6.3 repository. Then I reapply my changes to the new 2.6.3 code.

Pull in 2.6.3 to my blog:

cd wordpress-mine
hg qsave -e -c
hg pull ../wordpress-2.6.3
hg update -C
hg qpush -a -m

Voilà! A quick rsync to my website, and the upgrade is complete!

I have to admit, I don’t fully grok some of these Mercurial commands. It took a few tries to work out this series of steps, so there’s probably a better way of doing it. I’m pretty happy overall though; I managed a successful Wordpress upgrade, and learned something about Mercurial in the process! The next upgrade should go much more smoothly now that I’ve figured things out a bit better.

Moving on

I’m changing jobs.

Yup, after nearly five years at Side Effects, I’m moving on. I have somewhat mixed feelings about this…I’m sad to be leaving such a friendly and talented group of people, but I’m very excited about my next job.

I’m very happy to say that I will be joining Mozilla Corporation in their Toronto office starting in October. I’ll be working in the Release Engineering group, helping to make sure that the world’s thirst for new Firefox builds can be satisfied! I can’t say how excited I am about this. It’s pretty much a dream job: getting paid to work on a great open source project!