Posts about technology (old posts, page 6)

python reload: danger, here be dragons

At Mozilla, we use buildbot to coordinate builds, unit tests, performance tests, and l10n repacks across all of our build slaves. There is a lot of activity on a project the size of Firefox, which means that the build slaves are kept pretty busy most of the time. Unfortunately, like most software out there, our buildbot code has bugs in it.

buildbot provides two ways of picking up new changes to code and configuration: 'buildbot restart' and 'buildbot reconfig'. Restarting buildbot is the cleanest thing to do: it shuts down the existing buildbot process, and starts a new one once the original has shut down cleanly. The problem with restarting is that it interrupts any builds that are currently active.

The second option, 'reconfig', is usually a great way to pick up changes to buildbot code without interrupting existing builds. 'reconfig' is implemented by sending SIGHUP to the buildbot process, which triggers a python reload() of certain files. This is where the problem starts.

Reloading a module re-executes the module's code, which redefines any classes in the module...which is what you want, right? The whole reason you're reloading is to pick up changes to the code you have in the module! So let's say you have a module, foo.py, with these classes:


class Foo(object):
    def foo(self):
        print "Foo.foo"


class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        Foo.foo(self)
and you're using it like this:

>>> import foo
>>> b = foo.Bar()
>>> b.foo()
Bar.foo
Foo.foo

Looks good! Now, let's do a reload, which is what buildbot does on a 'reconfig':

>>> reload(foo)

>>> b.foo()
Bar.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/catlee/test/foo.py", line 13, in foo
    Foo.foo(self)
TypeError: unbound method foo() must be called with Foo instance as first argument (got Bar instance instead)

Whoops! What happened? The TypeError exception is complaining that Foo.foo must be called with an instance of Foo as the first argument. (NB: we're calling the unbound method on the class here, not a bound method on the instance, which is why we need to pass in 'self' as the first argument. This is typical when calling a method on your parent class.)

But wait! Isn't Bar a subclass of Foo? And why did this work before? Let's try this again, but let's watch what happens to Foo and Bar this time, using the id() function:

>>> import foo
>>> b = foo.Bar()
>>> id(foo.Bar)
3217664
>>> reload(foo)

>>> id(foo.Bar)
3218592

(The id() function returns an identifier that is unique among the objects alive at the same time in python; if two live objects have the same id, they are the same object.) The ids are different, which means that we get a new Bar class after we reload...I guess that makes sense. Take a look at our b object, which was created before the reload:

>>> b.__class__
<class 'foo.Bar'>
>>> id(b.__class__)
3217664

So b is an instance of the old Bar class, not the new one. Let's look deeper:

>>> b.__class__.__bases__
(<class 'foo.Foo'>,)
>>> id(b.__class__.__bases__[0])
3216336
>>> id(foo.Foo)
3218128

Aha! The old Bar's base class (Foo) is different from what's currently defined in the module. After we reloaded the foo module, the Foo class was redefined, which is presumably what we want. The unfortunate side effect of this is that any references by name to the class 'Foo' will pick up the new Foo class, including code in methods of subclasses, because names are looked up in the module's namespace when the method runs. So the old Bar.foo ends up calling the new Foo.foo with an instance that only inherits from the old Foo.

There are probably other places where this has unexpected results, but for us, this is the biggest problem. Reloading essentially breaks class inheritance for objects whose lifetime spans the reload. Using super() in the normal way doesn't even work, since you usually refer to your instance's class by name:

class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        super(Bar, self).foo()
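If you're using new-style classes, it looks like you can get around this by looking at your __class__ attribute, at least as long as nothing subclasses Bar (for an instance of a subclass, self.__class__ is the subclass, so the super() call would resolve back to Bar.foo and recurse forever):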

class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        super(self.__class__, self).foo()
Buildbot isn't using new-style classes...yet...so we can't use super(). Another workaround I'm playing around with is to use the inspect module to get at the class hierarchy (it starts from obj.__class__ too, so the same caveat about subclasses of Bar applies):

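import inspect


def get_parent(obj, n=1):
    # Walk the MRO of the instance's own (possibly pre-reload) class
    # instead of looking the parent class up by name.
    return inspect.getmro(obj.__class__)[n]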


class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        get_parent(self).foo(self)
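
The whole problem can be reproduced outside of buildbot. Here is a minimal sketch, assuming Python 3 rather than the Python 2 used above: reload() lives in importlib there, and calling Foo.foo(self) directly no longer raises because unbound methods are gone, so the name-based super() call is what breaks instead. The module name reload_demo_mod, the method names by_name and implicit, and the helper run_demo are all made up for illustration; none of this is buildbot code.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for foo.py. Bar looks its own class up by name in
# one method, and uses Python 3's zero-argument super() in the other.
MODULE_SOURCE = '''
class Foo(object):
    def foo(self):
        return "Foo.foo"


class Bar(Foo):
    def by_name(self):
        return "Bar.by_name -> " + super(Bar, self).foo()

    def implicit(self):
        return "Bar.implicit -> " + super().foo()
'''


def run_demo():
    """Import the module, create an instance, reload, and report what changed."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(os.path.join(tmpdir, "reload_demo_mod.py"), "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmpdir)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("reload_demo_mod")
            b = mod.Bar()  # this object's lifetime spans the reload
            old_bar = mod.Bar

            importlib.reload(mod)  # re-executes the module: new Foo, new Bar

            result = {
                "same_class": old_bar is mod.Bar,
                "isinstance": isinstance(b, mod.Bar),
                "implicit": b.implicit(),
            }
            try:
                b.by_name()
                result["by_name_error"] = None
            except TypeError as exc:
                result["by_name_error"] = type(exc).__name__
            return result
        finally:
            sys.path.remove(tmpdir)
            sys.modules.pop("reload_demo_mod", None)


if __name__ == "__main__":
    for key, value in sorted(run_demo().items()):
        print(key, "=", value)
```

After the reload, b is no longer an instance of the module's current Bar, and by_name fails with a TypeError because the name Bar now points at the new class. The implicit method keeps working, since zero-argument super() uses the class the method was defined in rather than looking up a name.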

PHP is now officially the dumbest language ever

From Slashdot: PHP Gets Namespace Separators, With a Twist.

I've never been a big fan of PHP. Yes, it's easy to get up and running, and there are about a bajillion PHP developers out there, but I've always felt that it was a language with no clear direction behind it, riddled with inconsistencies and features that were not well thought out.

This latest decision to use '\' as the namespace separator in PHP is simply braindead. Most programming languages in use today use '\' as an escape character. Including, I should add, PHP!!! Maybe you're supposed to think of using '\' to 'escape' from one namespace into the next? What's wrong with '::'? It's not like PHP prizes readability or brevity. How about '.'? Oh wait, that's right, '.' is used for string concatenation (wouldn't want to use '+' for that...).

So yeah, I'm just happy I haven't had to do much in the way of PHP work lately. Any chance I get, I use Python for web stuff, usually with Pylons.

Upgrading Wordpress with Mercurial

Since Mozilla has started using Mercurial for source control, I thought I should get some hands-on experience with it. My Wordpress dashboard has been nagging me to upgrade to the latest version for quite a while now. I was running 2.5.1 up until today, which was released back in April. I've been putting off upgrading because it's always such a pain if you follow the recommended instructions, and I inevitably end up forgetting to migrate some customization I made to the old version.

So, to kill two birds with one stone, I decided to try my hand at upgrading Wordpress by using Mercurial to track my changes to the default install, as well as the changes between versions of Wordpress.

Preparation: First, start off with a copy of my blog's code in a directory called 'blog'. Download Wordpress 2.5.1 and 2.6.3 (the version I want to upgrade to).

Import initial Wordpress code:


tar zxf wordpress-2.5.1.tar.gz # NB: unpacks into wordpress/
mv wordpress wordpress-2.5.1
cd wordpress-2.5.1
hg init
hg commit -A -m 'wordpress 2.5.1'
cd ..

Apply my changes:

hg clone wordpress-2.5.1 wordpress-mine
cd wordpress-mine
hg qnew -m 'my blog' my-blog.patch
hg locate -0 | xargs -0 rm
cp -ar ../blog/* .
hg addremove
hg qrefresh
cd ..

The 'hg locate -0' line removes all the files currently tracked by Mercurial. This is needed so that any files I deleted from my copy of Wordpress are also deleted in my Mercurial repository. The result of these two steps is that I have a repository that has the original Wordpress source code as one revision, with my changes applied as a Mercurial Queue patch.

Now I need to tell Mercurial what's changed between versions 2.5.1 and 2.6.3. To do this, I'll make a copy (or clone) of the 2.5.1 repository, and then put all the 2.6.3 files into it. Again, I use 'hg locate -0 | xargs -0 rm' to delete all the files from the old version before copying the new files in. Mercurial is smart enough to notice if files haven't changed, and the subsequent commit with the '-A' flag will add any new files or delete any files that were removed between 2.5.1 and 2.6.3.

Upgrade the pristine 2.5.1 to 2.6.3:

hg clone wordpress-2.5.1 wordpress-2.6.3
tar zxf wordpress-2.6.3.tar.gz # NB: Unpacks into wordpress/
cd wordpress-2.6.3
hg locate -0 | xargs -0 rm
cp -ar ../wordpress/* .
hg commit -A -m 'wordpress-2.6.3'
cd ..
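
To see why the tracked files get deleted before the copy, here is a rough Python analogue of the 'hg locate -0 | xargs -0 rm' and 'cp -ar' pair. It is only a sketch of the idea: the function replace_tracked_files, the helper names and the file names in the demo are invented for illustration, the list of tracked files is passed in by hand instead of being asked of Mercurial, and unlike 'cp ../wordpress/*' it also copies top-level dotfiles.

```python
import os
import shutil
import tempfile


def replace_tracked_files(repo_dir, tracked_files, new_tree):
    """Swap the tracked files in repo_dir for the contents of new_tree."""
    # Step 1: delete every tracked file (roughly `hg locate -0 | xargs -0 rm`).
    for rel_path in tracked_files:
        path = os.path.join(repo_dir, rel_path)
        if os.path.isfile(path):
            os.remove(path)

    # Step 2: copy the new release over the top (roughly `cp -ar ../wordpress/* .`).
    for root, _dirs, files in os.walk(new_tree):
        rel_root = os.path.relpath(root, new_tree)
        dest_root = os.path.normpath(os.path.join(repo_dir, rel_root))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(dest_root, name))


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


def demo():
    """Build a toy old tree and new tree, run the swap, list what is left."""
    with tempfile.TemporaryDirectory() as base:
        repo = os.path.join(base, "repo")
        new = os.path.join(base, "new")
        os.makedirs(repo)
        os.makedirs(new)

        # Old release: two tracked files, plus one file the repo doesn't track.
        write(os.path.join(repo, "index.php"), "old index")
        write(os.path.join(repo, "old-feature.php"), "dropped upstream")
        write(os.path.join(repo, "untracked-notes.txt"), "mine")

        # New release: index.php changed, old-feature.php gone, one file added.
        write(os.path.join(new, "index.php"), "new index")
        write(os.path.join(new, "new-feature.php"), "added upstream")

        replace_tracked_files(repo, ["index.php", "old-feature.php"], new)
        return sorted(os.listdir(repo))
```

In the demo, old-feature.php disappears because it was deleted in step 1 and never copied back, which is exactly the information the following 'hg commit -A' needs. Without step 1 it would linger in the working directory and look like a file that still belongs to the new version.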

Now I need to perform the actual upgrade to my blog. First I save the state of the current modifications, then pull in the 2.5.1 -> 2.6.3 changes from the wordpress-2.6.3 repository. Then I reapply my changes to the new 2.6.3 code.

Pull in 2.6.3 to my blog:

cd wordpress-mine
hg qsave -e -c
hg pull ../wordpress-2.6.3
hg update -C
hg qpush -a -m

Voilà! A quick rsync to my website, and the upgrade is complete! I have to admit, I don't fully grok some of these Mercurial commands. It took a few tries to work out this series of steps, so there's probably a better way of doing it. I'm pretty happy overall though; I managed a successful Wordpress upgrade, and learned something about Mercurial in the process! The next upgrade should go much more smoothly now that I've figured things out a bit better.

Announcing poster 0.1

I've just uploaded the first public release of poster to my website, and to the cheeseshop. I wrote poster to scratch an itch I've had with Python's standard library: it's hard to do HTTP file uploads. There are two reasons for this. One is that the standard library doesn't provide a way to do multipart/form-data encoding. The other is that there's no way to stream an upload to the remote server: you have to build the entire request in memory before sending it.

poster addresses both these issues. The poster.encode module provides multipart/form-data encoding, and the poster.streaminghttp module provides streaming http request support. Here's an example of how you might use it:


# test_client.py
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2

# Register the streaming http handlers with urllib2
register_openers()

# Start the multipart/form-data encoding of the file "DSC0001.jpg"
# "image1" is the name of the parameter, which is normally set
# via the "name" parameter of the HTML <input> tag.

# headers contains the necessary Content-Type and Content-Length
# datagen is a generator object that yields the encoded parameters
# (the file is opened in binary mode so the image bytes are sent untouched)
datagen, headers = multipart_encode({"image1": open("DSC0001.jpg", "rb")})

# Create the Request object
request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)
# Actually do the request, and get the response
print urllib2.urlopen(request).read()
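
To give a feel for what "streaming" multipart/form-data encoding involves, here is a minimal sketch of the idea for a single file field. It is not poster's implementation: the function name multipart_stream and its parameters are invented for illustration, it is written for Python 3, and it assumes a seekable binary file object so the Content-Length can be worked out without reading the file into memory.

```python
import io
import uuid


def multipart_stream(field_name, filename, fileobj, boundary=None, chunk_size=8192):
    """Return (datagen, headers) for one file field, without buffering the file.

    Hypothetical helper for illustration only; fileobj must be a seekable
    file opened in binary mode.
    """
    boundary = boundary or uuid.uuid4().hex

    # The part header and the closing boundary are small, so build them up front.
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).format(b=boundary, n=field_name, f=filename).encode("utf-8")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("utf-8")

    # Measure the file by seeking, not by reading it.
    start = fileobj.tell()
    fileobj.seek(0, io.SEEK_END)
    file_size = fileobj.tell() - start
    fileobj.seek(start)

    headers = {
        "Content-Type": "multipart/form-data; boundary=" + boundary,
        "Content-Length": str(len(head) + file_size + len(tail)),
    }

    def generate():
        yield head
        while True:
            chunk = fileobj.read(chunk_size)  # only one chunk in memory at a time
            if not chunk:
                break
            yield chunk
        yield tail

    return generate(), headers
```

The return order mirrors the datagen, headers pair in the example above. Knowing the Content-Length before the first byte is sent is what lets the body be handed to the HTTP layer as a generator instead of one big string.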

Download it as a tarball or egg for python 2.5, or easy_install it from cheeseshop. Bugs, patches, comments or complaints are welcome!

Validating credit card numbers in python

For various reasons I've needed to validate some credit card numbers in Python. For future reference, here's what I've come up with:


import re

def validate_cc(s):
    """
    Returns True if the credit card number ``s`` is valid,
    False otherwise.

    Returning True doesn't imply that a card with this number has ever been,
    or ever will be issued.

    Currently supports Visa, Mastercard, American Express, Discover
    and Diners Club cards.

    >>> validate_cc("4111-1111-1111-1111")
    True
    >>> validate_cc("4111 1111 1111 1112")
    False
    >>> validate_cc("5105105105105100")
    True
    >>> validate_cc(5105105105105100)
    True
    """
    # Strip out any non-digits
    # Jeff Lait for Prime Minister!
    s = re.sub("[^0-9]", "", str(s))
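    # Raw strings keep the \d escapes intact; the character classes list
    # digits only (a comma inside [...] would match a literal comma).
    regexps = [
            r"^4\d{15}$",        # Visa
            r"^5[1-5]\d{14}$",   # Mastercard
            r"^3[47]\d{13}$",    # American Express
            r"^3[068]\d{12}$",   # Diners Club
            r"^6011\d{12}$",     # Discover
            ]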

    if not any(re.match(r, s) for r in regexps):
        return False

    chksum = 0
    x = len(s) % 2

    for i, c in enumerate(s):
        j = int(c)
        if i % 2 == x:
            k = j*2
            if k >= 10:
                k -= 9
            chksum += k
        else:
            chksum += j

    return chksum % 10 == 0
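
The checksum loop at the end is the Luhn algorithm. As a standalone sketch (the name luhn_checksum is made up here), the same arithmetic can be written walking the digits from the right, which is how the algorithm is usually described; validate_cc gets to the same digits from the left by using the parity of the length.

```python
def luhn_checksum(digits):
    """Return the Luhn sum of a string of digits (valid if divisible by 10)."""
    total = 0
    # Position 0 is the rightmost digit (the check digit).
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            # Every second digit, counting from the right, is doubled;
            # a two-digit result has its digits added (same as subtracting 9).
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total


# Worked example: "4111111111111111"
#   doubled digits:   4 -> 8, and seven 1s -> 2 each  = 8 + 14 = 22
#   untouched digits: eight 1s                         = 8
#   total = 30, which is divisible by 10, so the number passes.
print(luhn_checksum("4111111111111111"))
```

Changing the last digit to 2 bumps the total to 31, which is why the second doctest above returns False.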

nmudiff is awesome

Man, I wish I had known about this before! nmudiff is a program to email an NMU diff to the Debian Bug Tracking System. I often make quick little changes to Debian packages to fix bugs or typos, and it's always been a bit of a pain to generate a patch to send to the maintainer. nmudiff uses debdiff (another very useful command I just learned about) to generate the patch, and emails it to the bug tracking system with the appropriate tags.