I’ve just released version 0.5 of poster, the streaming http upload library for python. It’s easy_installable or downloadable directly from the cheeseshop.
Thanks again to everybody who’s written in with bug fixes and suggestions!
Buildbot is a critical part of our build infrastructure at Mozilla. We use it to manage builds on 5 different platforms (Linux, Mac, Windows, Maemo and Windows Mobile), and 5 different branches (mozilla-1.9.1, mozilla-central, TraceMonkey, Electrolysis, and Places). All in all we have 80 machines doing builds across 150 different build types (not counting Talos; all the Talos test runs and slaves are managed by a different master). And buildbot is at the center of it all.

The load on our machine running buildbot is normally fairly high, and occasionally spikes so that the buildbot process is unresponsive. It normally restores itself within a few minutes, but I’d really like to know why it’s doing this!

Running our staging buildbot master with python’s cProfile module for almost two hours yields the following profile:
           ncalls   tottime  percall   cumtime  percall  filename:lineno(function)
           416377  4771.188    0.011  4796.749    0.012  {select.select}
               52   526.891   10.133   651.043   12.520  /tools/buildbot/lib/python2.5/site-packages/buildbot-0.7.10p1-py2.5.egg/buildbot/status/web/waterfall.py:834(phase2)
             6518   355.370    0.055   355.370    0.055  {posix.fsync}
           232582   238.943    0.001  1112.039    0.005  /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:150(dataReceived)
         10089681   104.395    0.000   130.089    0.000  /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:36(b1282int)
36798140/36797962    83.536    0.000    83.537    0.000  {len}
         29913653    70.458    0.000    70.458    0.000  {method 'append' of 'list' objects}
              311    63.775    0.205    63.775    0.205  {bz2.compress}
         10088987    56.581    0.000   665.982    0.000  /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:141(gotItem)
  4010792/1014652    56.079    0.000   176.693    0.000  /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/jelly.py:639(unjelly)
   2343910/512709    47.954    0.000   112.446    0.000  /tools/twisted-8.0.1/lib/python2.5/site-packages/twisted/spread/banana.py:281(_encode)
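
For readers who want to produce a table like the one above for their own code, here is a minimal sketch using the standard cProfile and pstats modules. It is not how the buildbot master itself was profiled (the post only says cProfile was used); `busy_work` and `profile_call` are hypothetical names made up for this example, and the code targets a current Python 3 rather than the Python 2.5 shown in the paths above.

```python
import cProfile
import io
import pstats


def busy_work(n=20000):
    """Hypothetical stand-in for the code being profiled."""
    items = []
    for i in range(n):
        items.append(len(str(i)))
    return sum(items)


def profile_call(func, *args, **kwargs):
    """Run func under cProfile.

    Returns (result, report), where report is the text of the ten
    entries with the highest tottime (time spent in the function
    itself, excluding the functions it calls).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_call(busy_work)
    print(report)
```

Sorting by tottime matches the ordering of the table above; sorting by "cumulative" instead would rank functions by the time spent in them including their callees.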
Interpreting the results
We already knew that the buildbot waterfall was slow (the second line in the profile).
The next entries really surprised me, twisted’s
These results are from our staging buildbot master, which doesn’t have anywhere near the same load as the production buildbot master. I would expect that the time spent waiting in
What to do about it?
A few ideas….
I’m happy to announce the release of poster version 0.4. This is a bug fix release, which fixes problems when trying to use poster over a secure connection (with https). I’ve also reworked some of the code so that it can hopefully work with python 2.4. It passes all the unit tests that I have under python 2.4 now, but since I don’t normally use python 2.4, I’d be interested to hear other people’s experience using it.

One of the things that I love about working on poster, and about open source software in general, is hearing from users all over the world who have found it helpful in some way. It’s always encouraging to hear about how poster is being used, so thank you to all who have e-mailed me!

poster can be downloaded from my website, or from the cheeseshop. As always, bug reports, comments, and questions are welcome.

I’ve been trying to use Mercurial Queues to manage my work on different tasks in several repositories. I try to name each patch after the bug it relates to; so for my recent work on getting Talos to stop skipping builds, I would call my patch ‘bug468731’. I noticed that I was running this series of steps a lot:
I wrote the mercurial extension below to help with this. It takes the current patch name and the basename of the repository’s default path, and saves a patch in ~/patches called [patch_name]-[repo_name].patch. It also compares the current patch to any previous ones in the patches directory, and saves a new file if the patches are different, or tells you that you’ve already saved this patch.

To enable this extension, save the code below somewhere like ~/.hgext/mkpatch.py, and then add “mkpatch = ~/.hgext/mkpatch.py” to your .hgrc’s extensions section. Then you can run ‘hg mkpatch’ to automatically create a patch for you in your ~/patches directory!

    import os, hashlib
    from mercurial import commands, util
    from hgext import mq

    def mkpatch(ui, repo, *pats, **opts):
        """Saves the current patch to a file called <patch_name>-<repo_name>.patch
        in your patch directory (defaults to ~/patches)
        """
        # The repository name is the last component of the default path
        repo_name = os.path.basename(ui.config('paths', 'default'))

        # --patchdir overrides the [mkpatch] patchdir setting. Remove it
        # from opts so that only qdiff's own options are passed on below.
        patch_dir = opts.pop('patchdir', None)
        if not patch_dir:
            patch_dir = os.path.expanduser(ui.config('mkpatch', 'patchdir', "~/patches"))

        # Capture the output of 'hg qtop' to get the current patch name
        ui.pushbuffer()
        mq.top(ui, repo)
        patch_name = ui.popbuffer().strip()

        if not os.path.exists(patch_dir):
            os.makedirs(patch_dir)
        elif not os.path.isdir(patch_dir):
            raise util.Abort("%s is not a directory" % patch_dir)

        # Capture the output of 'hg qdiff' to get the patch itself
        ui.pushbuffer()
        mq.diff(ui, repo, *pats, **opts)
        patch_data = ui.popbuffer()
        patch_hash = hashlib.new('sha1', patch_data).digest()

        full_name = os.path.join(patch_dir, "%s-%s.patch" % (patch_name, repo_name))
        i = 0
        # Find an unused file name, stopping early if an identical
        # patch has already been saved
        while os.path.exists(full_name):
            file_hash = hashlib.new('sha1', open(full_name).read()).digest()
            if file_hash == patch_hash:
                ui.status("Patch is identical to ", full_name, "; not saving\n")
                return
            full_name = os.path.join(patch_dir, "%s-%s.patch.%i" % (patch_name, repo_name, i))
            i += 1

        open(full_name, "w").write(patch_data)
        ui.status("Patch saved to ", full_name, "\n")

    mkpatch_options = [
        ("", "patchdir", '', "patch directory"),
    ]

    # 'hg mkpatch' accepts its own options plus everything 'hg qdiff' accepts
    cmdtable = {
        "mkpatch": (mkpatch, mkpatch_options + mq.cmdtable['^qdiff'][1],
                    "hg mkpatch [OPTION]... [FILE]...")
    }

I’ve fixed a few bugs with poster, and released the next version, 0.2. It’s available from the cheeseshop, or from my web page. Documentation can also be found here.

At Mozilla, we use buildbot to coordinate performing builds, unit tests, performance tests, and l10n repacks across all of our build slaves. There is a lot of activity on a project the size of Firefox, which means that the build slaves are kept pretty busy most of the time.

Unfortunately, like most software out there, our buildbot code has bugs in it. buildbot provides two ways of picking up new changes to code and configuration: ‘buildbot restart’ and ‘buildbot reconfig’.

Restarting buildbot is the cleanest thing to do: it shuts down the existing buildbot process, and starts a new one once the original has shut down cleanly. The problem with restarting is that it interrupts any builds that are currently active.

The second option, ‘reconfig’, is usually a great way to pick up changes to buildbot code without interrupting existing builds. ‘reconfig’ is implemented by sending SIGHUP to the buildbot process, which triggers a python reload() of certain files.

This is where the problem starts. Reloading a module basically re-initializes the module, including redefining any classes that are in the module…which is what you want, right? The whole reason you’re reloading is to pick up changes to the code you have in the module!

So let’s say you have a module, foo.py, with these classes:

    class Foo(object):
        def foo(self):
            print "Foo.foo"

    class Bar(Foo):
        def foo(self):
            print "Bar.foo"
            Foo.foo(self)

and you’re using it like this:

    >>> import foo
    >>> b = foo.Bar()
    >>> b.foo()
    Bar.foo
    Foo.foo

Looks good!
Now, let’s do a reload, which is what buildbot does on a ‘reconfig’:

    >>> reload(foo)
    <module 'foo' from 'foo.pyc'>
    >>> b.foo()
    Bar.foo
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/catlee/test/foo.py", line 13, in foo
        Foo.foo(self)
    TypeError: unbound method foo() must be called with Foo instance as first argument (got Bar instance instead)

Whoops! What happened? The TypeError exception is complaining that Foo.foo must be called with an instance of Foo as the first argument. (NB: we’re calling the unbound method on the class here, not a bound method on the instance, which is why we need to pass in ‘self’ as the first argument. This is typical when calling your parent class.)

But wait! Isn’t Bar a sub-class of Foo? And why did this work before? Let’s try this again, but let’s watch what happens to Foo and Bar this time, using the id() function:

    >>> import foo
    >>> b = foo.Bar()
    >>> id(foo.Bar)
    3217664
    >>> reload(foo)
    <module 'foo' from 'foo.pyc'>
    >>> id(foo.Bar)
    3218592

(The id() function returns a unique identifier for objects in python; if two objects that exist at the same time have the same id, then they are the same object.)

The ids are different, which means that we get a new Bar class after we reload…I guess that makes sense. Take a look at our b object, which was created before the reload:

    >>> b.__class__
    <class 'foo.Bar'>
    >>> id(b.__class__)
    3217664

So b is an instance of the old Bar class, not the new one. Let’s look deeper:

    >>> b.__class__.__bases__
    (<class 'foo.Foo'>,)
    >>> id(b.__class__.__bases__[0])
    3216336
    >>> id(foo.Foo)
    3218128

A ha! The old Bar’s base class (Foo) is different than what’s currently defined in the module. After we reloaded the foo module, the Foo class was redefined, which is presumably what we want. The unfortunate side effect of this is that any references by name to the class ‘Foo’ will pick up the new Foo class, including code in methods of subclasses. So the old Bar.foo, which b still uses, looks up ‘Foo’ and calls the new Foo.foo with an object that is only an instance of the old Foo, hence the TypeError.
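
The identity changes described above can be checked in a single script. This is a sketch under some assumptions: it targets a current Python 3 (where reload lives in importlib), and the module name `reload_demo_mod` and the function `demo_reload` are made up for the example. Python 3 has no unbound methods, so the TypeError from the session above does not reproduce there; what does carry over is that the classes are replaced while existing instances keep pointing at the old ones.

```python
import importlib
import os
import sys
import tempfile

# Source for a throwaway module with the same shape as foo.py above.
SOURCE = '''
class Foo(object):
    def foo(self):
        return "Foo.foo"

class Bar(Foo):
    def foo(self):
        return "Bar.foo " + Foo.foo(self)
'''


def demo_reload():
    """Import a temporary module, reload it, and report what changed."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(os.path.join(tmpdir, "reload_demo_mod.py"), "w") as f:
            f.write(SOURCE)
        sys.path.insert(0, tmpdir)
        importlib.invalidate_caches()
        try:
            mod = importlib.import_module("reload_demo_mod")
            b = mod.Bar()                       # created before the reload
            old_bar, old_foo = mod.Bar, mod.Foo
            importlib.reload(mod)               # re-executes the module body
            return {
                "bar_replaced": mod.Bar is not old_bar,
                "foo_replaced": mod.Foo is not old_foo,
                "b_is_instance_of_new_bar": isinstance(b, mod.Bar),
                "b_is_instance_of_old_bar": isinstance(b, old_bar),
                "old_bar_base_is_old_foo": old_bar.__bases__[0] is old_foo,
            }
        finally:
            sys.path.remove(tmpdir)
            sys.modules.pop("reload_demo_mod", None)


if __name__ == "__main__":
    for key, value in demo_reload().items():
        print(key, value)
```

Comparing with `is` does the same job as comparing id() values in the interactive session, without depending on particular addresses.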
There are probably other places where this has unexpected results, but for us, this is the biggest problem. Reloading essentially breaks class inheritance for objects whose lifetime spans the reload.

Using super() in the normal way doesn’t even work, since you usually refer to your instance’s class by name:

    class Bar(Foo):
        def foo(self):
            print "Bar.foo"
            super(Bar, self).foo()

If you’re using new-style classes, it looks like you can get around this by looking at your __class__ attribute:

    class Bar(Foo):
        def foo(self):
            print "Bar.foo"
            super(self.__class__, self).foo()

Buildbot isn’t using new-style classes…yet…so we can’t use super(). Another workaround I’m playing around with is to use the inspect module to get at the class hierarchy:

    def get_parent(obj, n=1):
        import inspect
        return inspect.getmro(obj.__class__)[n]

    class Bar(Foo):
        def foo(self):
            print "Bar.foo"
            get_parent(self).foo(self)

Both workarounds assume that the instance’s class is the class that defines the method. If Bar is itself subclassed and the subclass doesn’t override foo(), then self.__class__ is the subclass, the “parent” found is Bar again, and the call recurses forever.

I’ve just uploaded the first public release of poster to my website, and to the cheeseshop.

I wrote poster to scratch an itch I’ve had with Python’s standard library: it’s hard to do HTTP file uploads. There are two reasons for this: the standard library doesn’t provide a way to do multipart/form-data encoding, and there’s no way to stream an upload to the remote server, so you have to build the entire request in memory before sending it.

poster addresses both these issues. The poster.encode module provides multipart/form-data encoding, and the poster.streaminghttp module provides streaming http request support.

Here’s an example of how you might use it:

    # test_client.py
    from poster.encode import multipart_encode
    from poster.streaminghttp import register_openers
    import urllib2

    # Register the streaming http handlers with urllib2
    register_openers()

    # Start the multipart/form-data encoding of the file "DSC0001.jpg"
    # "image1" is the name of the parameter, which is normally set
    # via the "name" parameter of the HTML <input> tag.

    # headers contains the necessary Content-Type and Content-Length
    # datagen is a generator object that yields the encoded parameters
    # (the file is opened in binary mode so the image bytes are sent unchanged)
    datagen, headers = multipart_encode({"image1": open("DSC0001.jpg", "rb")})

    # Create the Request object
    request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)
    # Actually do the request, and get the response
    print urllib2.urlopen(request).read()

Download it as a tarball or egg for python 2.5, or easy_install it from cheeseshop.

Bugs, patches, comments or complaints are welcome!

For various reasons I’ve needed to validate some credit card numbers in Python. For future reference, here’s what I’ve come up with:

    import re

    def validate_cc(s):
        """
        Returns True if the credit card number ``s`` is valid,
        False otherwise.

        Returning True doesn't imply that a card with this number has ever been,
        or ever will be issued.

        Currently supports Visa, Mastercard, American Express, Discover
        and Diners Cards.

        >>> validate_cc("4111-1111-1111-1111")
        True
        >>> validate_cc("4111 1111 1111 1112")
        False
        >>> validate_cc("5105105105105100")
        True
        >>> validate_cc(5105105105105100)
        True
        """
        # Strip out any non-digits
        # Jeff Lait for Prime Minister!
        s = re.sub("[^0-9]", "", str(s))
        regexps = [
            r"^4\d{15}$",       # Visa
            r"^5[1-5]\d{14}$",  # Mastercard
            r"^3[47]\d{13}$",   # American Express
            r"^3[068]\d{12}$",  # Diners
            r"^6011\d{12}$",    # Discover
        ]

        if not any(re.match(r, s) for r in regexps):
            return False

        # Luhn checksum: double every second digit, counting from the
        # second-to-last digit leftwards, and subtract 9 from any
        # result of 10 or more
        chksum = 0
        x = len(s) % 2

        for i, c in enumerate(s):
            j = int(c)
            if i % 2 == x:
                k = j*2
                if k >= 10:
                    k -= 9
                chksum += k
            else:
                chksum += j

        return chksum % 10 == 0

To calculate the amount of free disk space in Python, you can use the os.statvfs() function. For some reason, I can never find the docs for os.statvfs() on the first or second try (it’s in the “Files and Directories” section in the os module), and I never remember how it works, so I’m posting this as a note to myself, and maybe to help out anybody else wanting to do the same thing.
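
As an illustration only, here is one way to wrap os.statvfs() in a small helper. This is a sketch rather than the listing from the original post: the name `free_space_bytes` is made up for the example, the code targets a current Python 3 on a Unix-like system (os.statvfs() is not available on Windows), and multiplying by f_frsize rather than f_bsize is an assumption based on POSIX describing the block counts in units of f_frsize.

```python
import os


def free_space_bytes(path):
    """Return the number of bytes available to unprivileged users on the
    filesystem containing path.

    f_bavail counts the blocks available to non-root users; f_bfree
    would also include blocks reserved for the super-user.
    """
    st = os.statvfs(path)
    # Assumption: block counts are expressed in units of f_frsize.
    return st.f_bavail * st.f_frsize


if __name__ == "__main__":
    print(free_space_bytes("/"))
```

On many filesystems f_bsize and f_frsize report the same value, in which case either gives the same answer.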
A simple free space function can be written as:

I use the f_bavail attribute instead of f_bfree, since the latter includes blocks that are reserved for the super-user’s use. I’m not sure, however, on the distinction between f_bsize and f_frsize.

Will, I am in 100% agreement. Thanks a ton for this! Will posted a link to the Python Sidebar which adds a sidebar to Mozilla or Firefox for accessing Python’s excellent online documentation. via Planet Python
Copyright © 2010 chris' random ramblings - All Rights Reserved