Stuff I learned this weekend – vim, python and more!

Call me strange, but I actually enjoy spending time reading up on programming tools that I use regularly. I think of programming tools as tools in the same way that a hammer or a saw is a tool. They both help you to get a job done. You need to learn how to use them properly. You need to keep tools well maintained. Sometimes you need to throw a tool away and get a new one.

For my professional and personal programming I spend 99% of my time writing python with vim, and so I really enjoy learning more about them.

Stuff I learned about vim:

How I boosted my vim – lots of great vim tips (how did I not know about :set visualbell until now???) and plugins, which introduced me to…

nerdtree – for file browsing in vim. It also reminded me to make use of the command-t plugin I had installed a while back.

surround – for giving you the ability to work with the surroundings for text objects. Ever wanted to easily add quotes to a word, or change double quotes surrounding a string to single quotes? I know you have – so go install this plugin now!

snipmate – lets you define lots of predefined snippets for various languages. Now in python I can type “def<tab>” and bam! I get a basic function definition.

I wasn’t able to get to PyCon US 2012 this year, so I’m very happy that the sessions were all recorded.

The art of subclassing – great tips on how to do subclassing well in python.

why classes aren’t always what you want – I liked how he emphasized that you should always be open to refactoring your code. Usually making your own exception classes is a bad idea…however, one great nugget buried in there was that if you can’t decide whether you should raise a KeyError, AttributeError or TypeError (for example), make a class that inherits from all 3 and raise that. Then consumers can catch what makes sense to them instead of guessing.

introduction to metaclasses – metaclasses aren’t so scary after all!

nice framework for building gevent services – I liked the simple examples here. It introduces the ginkgo framework, which I’m hoping to have some time to play with soon.
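The multiple-inheritance exception trick from the “why classes aren’t always what you want” talk is easy to try out. Here is a minimal sketch, written for a current Python 3 interpreter; the class name LookupFailure and the helper functions are made up for illustration and don’t come from the talk.

```python
class LookupFailure(KeyError, AttributeError, TypeError):
    """Hypothetical exception that callers can catch under any of three names."""


def fetch(container, name):
    # Stand-in for an API where it is unclear which built-in error fits best.
    raise LookupFailure(name)


def caught_as(exc_type):
    """Return True if the error raised by fetch() is caught as exc_type."""
    try:
        fetch({}, "colour")
    except exc_type:
        return True
    return False


if __name__ == "__main__":
    for exc_type in (KeyError, AttributeError, TypeError):
        print(exc_type.__name__, caught_as(exc_type))
```

Each consumer keeps whichever except clause it already had, and the library author doesn’t have to pick a winner.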

Book review: PHP and MongoDB Web Development

I’ve been interested in mongodb for quite some time now, so when a co-worker of mine asked if I was interested in reviewing a book about mongodb, I of course said yes! She put me in touch with the publisher of a book on MongoDB and web development entitled “PHP and MongoDB Web Development”. I was given an electronic copy of the book to review, and so here are my thoughts after spending a few weeks reading it and playing around with mongodb independently.

This book is subtitled “Beginner’s Guide”, and I think it achieves its goal of being a good introduction to mongodb for beginners. That being said, my primary criticism of the book is that it should include more information on some more advanced features like sharding and replica sets. It’s easy to create web applications for small scales, or that don’t need to be up 99.99% of the time. It’s much harder to design applications that are robust to bursts in load, and to various kinds of network or hardware failures. Without much discussion on these points, it’s hard to form an opinion on whether mongodb would be a suitable choice for developing large scale web applications given the information in this book alone.

Other than that, I quite enjoyed the book and found it filled in quite a few gaps in my (limited) knowledge. Seeing full examples of working code on more complex topics like map reduce, GridFS and geospatial indexing is very helpful for understanding how these features of mongodb could be used in a real application. I found the examples to be a bit verbose at times, although I think that’s more a fault of PHP than of the book, and the formatting in the examples was sometimes inconsistent. Fortunately all the examples can be downloaded from the publisher’s web site, http://www.packtpub.com/support, saving you from having to type it all in!

The book also covers topics like integrating applications with traditional RDBMSs like MySQL, and offers some practical examples of how mongodb could be used to augment an application which is already using SQL. It also includes helpful real world examples of how mongodb is used for web analytics, or by foursquare for 2d geospatial indexing.

In summary, the book is a good introduction to mongodb, especially if you’re familiar with php. If you’re looking for more in-depth information about optimizing your queries, or scaling mongodb, or if your language of choice isn’t php, this probably isn’t a good book for you.

How RelEng uses mercurial quickly and safely

Release Engineering uses hg a lot. Every build or test involves code from at least one hg repository.

Last year we started using some internal mirrors, and at the same time started making use of the hg share extension across the board. Both of these had a big impact on the load on hg and on the time it takes to clone/update local working copies.

I think what we’ve done is pretty useful and resilient to various types of failure, so I hope this blog post is helpful for others trying to automate processes involving hg!

The primary tool we’re using for hg operations is called hgtool (available from our tools repo). Yes, we’re very inventive at naming things.

hgtool’s basic usage is to be given the location of a remote repository, a local directory, and usually a revision. Its job is to make sure that the local directory contains a clean working copy of the repository at the specified revision.

First of all, you don’t need to worry about doing an ‘hg clone’ if the directory doesn’t exist, or ‘hg pull’ if it does exist. This simplifies a lot of build logic!
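To make the clone-or-pull decision concrete, here is a rough sketch in Python 3. It is not hgtool’s actual code: the function name hg_commands, the choice of flags, and the example URL are my own assumptions, and the function only builds the command lines rather than running them.

```python
import os
import tempfile


def hg_commands(repo, dest, revision="default"):
    """Return the hg invocations needed to bring dest to revision.

    Illustrative only; hgtool's real implementation may differ.
    """
    if os.path.isdir(os.path.join(dest, ".hg")):
        # A repository is already there: just fetch new changesets.
        fetch = ["hg", "pull", "-R", dest, repo]
    else:
        # Nothing there yet: clone without checking out any files.
        fetch = ["hg", "clone", "--noupdate", repo, dest]
    # Either way, finish with a clean update to the requested revision.
    update = ["hg", "update", "-R", dest, "--clean", "-r", revision]
    return [fetch, update]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "build")
        # Hypothetical repository URL, used only to show the output.
        for command in hg_commands("https://hg.example.org/repo", dest):
            print(" ".join(command))
```

The caller only says where the repository lives and which revision it wants; which hg command is needed falls out of the state of the local directory.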

Next, we’ve built support for mirrors into hgtool. You can pass one or more mirror repositories to the tool with ‘--mirror’, and it will attempt to pull/clone from the mirrors before trying to pull/clone from the primary repository. At Mozilla we have several internal hg mirrors that we use to reduce load on the primary public-facing hg servers.

To improve the case when you need to do a full clone, we’ve added support for importing an hg bundle to initialize the local repository rather than doing a full clone from the mirror or master repositories. You can pass one or more bundle urls with ‘--bundle’. hgtool will download and import the bundle, and then pull in new changesets from the mirrors and master repositories.

Finally, hgtool supports the ‘hg share’ extension. If you specify a base directory for shared repositories, all of the above operations will be run on a locally shared repository first, and then the working copy will be created with ‘hg share’, and updated to the correct revision.

There are all kinds of fallback behaviours specified, like if you fail to import a bundle, try to clone from a mirror; then if you fail to clone from a mirror, try to clone from the master. These fallbacks have resulted in a far more resilient build process.
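The fallback chain can be pictured as “try each source in order until one works”. The following is only a sketch of that idea in Python 3, not hgtool’s implementation; first_success and the three stand-in functions are hypothetical names, and the stand-ins simulate failures instead of calling hg.

```python
def first_success(attempts):
    """Run (label, callable) pairs in order; return the label of the first that works."""
    errors = []
    for label, attempt in attempts:
        try:
            attempt()
            return label
        except Exception as exc:
            # Remember why this source failed, then fall through to the next.
            errors.append((label, exc))
    raise RuntimeError("all attempts failed: %r" % (errors,))


# Hypothetical stand-ins for the real operations.
def import_bundle():
    raise IOError("bundle download failed")


def clone_from_mirror():
    raise IOError("mirror unreachable")


def clone_from_master():
    pass  # pretend the master clone succeeds


if __name__ == "__main__":
    source = first_success([
        ("bundle", import_bundle),
        ("mirror", clone_from_mirror),
        ("master", clone_from_master),
    ])
    print("repository initialized from", source)
```

Ordering the list from cheapest to most expensive source keeps load off the master unless everything else has failed.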

Investigating hg performance

(caveat lector: this is a long post with lots of shell snippets and output; it’s mostly a brain dump of what I did to investigate performance issues on hg.mozilla.org. I hope you find it useful. Scroll to the bottom for the summary.)

Everybody knows that pushing to try can be slow. But why?

While waiting for my push to try to complete, I wondered what exactly was slow.

I started by cloning my own version of try:

$ hg clone http://hg.mozilla.org try
destination directory: try
requesting all changes
adding changesets
adding manifests
adding file changes
added 95917 changesets with 447521 changes to 89564 files (+2446 heads)
updating to branch default
53650 files updated, 0 files merged, 0 files removed, 0 files unresolved

Next I instrumented hg so I could get some profile information:

$ sudo vi /usr/local/bin/hg
python -m cProfile -o /tmp/hg.profile /usr/bin/hg "$@"

Then I timed how long it took me to check what would be pushed:

$ time hg out ssh://localhost//home/catlee/mozilla/try
hg out ssh://localhost//home/catlee/mozilla/try  0.57s user 0.04s system 54% cpu 1.114 total

That’s not too bad. Let’s check our profile:

import pstats
pstats.Stats("/tmp/hg.profile").strip_dirs().sort_stats('time').print_stats(10)
Fri Dec  9 00:25:02 2011    /tmp/hg.profile
 
         38744 function calls (37761 primitive calls) in 0.593 seconds
 
   Ordered by: internal time
   List reduced from 476 to 10 due to restriction <10>
 
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       13    0.462    0.036    0.462    0.036 {method 'readline' of 'file' objects}
        1    0.039    0.039    0.039    0.039 {mercurial.parsers.parse_index2}
       40    0.031    0.001    0.031    0.001 revlog.py:291(rev)
        1    0.019    0.019    0.019    0.019 revlog.py:622(headrevs)
   177/70    0.009    0.000    0.019    0.000 {__import__}
     6326    0.004    0.000    0.006    0.000 cmdutil.py:15(parsealiases)
       13    0.003    0.000    0.003    0.000 {method 'read' of 'file' objects}
       93    0.002    0.000    0.008    0.000 cmdutil.py:18(findpossible)
     7212    0.001    0.000    0.001    0.000 {method 'split' of 'str' objects}
  392/313    0.001    0.000    0.007    0.000 demandimport.py:92(_demandimport)

The top item is readline() on file objects? I wonder if that’s socket operations. I’m ssh’ing to localhost, so it’s really fast. Let’s add 100ms latency:

$ sudo tc qdisc add dev lo root handle 1:0 netem delay 100ms
$ time hg out ssh://localhost//home/catlee/mozilla/try
hg out ssh://localhost//home/catlee/mozilla/try  0.58s user 0.05s system 14% cpu 4.339 total
import pstats
pstats.Stats("/tmp/hg.profile").strip_dirs().sort_stats('time').print_stats(10)
Fri Dec  9 00:42:09 2011    /tmp/hg.profile
 
         38744 function calls (37761 primitive calls) in 2.728 seconds
 
   Ordered by: internal time
   List reduced from 476 to 10 due to restriction <10>
 
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       13    2.583    0.199    2.583    0.199 {method 'readline' of 'file' objects}
        1    0.054    0.054    0.054    0.054 {mercurial.parsers.parse_index2}
       40    0.028    0.001    0.028    0.001 revlog.py:291(rev)
        1    0.019    0.019    0.019    0.019 revlog.py:622(headrevs)
   177/70    0.010    0.000    0.019    0.000 {__import__}
       13    0.006    0.000    0.006    0.000 {method 'read' of 'file' objects}
     6326    0.002    0.000    0.004    0.000 cmdutil.py:15(parsealiases)
       93    0.002    0.000    0.006    0.000 cmdutil.py:18(findpossible)
  392/313    0.002    0.000    0.008    0.000 demandimport.py:92(_demandimport)
     7212    0.001    0.000    0.001    0.000 {method 'split' of 'str' objects}

Yep, definitely getting worse with more latency on the network connection.
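A bit of arithmetic on the two profiles shows how closely the slowdown tracks the added latency. This is a back-of-the-envelope sketch in Python 3: the call counts and timings are copied from the output above, while the 200ms round trip is my assumption that the netem delay on lo applies in both directions.

```python
# Figures copied from the two profiles above.
READLINE_CALLS = 13
LOCAL_TOTAL = 0.462     # seconds spent in readline() without the added delay
DELAYED_TOTAL = 2.583   # seconds spent in readline() with 100ms netem delay

# Assumption: packets are delayed in each direction over lo.
ASSUMED_ROUND_TRIP = 2 * 0.100


def extra_per_call(local_total, delayed_total, calls):
    """Average extra time each readline() call spent waiting."""
    return (delayed_total - local_total) / calls


def equivalent_round_trips(local_total, delayed_total, round_trip):
    """How many full round trips the extra waiting time corresponds to."""
    return (delayed_total - local_total) / round_trip


if __name__ == "__main__":
    print("extra per call: %.3fs" % extra_per_call(
        LOCAL_TOTAL, DELAYED_TOTAL, READLINE_CALLS))
    print("equivalent round trips: %.1f" % equivalent_round_trips(
        LOCAL_TOTAL, DELAYED_TOTAL, ASSUMED_ROUND_TRIP))
```

If the assumption holds, the extra time in readline() is in the same ballpark as one network round trip per read, which is what you would expect from a chatty request/response protocol.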

Oh, and I’m using a recent version of hg:

$ hg --version
Mercurial Distributed SCM (version 2.0)

$ echo hello | ssh localhost hg -R /home/catlee/mozilla/try serve --stdio
145
capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024

This doesn’t match what hg.mozilla.org is running:

$ echo hello | ssh hg.mozilla.org hg -R /mozilla-central serve --stdio
67
capabilities: unbundle lookup changegroupsubset branchmap stream=1

So it must be using an older version. Let’s see what mercurial 1.6 does:

$ mkvirtualenv hg16
New python executable in hg16/bin/python
Installing setuptools...

(hg16)$ pip install mercurial==1.6
Downloading/unpacking mercurial==1.6
  Downloading mercurial-1.6.tar.gz (2.2Mb): 2.2Mb downloaded
...

(hg16)$ hg --version
Mercurial Distributed SCM (version 1.6)

(hg16)$ echo hello | ssh localhost /home/catlee/.virtualenvs/hg16/bin/hg -R /home/catlee/mozilla/mozilla-central serve --stdio
75
capabilities: unbundle lookup changegroupsubset branchmap pushkey stream=1

That looks pretty close to what hg.mozilla.org claims it supports, so let’s time ‘hg out’ again:

(hg16)$ time hg out ssh://localhost//home/catlee/mozilla/try
hg out ssh://localhost//home/catlee/mozilla/try  0.73s user 0.04s system 3% cpu 24.278 total

tl;dr

Finding missing changesets between two local repositories is about 6x slower with hg 1.6 than with hg 2.0 (24 seconds versus 4 seconds). Add a few hundred people and machines hitting the same repository at the same time, and I imagine things can get bad pretty quickly.

Some further searching reveals that mercurial does support a faster method of finding missing changesets in “newer” versions, although I can’t figure out exactly when this change was introduced. There’s already a bug on file for upgrading mercurial on hg.mozilla.org, so hopefully that improves the situation for pushes to try.

The tools we use every day aren’t magical; they’re subject to normal debugging and profiling techniques. If a tool you’re using is holding you back, find out why!

Christmas tree preparations with an Arduino

We usually get a real Christmas tree if we’re going to be in town for Christmas. A real tree needs watering though, which is something we’ve been less than…consistent with over the past years.

I decided to do something about this and build something to alert me when the water level gets too low. Two strips of aluminum foil taped to either side of a piece of plastic provide my water sensor. One strip is connected to an analog input on the arduino, and the other strip is connected to +3.3V. When the sensor is submerged I get a reading of around 300 “units” from the ADC. When it’s removed from the water, a 10k pulldown resistor brings the reading down to 0.

I’ve hooked up a tri-colour LED to indicate various states, and plan to have an audible alert as well.

I’m not sure if the aluminum will end up corroding, nor if I’ll be able to power this off batteries for any length of time. Still, I’m pretty pleased with it so far!

Here you can see that the LED is green when the sensor is submerged, and changes colours (like a traffic light, as per Thomas’ request) when the sensor is removed.

cURL and paste

cURL and paste…two great tastes that apparently don’t go well at all together!

I’ve been writing a bunch of simple wsgi apps lately, some of which handle file uploads.

Take this tiny application:

import webob
 
def app(environ, start_response):
    req = webob.Request(environ)
    req.body_file.read()
    return webob.Response("OK!")(environ, start_response)
 
import paste.httpserver
paste.httpserver.serve(app, port=8090)

Then throw some files at it with cURL:

[catlee] % for f in $(find -type f); do time curl -s -o /dev/null --data-binary @$f http://localhost:8090; done
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 0% cpu 1.013 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 63% cpu 0.013 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 64% cpu 0.012 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 81% cpu 0.015 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 0% cpu 1.014 total
curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 0% cpu 1.009 total

Huh? Some files take a second to upload?

I discovered after much digging, and rewriting my (more complicated) app several times, that the problem is that cURL sends an extra “Expect: 100-continue” header. This is supposed to let a web server respond with “100 Continue” immediately or reject an upload based on the request headers.

The problem is that paste’s httpserver doesn’t send the “100 Continue” response by default, so cURL waits for a second before giving up and sending the rest of the request anyway.

The magic to turn this off is the ‘-0’ option to cURL, which forces HTTP/1.0 mode (the Expect header is an HTTP/1.1 feature, so cURL doesn’t send it):

[catlee] % for f in $(find -type f); do time curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090; done
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 66% cpu 0.012 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 64% cpu 0.012 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.01s system 58% cpu 0.014 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 66% cpu 0.012 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 59% cpu 0.013 total
curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 65% cpu 0.012 total
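For reference, this is roughly what the exchange looks like when the server does answer the Expect header. The sketch below is Python 3 and uses the standard library’s http.server rather than paste, purely because its request handler replies to “Expect: 100-continue” on its own when speaking HTTP/1.1. The names UploadHandler, upload_with_expect and demo are mine.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class UploadHandler(BaseHTTPRequestHandler):
    # http.server only answers "Expect: 100-continue" when speaking HTTP/1.1.
    protocol_version = "HTTP/1.1"

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the upload
        body = b"OK!"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def upload_with_expect(port, payload):
    """Send the headers, wait for the interim reply, then send the body."""
    request_head = (
        "POST / HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Content-Length: %d\r\n"
        "Expect: 100-continue\r\n"
        "Connection: close\r\n"
        "\r\n" % len(payload)
    )
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        with sock.makefile("rb") as reader:
            sock.sendall(request_head.encode("ascii"))
            interim = reader.readline().decode("ascii").strip()
            reader.readline()  # blank line that ends the interim response
            sock.sendall(payload)
            final = reader.readline().decode("ascii").strip()
    return interim, final


def demo():
    """Start a throwaway server on a free port and upload 1 KiB to it."""
    server = HTTPServer(("127.0.0.1", 0), UploadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return upload_with_expect(server.server_address[1], b"x" * 1024)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A client like cURL does the same dance: it holds the body back until it sees the interim status line, or until its timeout expires if the server never sends one.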

self-serve builds!

Do you want to be able to cancel your own try server builds?

Do you want to be able to re-trigger a failed nightly build before the RelEng sheriff wakes up?

Do you want to be able to get additional test runs on your build?

If you answered an enthusiastic YES to any or all of these questions, then self-serve is for you.

self-serve was created to provide an API to allow developers to interact with our build infrastructure, with the goal being that others would then create tools against it. It’s still early days for this self-serve API, so just a few caveats:

  • This is very much pre-alpha and may cause your computer to explode, your keg to run dry, or may simply hang.
  • It’s slower than I want. I’ve spent a bit of time optimizing and caching, but I think it can be much better. Just look at shaver’s bugzilla search to see what’s possible for speed. Part of the problem here is that it’s currently running on a VM that’s doing a few dozen other things. We’re working on getting faster hardware, but didn’t want to block this pre-alpha-rollout on that.
  • You need to log in with your LDAP credentials to work with it.
  • The HTML interface is teh suck. Good thing I’m not paid to be a front-end webdev! Really, the goal here wasn’t to create a fully functional web interface, but rather to provide a functional programmatic interface.
  • Changing build priorities may run afoul of bug 555664…haven’t had a chance to test out exactly what happens right now if a high priority job gets merged with a lower priority one.

That being said, I’m proud to be able to finally make this public. Documentation for the REST API is available as part of the web interface itself, and the code is available as part of the buildapi repository on hg.mozilla.org.

https://build.mozilla.org/buildapi/self-serve

Please be gentle!

Any questions, problems or feedback can be left here, or filed in bugzilla.

Just who am I talking to? (verifying https connections with python)

Did you know that python’s urllib module supports connecting to web servers over HTTPS? It’s easy!

import urllib
data = urllib.urlopen("https://www.google.com").read()
print data

Did you also know that it provides absolutely zero guarantees that your “secure” data isn’t being observed by a man-in-the-middle?

Run this:

from paste import httpserver
def app(environ, start_response):
    start_response("200 OK", [])
    # WSGI apps should return an iterable of strings, not a bare string
    return ["Thanks for your secrets!"]
 
httpserver.serve(app, host='127.0.0.1', port='8080', ssl_pem='*')

This little web app will generate a random SSL certificate for you each time it’s run. A self-signed, completely untrustworthy certificate.

Now modify your first script to look at https://localhost:8080 instead. Or, for more fun, keep it pointing at google and mess with your IP routing to redirect google.com:443 to localhost:8080.

iptables -t nat -A OUTPUT -d google.com -p tcp --dport 443 -j DNAT --to-destination 127.0.0.1:8080

Run your script again, and see what it says.

Instead of the raw HTML of google.com, you now get “Thanks for your secrets!”. That’s right, python will happily accept, without complaint or warning, the random certificate generated by this little python app pretending to be google.com.

Sometimes you want to know who you’re talking to, you know?

import httplib, socket, ssl, urllib2
def buildValidatingOpener(ca_certs):
    class VerifiedHTTPSConnection(httplib.HTTPSConnection):
        def connect(self):
            # overrides the version in httplib so that we do
            #    certificate verification
            sock = socket.create_connection((self.host, self.port),
                                            self.timeout)
            if self._tunnel_host:
                self.sock = sock
                self._tunnel()
 
            # wrap the socket using verification with the root
            #    certs in ca_certs
            self.sock = ssl.wrap_socket(sock,
                                        self.key_file,
                                        self.cert_file,
                                        cert_reqs=ssl.CERT_REQUIRED,
                                        ca_certs=ca_certs,
                                        )
 
    # wraps https connections with ssl certificate verification
    class VerifiedHTTPSHandler(urllib2.HTTPSHandler):
        def __init__(self, connection_class=VerifiedHTTPSConnection):
            self.specialized_conn_class = connection_class
            urllib2.HTTPSHandler.__init__(self)
 
        def https_open(self, req):
            return self.do_open(self.specialized_conn_class, req)
 
    https_handler = VerifiedHTTPSHandler()
    url_opener = urllib2.build_opener(https_handler)
 
    return url_opener
 
opener = buildValidatingOpener("/usr/lib/ssl/certs/ca-certificates.crt")
req = urllib2.Request("https://www.google.com")
print opener.open(req).read()

Using this new validating url opener, we can make sure we’re talking to someone with a validly signed certificate. With our IP redirection in place, or pointing at localhost:8080 explicitly, we get a certificate invalid error. We still don’t know for sure that it’s google (could be some other site with a valid ssl certificate), but maybe we’ll tackle that in a future post!
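For comparison, here is how the same idea might look on a current Python 3, where the ssl module’s default context both requires a valid certificate chain and checks that the certificate matches the hostname. This is a sketch rather than a port of the code above; build_validating_opener is my own name, and nothing here touches the network until you actually open a URL.

```python
import ssl
import urllib.request


def build_validating_opener(cafile=None):
    """Return (opener, context) with certificate and hostname checks enabled.

    cafile is optional; when omitted, the system's default CA store is used.
    """
    context = ssl.create_default_context(cafile=cafile)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context


if __name__ == "__main__":
    opener, context = build_validating_opener()
    print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
    print("hostname checked:", context.check_hostname)
```

With the opener in hand, opener.open("https://www.google.com").read() should fail with a certificate error when the connection is redirected to the self-signed test server.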

Faster try builds!

When we run a try build, we wipe out the build directory between each job; we want to make sure that every user’s build has a fresh environment to build in.

Unfortunately this means that we also wipe out the clone of the try repo, and so we have to re-clone try every time.

On Linux and OSX we were spending an average of 30 minutes to re-clone try, and on Windows 40 minutes. The majority of that is simply ‘hg clone’ time, but a good portion is due to locks: we need to limit how many simultaneous build slaves are cloning from try at once, otherwise the hg server blows up.

Way back in September, Steve Fink suggested using hg’s share extension to make cloning faster.

Then in November, Ben Hearsum landed some changes that paved the way to actually turning this on.

Today we’ve enabled the share extension for Linux (both 32 and 64-bit) and OSX 10.6 builds on try. Windows and OSX 10.5 are coming too, but we need to upgrade hg on the build machines first.

Average times for the ‘clone’ step are down to less than 5 minutes now.

This means you get your builds 25 minutes faster! It also means we’re not hammering the try repo so badly, and so hopefully won’t have to reset it for a long long time.

We’re planning on rolling this out across the board, so nightly builds get faster, release builds get faster, clobber builds get faster, etc…

Enjoy!

3 days of fun: a journey into the bowels of buildbot

I’ve just spent 3 days trying to debug some new code in buildbot.

The code in question is to implement a change to how we do nightly builds such that they use the same revision for all platforms.

I was hitting a KeyError exception inside buildbot’s util.loop code, specifically at a line where it is trying to delete a key from a dictionary. In simple form, the loop is doing this:

for k in d.keys():
    if condition:
        del d[k] # Raises KeyError....sometimes...

Tricky bit was, it didn’t happen every time. I’d have to wait at least 3 minutes between attempts.

So I added a bunch of debugging code:

print d
print d.keys()
for k in d.keys():
    print k
    if condition:
        try:
            del d[k] # Raises KeyError....sometimes...
        except KeyError:
            print k in d # sanity check 1
            print k in d.keys() # sanity check 2

Can you guess what the results of sanity checks 1 and 2 were?

'k in d' is False, but 'k in d.keys()' is True.

whhhaaaaa? Much head scratching and hair pulling ensued. I tried many different variations of iterating through the loop, all with the same result.

In the end, I posted a question on Stack Overflow.

At the same time, Bear and Dustin were zeroing in on a solution. The crucial bit here is that the keys of d are (follow me here…) methods of instances of my new scheduler classes, which inherit from buildbot.util.ComparableMixin…which implements __cmp__ and __hash__. __cmp__ is used in the 'k in d.keys()' test, but __hash__ is used in the 'k in d' test.

Some further digging revealed that my scheduler was modifying state that ComparableMixin.__hash__ was referring to, resulting in the scheduler instances not having stable hashes over time.

Meanwhile, on stackoverflow, adw came up with an answer that confirmed what Dustin and Bear were saying, and katrielalex came up with a simple example to reproduce the problem.
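Here is a minimal reproduction in the same spirit, written for Python 3. It isn’t katrielalex’s example or the buildbot code: the Scheduler class below is a made-up stand-in whose hash depends on mutable state. Note that in Python 3 d.keys() is a view that does a hash lookup, so list(d) plays the role of Python 2’s d.keys() list.

```python
class Scheduler:
    """Hypothetical stand-in for an object whose hash depends on mutable state."""

    def __init__(self, name):
        self.name = name
        self.revision = 0  # mutable state that (wrongly) feeds __hash__

    def __eq__(self, other):
        return isinstance(other, Scheduler) and self.name == other.name

    def __hash__(self):
        return self.revision


def reproduce():
    sched = Scheduler("nightly")
    d = {sched: "pending"}

    # Changing the state changes the hash after the key was stored.
    sched.revision = 1

    in_dict = sched in d            # hash-based lookup looks in the wrong slot
    in_key_list = sched in list(d)  # linear scan finds the very same object
    try:
        del d[sched]
        delete_failed = False
    except KeyError:
        delete_failed = True
    return in_dict, in_key_list, delete_failed


if __name__ == "__main__":
    print(reproduce())
```

The key is still sitting in the dictionary, but the dictionary can no longer find it, which is exactly the contradiction the two sanity checks exposed.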

In the end, the fix was simple, just a few extra lines of python code. Too bad it took me 3 days to figure out!