Posts about tips

Stuff I learned this weekend - vim, python and more!

Call me strange, but I actually enjoy spending time reading up on the programming tools I use regularly. I think of programming tools the same way I think of a hammer or a saw: they help you get a job done, you need to learn how to use them properly, you need to keep them well maintained, and sometimes you need to throw one away and get a new one. For my professional and personal programming I spend 99% of my time writing python with vim, so I really enjoy learning more about both.

Stuff I learned about vim:

  • How I boosted my vim - lots of great vim tips (how did I not know about :set visualbell until now???) and plugins, which introduced me to...
  • nerdtree - for file browsing in vim. It also reminded me to make use of the command-t plugin I had installed a while back.
  • surround - gives you the ability to work with the surroundings of text objects. Ever wanted to easily add quotes to a word, or change double quotes surrounding a string to single quotes? I know you have - so go install this plugin now!
  • snipmate - lets you define lots of predefined snippets for various languages. Now in python I can type "def<tab>" and bam! I get a basic function definition.

I wasn't able to get to PyCon US 2012 this year, so I'm very happy that the sessions were all recorded.

  • The art of subclassing - great tips on how to do subclassing well in python.
  • Why classes aren't always what you want - I liked how he emphasized that you should always be open to refactoring your code. Making your own exception classes is usually a bad idea...however, one great nugget buried in there: if you can't decide whether to raise a KeyError, AttributeError or TypeError (for example), make a class that inherits from all three and raise that. Then consumers can catch what makes sense to them instead of guessing (see the sketch after this list).
  • Introduction to metaclasses - metaclasses aren't so scary after all!
  • Nice framework for building gevent services - I liked the simple examples here. It introduces the ginkgo framework, which I'm hoping to have some time to play with soon.
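That exception nugget deserves a quick illustration. Here's a minimal sketch, with a made-up class name and message (nothing here is from the talk itself):

class ConfigError(KeyError, AttributeError, TypeError):
    # One exception class that is simultaneously a KeyError, an
    # AttributeError and a TypeError; callers catch whichever base fits.
    pass

try:
    raise ConfigError("missing setting")
except KeyError:  # `except AttributeError:` or `except TypeError:` also match
    print "caught it as a KeyError"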

cURL and paste

cURL and paste...two great tastes that apparently don't go together well at all! I've been writing a bunch of simple WSGI apps lately, some of which handle file uploads. Take this tiny application:

import webob
import paste.httpserver


def app(environ, start_response):
    req = webob.Request(environ)
    req.body_file.read()
    return webob.Response("OK!")(environ, start_response)


paste.httpserver.serve(app, port=8090)

Then throw some files at it with cURL:

[catlee] % for f in $(find -type f); do time curl -s -o /dev/null --data-binary @$f http://localhost:8090; done

curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 0% cpu 1.013 total

curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 63% cpu 0.013 total

curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 64% cpu 0.012 total

curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 81% cpu 0.015 total

curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 0% cpu 1.014 total

curl -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 0% cpu 1.009 total

Huh? Some files take a second to upload? After much digging, and rewriting my (more complicated) app several times, I discovered that the problem is that cURL sends an extra "Expect: 100-continue" header. This is supposed to let a web server either respond with "100 Continue" immediately or reject an upload based on the request headers alone. The problem is that paste's httpserver doesn't send the "100 Continue" response by default, so cURL waits a full second before giving up and sending the rest of the request anyway. The magic to turn this off is the '-0' flag to cURL, which forces HTTP/1.0 mode (another option is to suppress the header explicitly with -H 'Expect:'):

[catlee] % for f in $(find -type f); do time curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090; done

curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 66% cpu 0.012 total

curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 64% cpu 0.012 total

curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.01s system 58% cpu 0.014 total

curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 66% cpu 0.012 total

curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.00s user 0.00s system 59% cpu 0.013 total

curl -0 -s -o /dev/null --data-binary @$f http://localhost:8090  0.01s user 0.00s system 65% cpu 0.012 total
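If you're curious what the server side of the 100-continue dance looks like: it has to happen in the HTTP server itself, below the WSGI layer. Here's a minimal sketch of the check involved (the function name and the raw socket write are my own illustration, not paste's actual API):

def maybe_send_continue(environ, sock):
    # Per RFC 2616, a client sending "Expect: 100-continue" wants an
    # interim "100 Continue" response before it transmits the body.
    if environ.get("HTTP_EXPECT", "").lower() == "100-continue":
        sock.sendall("HTTP/1.1 100 Continue\r\n\r\n")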

Linux on a new Thinkpad T510

I got a new Thinkpad T510 at work to replace my aging MacBook Pro. I asked for a Thinkpad instead of another MacBook because I wanted hardware with better Linux support, in particular the trackpad. I had gotten into the habit of bringing a USB mouse everywhere I went because the trackpad on the MacBook was so unreliable on Linux. So when my new T510 arrived, I was pretty excited. And, except for one tiny problem (of the PEBKAC kind), transferring all my files from the old machine to the new one went flawlessly. Here's how I set up the new machine:

  • Download image from sysresccd.org. Follow the instructions to make a bootable USB drive.
  • Boot the computer off the USB drive. Resize the existing NTFS partition to be really small. Add 2 new partitions in the newly freed space: one to be the boot partition for linux, and one to be encrypted and formatted with LVM.
  • Format the boot partition as ext3. Set up the encrypted partition with 'cryptsetup luksFormat /dev/sda6; cryptsetup luksOpen /dev/sda6 crypt_sda6'. Set up LVM with 'pvcreate /dev/mapper/crypt_sda6'. Create two volumes, one for swap, and one for the root partition.
  • Connect network cable between old laptop and new one. Configure local network.
  • Copy files from old /boot to new /boot.
  • Copy files from old / to new /. Here's where I messed up. My command was: 'rsync -aPxX 192.168.2.1:/ /target/'.
  • Install grub.
  • Reboot!
At this point the machine came up OK, but it wasn't prompting to decrypt my root drive, so I had to do some manual steps to get the root drive mounted initially. Fixing up /etc/crypttab and the initramfs solved this.

However, even after this I was having some problems. I couldn't connect to wireless networks with NetworkManager. I couldn't run gnome-power-manager. Files in /var/lib/mysql were owned by ntp! Then I realized that my initial rsync had copied over files preserving the user/group names, not the uid/gid values. And since I wasn't booting off a Debian image, the id/name mappings were quite different. Re-running rsync with '--numeric-ids' got all the ownerships fixed up (there's a quick demonstration of the problem below).

After the next reboot things were working flawlessly. Now, after a few weeks of using it, I'm enjoying it a lot more than my MacBook Pro. It boots up faster. It connects to wireless networks faster. It suspends/resumes faster. It's got real, live page-up/page-down keys! And the trackpad actually works!
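About that uid/gid mix-up, here's a quick way to see why name-based mapping bites: the same numeric uid can belong to different account names on different distros. A minimal sketch in Python (uid 105 is just an example value):

import pwd

# rsync without --numeric-ids maps ownership by *name* on the receiving
# side, so whichever local account owns a given name ends up owning the
# files. Check which name a numeric uid maps to on this machine:
try:
    print pwd.getpwuid(105).pw_name  # e.g. "ntp" on one distro, "mysql" on another
except KeyError:
    print "uid 105 has no named account here"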

Getting free diskspace in python, on Windows

Amazingly, one of the most popular links on this site is the quick tip, Getting free diskspace in python. One of the comments shows that this method doesn't work on Windows. Here's a version that does:

import win32file

def freespace(p):
    """
    Returns the number of free bytes on the drive that ``p`` is on
    """
    secsPerClus, bytesPerSec, nFreeClus, totClus = win32file.GetDiskFreeSpace(p)
    return secsPerClus * bytesPerSec * nFreeClus
The win32file module is part of the pywin32 extension module.
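For what it's worth, pywin32 also exposes GetDiskFreeSpaceEx, which reports byte counts directly and avoids the per-cluster arithmetic. This variant is my own sketch, not part of the original tip:

import win32file

def freespace_ex(p):
    """
    Returns the number of free bytes available to the caller on the
    drive that ``p`` is on, via GetDiskFreeSpaceEx
    """
    # returns (bytes free to the caller, total bytes, total free bytes)
    free_to_caller, total, total_free = win32file.GetDiskFreeSpaceEx(p)
    return free_to_caller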

Two great, completely unrelated links

Yesterday was a bit of an overwhelming day. After getting home at 1am after a long bus ride home, I was unwinding by catching up on some news and email. I came across these two links, both of which really lifted my mood.

The first, Grokking the Zen of the Vi Wu-Wei, talks about a programmer's journey from emacs to BBEdit to vim. This post is a great read in and of itself, but what's really worth it is the link around the middle of the post to http://stackoverflow.com/questions/1218390/what-is-your-most-productive-shortcut-with-vim/1220118#1220118. This was truly a joy to read. Definitely the best answer I've ever seen on Stack Overflow, and quite possibly the best discussion of vi I've ever read. It taught me a lot, but I enjoyed reading it for more than that. It was almost like being on a little adventure, discovering all these little hidden secrets about the neighbourhood you've been living in for years. Like I said, it was 1am.

The second, The Pope, the judge, the paedophile priest and The New York Times, gave me some reassurance that things aren't always as they seem as reported by the media. Regardless of how you feel about the Church or the Pope, it seems that journalistic integrity has fallen by the wayside here. From the article:

Fr Thomas Brundage, the former Archdiocese of Milwaukee Judicial Vicar who presided over the canonical criminal case of the Wisconsin child abuser Fr Lawrence Murphy, has broken his silence to give a devastating account of the scandal – and of the behaviour of The New York Times, which resurrected the story. It looks as if the media were in such a hurry to blame the Pope for this wretched business that not one news organisation contacted Fr Brundage. As a result, crucial details were unreported.
The entire article is worth a read.

One useful script, a Linux version

Johnathan posted links to 3 scripts he finds useful. His sattap script looked handy, so I hacked it up for Linux. Run it to capture a screenshot and upload the image to a website you have ssh access to. The link is printed out and put into the clipboard. Hope you find this useful!


#!/bin/sh
# sattap - Send a thing to a place

set -e

SCP_USER='catlee'
SCP_HOST='people.mozilla.org'
SCP_PATH='~/public_html/sattap/'

HTTP_URL="http://people.mozilla.org/~catlee/sattap/"

# Generate a short, random-ish filename based on the current time
FILENAME=`date | md5sum | head -c 8`.png
FILEPATH=/tmp/$FILENAME

echo Capturing...
# ImageMagick's `import` grabs a screenshot interactively
import "$FILEPATH"

echo Copying to $SCP_HOST
scp "$FILEPATH" "${SCP_USER}@${SCP_HOST}:$SCP_PATH"

echo Deleting local copy
rm "$FILEPATH"

echo "$HTTP_URL$FILENAME" | xclip -selection clipboard
echo "Your file should be at $HTTP_URL$FILENAME, which is also in your paste buffer"

Exporting MQ patches

I've been trying to use Mercurial Queues to manage my work on different tasks in several repositories. I try to name all my patches after the bug they relate to; so for my recent work on getting Talos not skipping builds, I would call my patch 'bug468731'. I noticed that I was running this series of steps a lot:

cd ~/mozilla/buildbot-configs
hg qdiff > ~/patches/bug468731-buildbot-configs.patch
cd ~/mozilla/buildbotcustom
hg qdiff > ~/patches/bug468731-buildbotcustom.patch

...and then uploading the resulting patch files as attachments to the bug. There's a lot of repetition and extra mental work in those steps:

  • I have to type the bug number manually twice. This is annoying, and error-prone. I've made a typo on more than one occasion and then wasted a few minutes trying to track down where the file went.
  • I have to type the correct repository name for each patch. Again, I've managed to screw this up in the past. Often I have several terminals open, one for each repository, and I can get mixed up as to which repository I've currently got active.
  • Mercurial already knows the bug number, since I've used it in the name of my patch.
  • Mercurial already knows which repository I'm in.
I wrote the mercurial extension below to help with this. It will take the current patch name, and the basename of the current repository, and save a patch in ~/patches called [patch_name]-[repo_name].patch. It will also compare the current patch to any previous ones in the patches directory, and save a new file if the patches are different, or tell you that you've already saved this patch. To enable this extension, save the code below somewhere like ~/.hgext/mkpatch.py, and then add "mkpatch = ~/.hgext/mkpatch.py" to your .hgrc's extensions section. Then you can run 'hg mkpatch' to automatically create a patch for you in your ~/patches directory!

import os, hashlib



from mercurial import commands, util

from hgext import mq



def mkpatch(ui, repo, *pats, **opts):
    """Saves the current patch to a file called -.patch
    in your patch directory (defaults to ~/patches)
    """
    repo_name = os.path.basename(ui.config('paths', 'default'))
    if opts.get('patchdir'):
        patch_dir = opts.get('patchdir')
        del opts['patchdir']
    else:
        patch_dir = os.path.expanduser(ui.config('mkpatch', 'patchdir', "~/patches"))

    ui.pushbuffer()
    mq.top(ui, repo)
    patch_name = ui.popbuffer().strip()

    if not os.path.exists(patch_dir):
        os.makedirs(patch_dir)
    elif not os.path.isdir(patch_dir):
        raise util.Abort("%s is not a directory" % patch_dir)

    ui.pushbuffer()
    mq.diff(ui, repo, *pats, **opts)
    patch_data = ui.popbuffer()
    patch_hash = hashlib.new('sha1', patch_data).digest()

    full_name = os.path.join(patch_dir, "%s-%s.patch" % (patch_name, repo_name))
    i = 0
    while os.path.exists(full_name):
        file_hash = hashlib.new('sha1', open(full_name).read()).digest()
        if file_hash == patch_hash:
            ui.status("Patch is identical to ", full_name, "; not saving")
            return
        full_name = os.path.join(patch_dir, "%s-%s.patch.%i" % (patch_name, repo_name, i))
        i += 1

    open(full_name, "w").write(patch_data)
    ui.status("Patch saved to ", full_name)


mkpatch_options = [
        ("", "patchdir", '', "patch directory"),
        ]
cmdtable = {
    "mkpatch": (mkpatch, mkpatch_options + mq.cmdtable['^qdiff'][1], "hg mkpatch [OPTION]... [FILE]...")
}
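With the extension enabled, a session looks something like this (the repository path and patch name here are hypothetical):

[catlee] % cd ~/mozilla/buildbot-configs
[catlee] % hg mkpatch
Patch saved to /home/catlee/patches/bug468731-buildbot-configs.patch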

ssh on-the-fly port forwarding

Check out this great tip from nion's blog: ssh on-the-fly port forwarding. I've often wanted to open up new port forwards, but haven't wanted to shut down my existing session.

If you follow this by the # character (and thus type ~#) you get a list of all forwarded connections. Using ~C you can open an internal ssh shell that enables you to add and remove local/remote port forwardings:

ssh> help
Commands:
      -L[bind_address:]port:host:hostport    Request local forward
      -R[bind_address:]port:host:hostport    Request remote forward
      -KR[bind_address:]port                 Cancel remote forward
ssh> -L 8080:localhost:8080

python reload: danger, here be dragons

At Mozilla, we use buildbot to coordinate builds, unit tests, performance tests, and l10n repacks across all of our build slaves. There is a lot of activity on a project the size of Firefox, which means that the build slaves are kept pretty busy most of the time. Unfortunately, like most software out there, our buildbot code has bugs in it.

buildbot provides two ways of picking up new changes to code and configuration: 'buildbot restart' and 'buildbot reconfig'. Restarting buildbot is the cleanest thing to do: it shuts down the existing buildbot process, and starts a new one once the original has shut down cleanly. The problem with restarting is that it interrupts any builds that are currently active.

The second option, 'reconfig', is usually a great way to pick up changes to buildbot code without interrupting existing builds. 'reconfig' is implemented by sending SIGHUP to the buildbot process, which triggers a python reload() of certain files. This is where the problem starts.

Reloading a module basically re-initializes the module, including redefining any classes that are in the module...which is what you want, right? The whole reason you're reloading is to pick up changes to the code you have in the module! So let's say you have a module, foo.py, with these classes:


class Foo(object):
    def foo(self):
        print "Foo.foo"


class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        Foo.foo(self)
and you're using it like this:

>>> import foo

>>> b = foo.Bar()

>>> b.foo()

Bar.foo

Foo.foo

Looks good! Now, let's do a reload, which is what buildbot does on a 'reconfig':

>>> reload(foo)
<module 'foo' from '/Users/catlee/test/foo.py'>

>>> b.foo()

Bar.foo

Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/catlee/test/foo.py", line 13, in foo
    Foo.foo(self)
TypeError: unbound method foo() must be called with Foo instance as first argument (got Bar instance instead)

Whoops! What happened? The TypeError exception is complaining that Foo.foo must be called with an instance of Foo as the first argument. (NB: we're calling the unbound method on the class here, not a bound method on the instance, which is why we need to pass in 'self' as the first argument. This is typical when calling your parent class's implementation.) But wait! Isn't Bar a subclass of Foo? And why did this work before? Let's try this again, but this time let's watch what happens to Foo and Bar, using the id() function:

>>> import foo

>>> b = foo.Bar()

>>> id(foo.Bar)

3217664

>>> reload(foo)
<module 'foo' from '/Users/catlee/test/foo.py'>

>>> id(foo.Bar)

3218592

(The id() function returns a unique identifier for an object in python; if two objects have the same id, then they refer to the same object.) The ids are different, which means that we get a new Bar class after we reload...I guess that makes sense. Take a look at our b object, which was created before the reload:

>>> b.__class__
<class 'foo.Bar'>

>>> id(b.__class__)

3217664

So b is an instance of the old Bar class, not the new one. Let's look deeper:

>>> b.__class__.__bases__

(<class 'foo.Foo'>,)

>>> id(b.__class__.__bases__[0])

3216336

>>> id(foo.Foo)

3218128

Aha! The old Bar's base class (Foo) is different from what's currently defined in the module. After we reloaded the foo module, the Foo class was redefined, which is presumably what we want. The unfortunate side effect is that any references by name to the class 'Foo' pick up the new Foo class, including code in methods of subclasses. There are probably other places where this has unexpected results, but for us, this is the biggest problem: reloading essentially breaks class inheritance for objects whose lifetime spans the reload. Using super() in the normal way doesn't even help, since you usually refer to your instance's class by name:

class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        super(Bar, self).foo()
If you're using new-style classes, it looks like you can get around this by looking at your __class__ attribute:

class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        super(self.__class__, self).foo()
(Be careful with that pattern, though: super(self.__class__, self) will recurse forever if anyone subclasses Bar again, so it only behaves when called on instances of the leaf class.) Buildbot isn't using new-style classes...yet...so we can't use super(). Another workaround I'm playing around with is to use the inspect module to get at the class hierarchy:

import inspect


def get_parent(obj, n=1):
    # Walk the instance's *own* MRO, which still references the classes
    # it was created from, even if the module has been reloaded since.
    return inspect.getmro(obj.__class__)[n]


class Bar(Foo):
    def foo(self):
        print "Bar.foo"
        get_parent(self).foo(self)
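As a quick sanity check of this workaround (my own interactive session, assuming foo.py's Bar now uses get_parent): instances created before the reload still reach their original parent class.

>>> import foo
>>> b = foo.Bar()
>>> reload(foo)
<module 'foo' from '/Users/catlee/test/foo.py'>
>>> b.foo()
Bar.foo
Foo.foo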

nmudiff is awesome

Man, I wish I had known about this before! nmudiff is a program to email an NMU diff to the Debian Bug Tracking System. I often make quick little changes to Debian packages to fix bugs or typos, and it's always been a bit of a pain to generate a patch to send to the maintainer. nmudiff uses debdiff (another very useful command I just learned about) to generate the patch, and emails it to the bug tracking system with the appropriate tags.