
Using the clipboard with Vim

Via Planet Debian: Enrico Zini posts about using the clipboard with vim. Two cool things I learned from Enrico's post:

  1. xclip is cool, especially when combined with zsh. Say you want the output of a command printed to the terminal, but also copied to the clipboard: sha1sum * | tee >(xclip)
  2. I read the x11-selection help page in vim and discovered that you can access the copy/paste clipboard with the '+' register. Wow! No more pasting in text, having the indentation screwed up, undoing, setting paste mode, pasting again, unsetting paste mode!
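For example, assuming a Vim build with clipboard support (e.g. the vim-gtk or vim-gnome packages on Debian):
"+yy
will yank the current line into the X clipboard, and
"+p
will paste the clipboard contents at the cursor with the indentation intact.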

Is Digital Photography "Real"?

One of the podcasts I subscribe to is Dennis Hays' Secrets of Digital Imaging. In his June 19th Podcast, he spoke about the ongoing debate he has with his sister over whether or not digital photography is "real" photography. I thought I would post a few thoughts on this, since I found it an interesting topic for debate. I used to wonder whether a digital image printed from a lab was a "real" picture or not. From Wikipedia:

The word [photography] comes from the Greek words φῶς phos ("light"), and γραφίς graphis ("stylus", "paintbrush") or γραφή graphê, together meaning "drawing with light" or "representation by means of lines" or "drawing."
So when we take a picture we're using the light coming from our subject to draw a representation on our medium, whether that be film or a digital sensor. This is never an exact representation; nobody has invented a film or sensor that can capture all the light coming from a subject. What we're left with is an approximation that is meant to look the same to the human eye as the real thing.

I don't know too much about film processing, but I do know that there are many techniques a photographer can use in the darkroom to manipulate the final print, from how the film is developed to how the image is enlarged onto the print paper. People were burning and dodging, touching up dust specks and airbrushing images on film long before digital cameras came around. And film can always be scanned into a computer and edited with digital darkroom tools. Not to mention manipulating the actual environment (lighting, composition, etc.) to achieve a desired effect.

On the digital side of things, yes, the digital format is easier for most people to edit, but does this make it less "real"? Film can be modified as well. On the plus side, some digital cameras can guarantee that an image hasn't been modified, by using encryption techniques similar to what your bank's website uses to make sure that your browser can connect to it safely. This is extremely important for law enforcement work, where you need to be able to show that an image has not been modified.

Where does this leave us? Film and digital both capture light and record it, although in different ways. Film and digital images can both be altered to improve, repair, or even misrepresent the original subject. If by "real" we mean "is this the same image that was captured by the camera?", then I believe that digital has the edge, since we can use encryption techniques to ensure that a given image has not been modified. For the present, the majority of digital cameras do not have this capability...which means that it comes down to how much we trust the people involved in bringing the image from the subject to print. Did the photographer change anything? Did the lab technician? Did the publisher? Is that image in the newspaper or magazine trustworthy? Whether the photographer used a film camera or a digital camera is irrelevant to the answer.

Python Warts, part 2 - the infamous GIL

I'm going to come out and say that the global interpreter lock (GIL) in Python bothers me. For those who don't know, the GIL in the C implementation of Python allows only one thread to be running Python code at any one time. Extension modules executing C/C++ (non-Python) code can release the GIL so that other threads can run, but this doesn't apply in general to regular Python code.

Ian Bicking posted a while back about the GIL of Doom. Granted, his post was originally written in October 2003, so things have changed a bit since then. I believe the main thrust of his argument was that there are only a few cases where the GIL would really get in your way: basically where you are doing some CPU-bound task that isn't easily separated into separate processes, and are running on a multi-processor machine. The way to get around the GIL in Python is to split up your application into separate processes, and use some kind of inter-process communication (IPC) mechanism to transfer work/results between processes. The message seems to be: "You don't really want to use a shared address space threading model, do you? I'm sure you'd much rather just use a separate process. Everybody knows that's better." Suggestions such as calling time.sleep(0), or fiddling around with sys.setcheckinterval, are hacky, and clutter up your code for no good reason other than to work around deficiencies of the interpreter.

Yes, sharing an address space with multiple threads of execution can be tricky. But IPC is no picnic either. Starting a new process can be expensive. os.fork() isn't available on all platforms. There is, AFAIK, no portable shared memory module for Python (POSH seems to be dead?), so to send data between processes you need to set up a socket, a pipe, or temporary files, leading to extra code (more to write, more to read and understand afterwards), setup overhead (system calls aren't free), performance impact (serializing data isn't free), and room for buggy implementations (did you clean up your temporary files? did you close your socket? did you set restrictive permissions on your socket file?). In many ways, threading is much simpler: it's simple to set up, has low overhead, no data copying costs, and is self-contained in the process, so you're not leaking out-of-process resources (socket files, bound addresses, temporary files, etc.).

Python has never been the type of language to prevent the developer from doing "unsafe" things; that's why there aren't really private members on classes. Ian Bicking again writes (in a different post),

An important rule in the Python community is: we are all consenting adults. That is, it is not the responsibility of the language designer or library author to keep people from doing bad things. It is their responsibility to prevent people doing bad things accidentally. But if you really want to do something bad, who are we to say you are wrong? It's your program. Maybe you even have a good reason.
Python's GIL is getting in my way. Yes, I can do bad things with multiple threads sharing one address space, but that should be my problem, not a restriction of the language implementation. With multi-core CPUs becoming increasingly common, and not only in the server domain, I think this will become more and more of an issue for Python. In the short term, some slick IPC would be nice, but in the long term a truly multithreaded Python interpreter would benefit everybody. Talk is cheap, I know...code is what counts here. Maybe a PEP or SIG could be started to flesh out what would be required to get this accomplished for Python 3000.
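To make the complaint concrete, here is a minimal sketch (my own example, not from the post) that runs a CPU-bound function twice, first back to back and then in two threads. Because the GIL lets only one thread execute Python bytecode at a time, the threaded version is no faster even on a multi-processor machine:

import threading
import time

def count(n):
    # burn CPU in pure Python, so the GIL is held the whole time
    while n > 0:
        n -= 1

N = 10 * 1000 * 1000

# sequential: two calls back to back
start = time.time()
count(N)
count(N)
print("sequential: %.2fs" % (time.time() - start))

# threaded: the same two calls in parallel threads
start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("threaded:   %.2fs" % (time.time() - start))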

Rebooting linux faster with kexec (and even faster with kexec-chooser!)

Somehow, while reading through the Linux 2.6.17 changelog last week, I came across a few articles discussing the kexec feature of recent Linux kernels. It's pretty neat: you can boot directly into another kernel image without having to go through a hardware / BIOS reboot. There's a Debian package called kexec-tools which gives you the ability to load these kernel images into memory and to boot into them.

I found kexec a bit cumbersome to use, especially since all the kernels I care about booting into are the stock Debian kernels, and they all ship with ramdisk images that need to be used properly to boot. Using kexec by itself also requires that you manually bring the machine into a rebootable state first, or hack up some system scripts. You shouldn't just boot into a new kernel directly without shutting down devices, unmounting file systems, etc.

So to scratch this itch, I wrote kexec-chooser. It's a small Python script that will allow you to easily warm-reboot into any of the stock Debian kernels installed on your system. It'll probably work with custom kernels as well, but I haven't tested that yet :) Downloads and more information can be found on the kexec-chooser page.
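For reference, the bare kexec-tools workflow that kexec-chooser wraps looks roughly like this (the kernel version, initrd path and root device below are illustrative, not taken from the post). First stage the new kernel and its ramdisk:
kexec -l /boot/vmlinuz-2.6.17-1-686 --initrd=/boot/initrd.img-2.6.17-1-686 --append="root=/dev/hda1 ro"
then, once services are stopped and filesystems are unmounted, jump into it with
kexec -e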

kexec-chooser

What is it? kexec-chooser is a small utility that makes warm rebooting into new kernels under Debian a bit easier.

What is warm rebooting? Recent Linux kernels support a new feature called kexec. Basically it allows you to load a new kernel image into memory, and then boot into that kernel directly without having to do a full reboot of the machine. This can speed up reboots significantly.

Where do I get it? Download it here: You'll also need Python version 2.4 installed, as well as the kexec-tools package.

How do I install it? If you downloaded the .deb version, then running dpkg -i kexec-chooser_0.1_all.deb should work. If you downloaded the .tar.gz version, then your best bet is to copy kexec-chooser into /usr/sbin. This package is really designed to work on Debian, so your mileage may vary on other distributions.

How do I use it?
kexec -l
will print a list of available kernels.
kexec 2.6.17
will indicate that you want to warm-reboot into the 2.6.17 kernel. Note that this won't actually reboot your machine. When you reboot via the reboot command, or GNOME's Shut Down/Reboot dialog, then kexec-chooser will warm-reboot into the kernel you specified. See the man page for more information.

Spam Attack?

In the past few days I've had around a 100-fold increase in hits to my site...and also a barrage of spam comments coming in. The hits aren't coming from any one IP, but they don't seem to be referred from anywhere (the Referer header is empty). The User-Agent is typically "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)". I do use some JavaScript traffic tracking on this site, and these hits aren't showing up, which leads me to think that either hordes of IE users have JavaScript turned off, or this is some new distributed trojan spam attack. Anybody else noticed this recently? Technorati tags: spam blog blogging comment

Using server-side objects with XML-RPC

XML-RPC is a very handy little standard. It's straightforward, lightweight, and implementations exist for pretty much every language out there. One thing I've found a bit lacking with it, however, is that it's kind of a pain to deal with objects. The standard itself supports some very basic types like strings, integers, arrays, and structures, but there's no way of handling more complicated types. A project I've been working on involves manipulating server-side objects over a network connection, so I figured that XML-RPC would be a good place to start. I believe I've come up with a good way of allowing clients to build proxies for server-side objects, while still being compatible with regular XML-RPC implementations. I've been experimenting a little bit with an XML-RPC server written in Python, and a JavaScript client (using jsolait). I considered adding a new type of parameter (<object>?), but decided against it since it would break older implementations. Instead, the client and server agree to a few conventions:
  • A two-tuple whose first element begins with "types." should be interpreted as a reference to an object on the server, where the first element specifies the type of the object, and the second is a unique identifier for that object.
  • The server exposes object methods as functions with names of the format "typeMethods.typeName.methodName".
When the client receives a two-tuple object reference, it can now look in the list of methods supported by the server, and create a new object with wrappers for all the appropriate class methods bound to that object. For example, the following Python code:

class MyClass:
    def doSomething(self, x, y):
        pass

def makeObj():
    return MyClass()

would be exposed as these XML-RPC functions:

makeObj() /* returns an object reference: ("types.MyClass", objectId) */
typeMethods.MyClass.doSomething(objectRef, x, y) /* objectRef should be ("types.MyClass", objectId) */

When the client sees a two-tuple of the form ("types.MyClass", objectId), it can create a new object along the lines of:

var o = {
    "objectId": objectId,
    "typeName": "types.MyClass",
    "doSomething": function(x, y) {
        typeMethods.MyClass.doSomething([this.typeName, this.objectId], x, y);
    }
};

(JavaScript isn't my strong suit, so I apologize if this isn't exactly right. Hopefully the intent is clear!)

So now you've got a first-class object in your client, with methods that behave just like you would expect! You can now write:

o.doSomething(x, y);

instead of something along the lines of:

serverproxy.MyClass_doSomething(objectId, x, y);

Using the system.listMethods() function to get a list of all methods supported by the server enables you to bind all of a type's methods to an object. Generating objectIds is application-specific, so I won't go into that here.

I would like to see a generic way for a user to extend Python's SimpleXMLRPCServer to marshal and unmarshal new data types. The pickle methods (__getstate__, __setstate__) seem promising, but those are intended to serialize the entire representation of an object, not simply a reference to the object.
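As a concrete illustration of the server side of this convention, here is a minimal sketch using Python 2.x's SimpleXMLRPCServer. The object registry, the _register helper, and the port number are my own assumptions for the example, not part of the original project:

from SimpleXMLRPCServer import SimpleXMLRPCServer

objects = {}     # objectId -> live server-side instance
next_id = [0]    # simple counter for generating objectIds

class MyClass:
    def doSomething(self, x, y):
        return x + y

def _register(obj):
    # Store the instance and hand back the two-tuple object reference,
    # ("types.TypeName", objectId), as described above.
    next_id[0] += 1
    object_id = next_id[0]
    objects[object_id] = obj
    return ["types.%s" % obj.__class__.__name__, object_id]

def makeObj():
    return _register(MyClass())

def doSomething(object_ref, x, y):
    # object_ref arrives as ["types.MyClass", objectId]; look up the
    # instance and dispatch to its method.
    type_name, object_id = object_ref
    return objects[object_id].doSomething(x, y)

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_introspection_functions()    # exposes system.listMethods()
server.register_function(makeObj, "makeObj")
server.register_function(doSomething, "typeMethods.MyClass.doSomething")
server.serve_forever()

A client following the two-tuple convention can then call makeObj(), notice the "types." prefix on the result, and bind typeMethods.MyClass.doSomething to a local proxy object as sketched above.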

Mark Jaquith's Wordpress 2.0.3 upgrade

I have downloaded and installed Mark Jaquith's WordPress 2.0.3 Changed Files ZIP package. I have verified that the Changed Files ZIP package contains nothing that is not in the original WordPress 2.0.3 download, so it is safe to use as far as I can tell. Verifying this took more time than actually doing the upgrade! I think the WordPress release team should provide something similar, as this is a much more convenient way of upgrading for those of us with just FTP access to our web hosts.