
Posts about technology (old posts, page 4)

Rebooting Linux faster with kexec (and even faster with kexec-chooser!)

While reading through the Linux 2.6.17 changelog last week, I came across a few articles discussing the kexec feature of recent Linux kernels. It's pretty neat: you can boot directly into another kernel image without having to go through a hardware / BIOS reboot. There's a Debian package called kexec-tools which gives you the ability to load these kernel images into memory and to boot into them.

I found kexec a bit cumbersome to use, especially since all the kernels I care about booting into are the stock Debian kernels, and they all ship with ramdisk images that need to be passed along properly to boot. Using kexec by itself also requires you to manually bring the machine into a rebootable state first, or to hack up some system scripts. You shouldn't just boot into a new kernel directly without shutting down devices, unmounting file systems, etc.

So to scratch this itch, I wrote kexec-chooser. It's a small Python script that will allow you to easily warm-reboot into any of the stock Debian kernels installed on your system. It'll probably work with custom kernels as well, but I haven't tested that yet :) Downloads and more information can be found on the kexec-chooser page.
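To give a rough idea of what loading a stock Debian kernel with its ramdisk involves, here is a minimal sketch in Python. It is not the kexec-chooser source: the function name kexec_load_command, the example version string and the example kernel command line are made up for illustration, and the code only builds the command rather than running it. It assumes the usual Debian layout of /boot/vmlinuz-VERSION and /boot/initrd.img-VERSION.

```python
import shlex


def kexec_load_command(version, cmdline, boot_dir="/boot"):
    """Build (but do not run) a kexec command that loads a kernel and its ramdisk.

    Hypothetical helper for illustration; assumes Debian-style file names.
    """
    return [
        "kexec",
        "-l", f"{boot_dir}/vmlinuz-{version}",          # kernel image to load
        f"--initrd={boot_dir}/initrd.img-{version}",    # matching ramdisk image
        f"--append={cmdline}",                          # kernel command line
    ]


# The version and command line below are placeholders, not recommendations.
cmd = kexec_load_command("2.6.17-1-686", "root=/dev/hda1 ro")
print(shlex.join(cmd))
```

Loading the image is only half the job. Actually jumping into the loaded kernel (kexec -e) should still happen at the very end of a normal shutdown sequence, for the reasons given above.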

Spam Attack?

In the past few days I've had around a 100x increase in hits to my site, along with a barrage of spam comments. The hits aren't coming from any one IP, and they don't seem to be referred from anywhere (the Referer header is empty). The User-Agent is typically "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)". I do use some JavaScript traffic tracking on this site, and these hits aren't showing up there, which leads me to think that either hordes of IE users have JavaScript turned off, or this is some new distributed trojan spam attack. Has anybody else noticed this recently?

Using server-side objects with XML-RPC

XML-RPC is a very handy little standard. It's straightforward, lightweight, and implementations exist for pretty much every language out there. One thing I've found a bit lacking, however, is that it's kind of a pain to deal with objects. The standard itself supports some very basic types like strings, integers, arrays, and structures, but there's no way of handling more complicated types.

A project I've been working on involves manipulating server-side objects over a network connection, so I figured that XML-RPC would be a good place to start. I believe I've come up with a good way of allowing clients to build proxies for server-side objects, while still being compatible with regular XML-RPC implementations. I've been experimenting a little bit with an XML-RPC server written in Python, and a JavaScript client (using jsolait). I considered adding a new type of parameter (<object>?), but decided against it since it would break older implementations. Instead, the client and server agree to a few conventions:
  • A two-tuple whose first element begins with "types." should be interpreted as a reference to an object on the server, where the first element specifies the type of the object, and the second is a unique identifier for that object.
  • The server exposes object methods as functions with names of the format "typeMethods.typeName.methodName".
When the client receives a two-tuple object reference, it can look in the list of methods supported by the server and create a new object with wrappers for all the appropriate class methods bound to that object. For example, the following Python code:

    class MyClass:
        def doSomething(self, x, y):
            pass

    def makeObj():
        return MyClass()

would be exposed as these XML-RPC functions:

    makeObj() /* returns an object reference: ("types.MyClass", objectId) */
    typeMethods.MyClass.doSomething(objectRef, x, y) /* objectRef should be ("types.MyClass", objectId) */

When the client sees a two-tuple of the form ("types.MyClass", objectId), it can create a new object along the lines of:

    var o = {
        "objectId": objectId,
        "typeName": "types.MyClass",
        "doSomething": function(x, y) {
            return typeMethods.MyClass.doSomething([this.typeName, this.objectId], x, y);
        }
    };

(JavaScript isn't my strong suit, so I apologize if this isn't exactly right. Hopefully the intent is clear!)

So now you've got a first-class object in your client, with methods that behave just like you would expect! You can now write:

    o.doSomething(x, y);

instead of something along the lines of:

    serverproxy.MyClass_doSomething(objectId, x, y);

Using the system.listMethods() function to get a list of all methods supported by the server enables you to bind all of a type's methods to an object. Generating objectIds is application specific, so I won't go into that here.

I would like to see a generic way for a user to extend Python's SimpleXMLRPCServer to marshal and unmarshal new data types. The pickle methods (__getstate__, __setstate__) seem promising, but those are intended to serialize the entire representation of an object, not simply a reference to the object.
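Here is a rough Python sketch of both halves of the convention, run in a single process so that no network is needed. The ObjectServer and RemoteObject classes, the counter-based objectIds, and the doSomething body (it just adds its arguments so there is something to return) are all assumptions made for illustration. In a real setup the entries in functions would be registered with an XML-RPC server, and list_methods would be the server's system.listMethods().

```python
import itertools


class MyClass:
    def doSomething(self, x, y):
        return x + y  # placeholder behaviour so the call returns something


def makeObj():
    return MyClass()


class ObjectServer:
    """Hypothetical server side: exposes functions under the agreed names."""

    def __init__(self):
        self.functions = {}            # XML-RPC function name -> callable
        self.objects = {}              # objectId -> live server-side object
        self.types = set()             # names of exposed classes
        self._ids = itertools.count(1)

    def expose_type(self, cls):
        self.types.add(cls.__name__)
        for attr in dir(cls):
            if not attr.startswith("_") and callable(getattr(cls, attr)):
                name = "typeMethods.%s.%s" % (cls.__name__, attr)
                self.functions[name] = self._bind(attr)

    def expose_function(self, func):
        self.functions[func.__name__] = lambda *args: self._wrap(func(*args))

    def _bind(self, attr):
        def call(object_ref, *args):
            _type_name, object_id = object_ref   # ("types.Name", objectId)
            return self._wrap(getattr(self.objects[object_id], attr)(*args))
        return call

    def _wrap(self, value):
        # Instances of exposed classes travel as object references.
        name = type(value).__name__
        if name in self.types:
            object_id = next(self._ids)
            self.objects[object_id] = value
            return ["types." + name, object_id]
        return value

    def list_methods(self):
        return sorted(self.functions)

    def call(self, name, *args):
        return self.functions[name](*args)


class RemoteObject:
    """Hypothetical client side: builds bound wrappers from the method list."""

    def __init__(self, server, ref):
        self.typeName, self.objectId = ref
        prefix = "typeMethods.%s." % self.typeName[len("types."):]
        for name in server.list_methods():
            if name.startswith(prefix):
                setattr(self, name[len(prefix):], self._method(server, name))

    def _method(self, server, name):
        return lambda *args: server.call(
            name, [self.typeName, self.objectId], *args)


server = ObjectServer()
server.expose_type(MyClass)
server.expose_function(makeObj)

ref = server.call("makeObj")       # an object reference, not the object
o = RemoteObject(server, ref)
result = o.doSomething(2, 3)       # forwarded as typeMethods.MyClass.doSomething
```

The sketch only converts return values into references. Passing an object reference back in as an ordinary argument would need a matching unwrap step on the server.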

Mark Jaquith's WordPress 2.0.3 upgrade

I have downloaded and installed Mark Jaquith's WordPress 2.0.3 Changed Files ZIP package. I have verified that the Changed Files ZIP package contains nothing that is not in the original WordPress 2.0.3 download, so it is safe to use as far as I can tell. Verifying this took more time than actually doing the upgrade! I think the WordPress release team should provide something similar, as this is a much more convenient way of upgrading for those of us with just FTP access to our web hosts.

Quest for the Perfect Storage Solution!

I spent a little bit of time today looking into some of the storage solution choices out there.  I was mainly concerned with systems supported under Linux, but I didn't limit myself to just those.

My quest started this morning when I read a post on Gizmodo: Buffalo DriveStation: Serial ATA, Fanless Design. I had heard of Buffalo Technology before, and I've often considered buying one of their products, or something like it. What could be better than a 2 Terabyte box that you just plug into your network and configure? Well...The Perfect Storage Solution of course!

A few simple use cases may describe what I'm looking for. First, if a drive fails, I want to be able to replace it with no downtime and no loss of data, using any drive of sufficient size (at least as large as the one that failed) that I have on hand. Second, if I'm running out of free space, I want to be able to add a brand new drive and start using it. Third, if there is no more physical room for a new drive, I want to be able to migrate data off one of the drives (probably the smallest/oldest one) to make room for a newer, larger drive. I suppose this is a direct consequence of satisfying the first requirement.

Out of these simple cases, I can distill a few must-have features:

  • Scalable! I should be able to add more space to this thing with a minimum of hassle. I consider having to unmount a filesystem, grow the partition, then grow the filesystem a hassle. I also consider having to get drives of exactly the same size as the drives already in there a hassle. I want to add more space, not replace the space I have! LVM comes close to achieving this.
  • Fault tolerant: The Perfect Storage Solution should be able to handle at least one of the drives failing.  Better would be the ability to handle n drive failures.  RAID-5/RAID-6 work well here, but fail the scalability requirement.
  • Cheap!  I shouldn't need proprietary hardware/software for this.  I should just be able to add another drive to my enclosure and start using the space.  Or at the worst, buy another enclosure and add the new drive to the new enclosure :)
Some other nice features would be things like:
  • Snapshot support: great for backups, or when doing some kinds of admin work.
  • Very large filesystem support.  It's very easy to get more than a few terabytes of data (legitimately!) these days :)
And as long as I'm writing a wish list:
  • Transparent compression
  • Transparent encryption
  • Clustering (create one big pool of storage from drives scattered over a network)
So what's wrong with the devices like the one above?  Basically they don't scale well.  There's no nice way to integrate these things into one big pool of disk space.  You need to mess around with mount points, and put symlinks all over the place...unless you use LVM.

What's wrong with LVM?  It's not fault tolerant.  Sure, you can run LVM on top of a bunch of RAID devices.  But that means to add more storage in a fault tolerant way, you need to add a whole new RAID array since it's not really possible (as far as I know, somebody please correct me on this if I'm wrong!) to add a single drive into an existing RAID array.

Wikipedia's RAID page mentions the idea of a "write hole", and refers the reader to Jeff Bonwick's post on RAID-Z. The concept of the "write hole" does make some sense to me; basically if the drives lose power or crash while writing the parity data, then the data blocks and parity blocks may be inconsistent. It's not clear to me how RAID-Z solves this, and why you can't check those blocks when you restore power (especially when using some kind of journaled filesystem...although I suppose that the journal's parity data may have been corrupted as well!), but certainly data integrity is an issue that the Perfect Storage Solution must address!
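A toy example makes the write hole a bit more concrete. This is only a sketch of XOR parity over three made-up four-byte "drives", not how any real RAID implementation lays out its stripes, and the helper name xor_blocks is my own. It shows that a lost block can be rebuilt from the others plus parity, and that a crash between the data write and the parity write leaves a stripe that rebuilds to the wrong contents.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Drive 1 fails: its block is the XOR of the surviving blocks and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])

# Write hole: drive 0 is rewritten, but power is lost before the parity
# block is updated, so the parity is now stale.
data[0] = b"ZZZZ"
stale_rebuild = xor_blocks([data[0], data[2], parity])

print(rebuilt)                     # the original contents of drive 1
print(stale_rebuild == b"BBBB")    # False: the stale parity gives garbage
```

Nothing on any single drive looks wrong after the crash. The inconsistency only shows up when a drive is lost and the stale parity is used to rebuild it.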

I thought Sun's ZFS was promising for a little while.  But it seems that it doesn't handle drives of different sizes any better than RAID-5/6 does.  It's also unlikely that it will ever be available to Linux users because of licensing issues.

It certainly seems like LVM is the closest to what I'm looking for.  If only it managed parity data across physical blocks!  Then just polish up the ext3 online resizing functionality and life would be great!  Or maybe reiserfs/jfs/xfs would work better for a resizable filesystem?


Re: Re: Ruby and Python compared

On his blog, Ian Bicking responds to the article, Ruby and Python compared. While there is much in the latter that is uninformed as to what Python is capable of, the most important point I got from Ian's post was:

An important rule in the Python community is: we are all consenting adults. That is, it is not the responsibility of the language designer or library author to keep people from doing bad things. It is their responsibility to prevent people doing bad things accidentally. But if you really want to do something bad, who are we to say you are wrong? It's your program. Maybe you even have a good reason.
I think this should be the motto of any module developer: "Keep people from doing bad things accidentally." It's impossible to keep a developer from shooting himself in the foot if he really wants to, so don't try too hard. Your job is to enable users of your code, not restrict them. I've heard many C++ / Java programmers complain that Python isn't object oriented because it doesn't offer private/protected data for classes. In a perfect world all libraries and modules would be perfectly designed and there would be no need to go mucking with the internals of a module you didn't write. Back here in the real world, APIs are often not as well thought out as they should be. In Python (and in Ruby as well I'm guessing) you can muck about with the internals of classes or objects if you have to. It's either that or get the upstream package fixed and distributed everywhere before you can deploy your application.

Kill zem all!

Wow! Every good Linux user knows about the kill command. You use it all the time to kill off out-of-control or dead processes.

kill 1234
will kill off the process with pid 1234. But did you know that
kill -TERM -1234
will kill off all processes in the process group whose ID is 1234 (the negative number tells kill to signal a whole group instead of a single process)? I didn't until just a few minutes ago! Super-handy!
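The same trick is available from Python through os.killpg. The sketch below is only an illustration: the sh command with two sleeps is a stand-in for a real out-of-control job, and it is started in its own session (and therefore its own process group) so that signalling the group can't hit the Python process itself.

```python
import os
import signal
import subprocess

# A shell with one background and one foreground child, standing in for a
# job that has spawned helpers. start_new_session=True gives it a fresh
# process group, led by the shell.
proc = subprocess.Popen(
    ["sh", "-c", "sleep 30 & sleep 30"],
    start_new_session=True,
)

pgid = os.getpgid(proc.pid)   # same number as proc.pid: the shell leads the group

# Same effect as `kill -TERM -<pgid>` on the command line;
# os.kill(-pgid, signal.SIGTERM) would also work.
os.killpg(pgid, signal.SIGTERM)

proc.wait()
print(proc.returncode)        # negative signal number: the shell died from SIGTERM
```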

Disowning your children (in bash/zsh)

You learn something new every day! Today I learned about the disown builtin in bash and zsh. When you disown a job, it will no longer receive a HUP signal when you exit your shell.

catlee@sherwood:~ [1015]% while true; do date >> date.log; sleep 10; done&

[1] 7380

So now you've got the current date being appended to date.log every 10 seconds. Try exiting your shell:

catlee@sherwood:~ [1016%1]% exit

zsh: you have running jobs.

But use the disown command:

catlee@sherwood:~ [1017%1]% disown %1

catlee@sherwood:~ [1018]% exit

and your shell exits without complaining. Meanwhile, your job keeps running in the background!