Python Warts, part 2 - the infamous GIL

chris

2006-06-27 14:11

I'm going to come out and say that the global interpreter lock (GIL) in Python bothers me. For those who don't know, the GIL in the C implementation of Python allows only one thread to be running Python code at any one time. Extension modules executing C/C++/!Python code can release the GIL so that other threads can run, but this doesn't apply in general to regular Python code. Ian Bicking posted a while back about the GIL of Doom. Granted, his post was originally written in October 2003, so things have changed a bit since then. I believe the main thrust of his argument was that there are only a few cases where the GIL would really get in your way. Those cases are basically where you are doing some CPU bound task that isn't easily separated into separate processes, and is running on a multi-processor machine. The way to get around the GIL in Python is to split up your application into separate processes, and use some kind of inter-process communication (IPC) mechanism to transfer work/results between processes. The message seems to be: "You don't really want to use a shared address space threading model, do you? I'm sure you'd much rather just use a separate process. Everybody knows that's better." Suggestions such as calling time.sleep(0), or fiddling around with sys.setcheckinterval are hacky, and clutter up your code for no good reason other than to work around deficiencies of the interpreter. Yes, sharing an address space with multiple threads of execution can be tricky. But IPC is no picnic either. Starting a new process can be expensive. os.fork() isn't available on all platforms. There is, AFAIK, no portable shared memory module for python (POSH seems to be dead?), so to send data between processes you need to set up a socket, pipe or use temporary files, leading to extra code (more to write, more to read and understand afterwards), setup overhead (system calls aren't free), performance impact (serializing data isn't free), and room for buggy implementations (did you clean up your temporary files? did you close your socket? did you set restrictive permissions on your socket file?) In many ways, threading is much simpler. It's simple to set up, has low overhead, no data copying costs, and is self-contained in the process so you're not leaking out-of-process resources (sockets files, bound addresses, temporary files, etc.). Python has never been the type of language to prevent the developer from doing "unsafe" things, that's why there aren't really private members on classes. Ian Bicking again writes (in a different post),

An important rule in the Python community is: we are all consenting adults. That is, it is not the responsibility of the language designer or library author to keep people from doing bad things. It is their responsibility to prevent people doing bad things accidentally. But if you really want to do something bad, who are we to say you are wrong? It's your program. Maybe you even have a good reason.

Python's GIL is getting in my way. Yes I can do bad things with multiple threads sharing one address space, but that should be my problem, not a restriction of the language implementation. With multi-core CPUs becoming more and more common, and not only in the server domain, I think this will become more and more of an issue for Python. In the short term, some slick IPC would be nice, but in the long term a truly multithreaded Python interpreter would benefit everybody. Talk is cheap, I know...code is what counts here. Maybe a PEP or SIG could be started to flesh out what would be required to get this accomplished for Python 3000.

Comments