Stuff I learned this weekend - vim, python and more!
Call me strange, but I actually enjoy spending time reading up on the programming tools I use regularly. I think of programming tools as tools in the same way that a hammer or a saw is a tool: they help you get a job done, you need to learn how to use them properly, you need to keep them well maintained, and sometimes you need to throw one away and get a new one. For my professional and personal programming I spend 99% of my time writing python with vim, so I really enjoy learning more about both.

Stuff I learned about vim:

How I boosted my vim - lots of great vim tips (how did I not know about :set visualbell until now???) and plugins, which introduced me to...

nerdtree - for file browsing in vim. It also reminded me to make use of the command-t plugin I had installed a while back.

surround - gives you the ability to work with the surroundings of text objects. Ever wanted to easily add quotes to a word, or change the double quotes surrounding a string to single quotes? I know you have - so go install this plugin now!

snipmate - lets you define lots of predefined snippets for various languages. Now in python I can type "def<tab>" and bam! I get a basic function definition.

Stuff I learned about python:

I wasn't able to get to PyCon US 2012 this year, so I'm very happy that the sessions were all recorded.

The art of subclassing - great tips on how to do subclassing well in python.

why classes aren't always what you want - I liked how he emphasized that you should always be open to refactoring your code. Usually making your own exception classes is a bad idea...however, one great nugget buried in there: if you can't decide whether you should raise a KeyError, AttributeError or TypeError (for example), make a class that inherits from all 3 and raise that. Then consumers can catch whichever one makes sense to them instead of guessing (see the sketch after this list).

introduction to metaclasses - metaclasses aren't so scary after all!

nice framework for building gevent services - I liked the simple examples here. It introduces the ginkgo framework, which I'm hoping to have some time to play with soon.
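To illustrate that exception tip, here's a minimal sketch; the class and method names are my own, not from the talk:

class MissingFieldError(KeyError, AttributeError, TypeError):
    """Raised on a failed field lookup; callers catch whichever base fits."""

class Record(object):
    def __init__(self, **fields):
        self._fields = fields

    def get_field(self, name):
        if name not in self._fields:
            raise MissingFieldError(name)
        return self._fields[name]

record = Record(x=1)
try:
    record.get_field("y")
except KeyError:  # a consumer could catch AttributeError or TypeError instead
    print("no such field")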
Book review: PHP and MongoDB Web Development
I've been interested in mongodb for quite some time now, so when a co-worker of mine asked if I was interested in reviewing a book about mongodb, I of course said yes! She put me in touch with the publisher of a book on MongoDB and web development entitled "PHP and MongoDB Web Development". I was given an electronic copy of the book to review, and so here are my thoughts after spending a few weeks reading it and playing around with mongodb independently.

This book is subtitled "Beginner's Guide", and I think it achieves its goal of being a good introduction to mongodb for beginners. That being said, my primary criticism of the book is that it should include more information on advanced features like sharding and replica sets. It's easy to create web applications for small scales, or that don't need to be up 99.99% of the time. It's much harder to design applications that are robust to bursts in load and to various kinds of network or hardware failures. Without much discussion of these points, it's hard to form an opinion on whether mongodb would be a suitable choice for developing large scale web applications given the information in this book alone.

Other than that, I quite enjoyed the book and found it filled in quite a few gaps in my (limited) knowledge. Seeing full examples of working code on more complex topics like map reduce, GridFS and geospatial indexing is very helpful for understanding how these features of mongodb could be used in a real application. I found the examples to be a bit verbose at times, although I think that's more a fault of PHP than of the book, and the formatting of the examples was inconsistent at times. Fortunately all the examples can be downloaded from the publisher's web site, http://www.packtpub.com/support, saving you from having to type it all in!

The book also covers topics like integrating applications with traditional RDBMSes like MySQL, and offers some practical examples of how mongodb could be used to augment an application that is already using SQL. It also includes helpful real world examples of how mongodb is used for web analytics, or by foursquare for 2d geospatial indexing.

In summary, the book is a good introduction to mongodb, especially if you're familiar with php. If you're looking for more in-depth information about optimizing your queries or scaling mongodb, or if your language of choice isn't php, this probably isn't the book for you.
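For a taste of what that geospatial support looks like from a language other than php, here's a minimal sketch in python with pymongo; the collection, names and coordinates are made up, and it assumes a mongod running locally:

from pymongo import MongoClient, GEO2D

client = MongoClient("localhost", 27017)
db = client.test
db.places.create_index([("loc", GEO2D)])  # a 2d index, as in foursquare's use case
db.places.insert_one({"name": "coffee shop", "loc": [43.65, -79.38]})
for place in db.places.find({"loc": {"$near": [43.65, -79.38]}}).limit(5):
    print(place["name"])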
How RelEng uses mercurial quickly and safely
Release Engineering uses hg a lot. Every build or test involves code from at least one hg repository. Last year we started using some internal mirrors at the same time as making use of the hg share extension across the board; both of these had a big impact on the load on hg and the time to clone/update local working copies. I think what we've done is pretty useful and resilient to various types of failure, so I hope this blog post is helpful for others trying to automate processes involving hg!

The primary tool we're using for hg operations is called hgtool (available from our tools repo). Yes, we're very inventive at naming things. hgtool's basic usage is to be given the location of a remote repository, a local directory, and usually a revision. Its job is to make sure that the local directory contains a clean working copy of the repository at the specified revision. First of all, you don't need to worry about doing an 'hg clone' if the directory doesn't exist, or an 'hg pull' if it does exist. This simplifies a lot of build logic!

Next, we've built support for mirrors into hgtool. You can pass one or more mirror repositories to the tool with '--mirror', and it will attempt to pull/clone from the mirrors before trying to pull/clone from the primary repository. At Mozilla we have several internal hg mirrors that we use to reduce load on the primary public-facing hg servers.

To improve the case when you need to do a full clone, we've added support for importing an hg bundle to initialize the local repository rather than doing a full clone from the mirror or master repositories. You can pass one or more bundle urls with '--bundle'. hgtool will download and import the bundle, and then pull in new changesets from the mirrors and master repositories.

Finally, hgtool supports the 'hg share' extension. If you specify a base directory for shared repositories, all of the above operations will be run on a locally shared repository first, and then the working copy will be created with 'hg share' and updated to the correct revision.

There are all kinds of fallback behaviours specified: if you fail to import a bundle, try to clone from a mirror; if you fail to clone from a mirror, try to clone from the master. These fallbacks have resulted in a far more resilient build process.
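Here's a rough sketch of that fallback chain in python. It's illustrative only, not hgtool's actual code: it assumes the bundle urls have already been downloaded to local files, and it leaves out the mirror pulls and the 'hg share' step:

import shutil
import subprocess

def run_hg(*args):
    """Run an hg command; return True if it succeeded."""
    return subprocess.call(["hg"] + list(args)) == 0

def init_repo(master, dest, bundles=(), mirrors=()):
    # Cheapest first: seed the local repository from a pre-built bundle.
    for bundle in bundles:
        if run_hg("init", dest) and run_hg("unbundle", "-R", dest, bundle):
            return run_hg("pull", "-R", dest, master)  # catch up on newer changesets
        shutil.rmtree(dest, ignore_errors=True)  # clean up the failed attempt
    # Next, try each internal mirror before touching the master server.
    for mirror in mirrors:
        if run_hg("clone", "--noupdate", mirror, dest):
            return run_hg("pull", "-R", dest, master)
    # Last resort: a full clone from the master repository.
    return run_hg("clone", "--noupdate", master, dest)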
Book Review: A Meaningful World
What happens when you push - 2012 edition
Once upon a time, in 2009, I described what kind of things happened when you pushed to mozilla-central. At the time, we would trigger 111 distinct jobs, consuming over 40 hours of total compute time per push. How do things look now, at the beginning of 2012? We've more than doubled the number of jobs to 295 distinct jobs per push, and nearly tripled the total compute time to 110 hours per push!
Investigating hg performance
$ hg clone http://hg.mozilla.org/try
destination directory: try
requesting all changes
adding changesets
adding manifests
adding file changes
added 95917 changesets with 447521 changes to 89564 files (+2446 heads)
updating to branch default
53650 files updated, 0 files merged, 0 files removed, 0 files unresolved

Next I instrumented hg so I could get some profile information:
$ sudo vi /usr/local/bin/hg
python -m cProfile -o /tmp/hg.profile /usr/bin/hg $*

Then I timed how long it took me to check what would be pushed:
$ time hg out ssh://localhost//home/catlee/mozilla/try
hg out ssh://localhost//home/catlee/mozilla/try  0.57s user 0.04s system 54% cpu 1.114 total

That's not too bad. Let's check our profile:
import pstats
pstats.Stats("/tmp/hg.profile").strip_dirs().sort_stats('time').print_stats(10)
Fri Dec 9 00:25:02 2011 /tmp/hg.profile
38744 function calls (37761 primitive calls) in 0.593 seconds
Ordered by: internal time
List reduced from 476 to 10 due to restriction
ncalls tottime percall cumtime percall filename:lineno(function)
13 0.462 0.036 0.462 0.036 {method 'readline' of 'file' objects}
1 0.039 0.039 0.039 0.039 {mercurial.parsers.parse_index2}
40 0.031 0.001 0.031 0.001 revlog.py:291(rev)
1 0.019 0.019 0.019 0.019 revlog.py:622(headrevs)
177/70 0.009 0.000 0.019 0.000 {__import__}
6326 0.004 0.000 0.006 0.000 cmdutil.py:15(parsealiases)
13 0.003 0.000 0.003 0.000 {method 'read' of 'file' objects}
93 0.002 0.000 0.008 0.000 cmdutil.py:18(findpossible)
7212 0.001 0.000 0.001 0.000 {method 'split' of 'str' objects}
392/313 0.001 0.000 0.007 0.000 demandimport.py:92(_demandimport)
The top item is readline() on file objects? I wonder if that's socket operations. I'm ssh'ing to localhost, so it's really fast. Let's add 100ms latency:
$ sudo tc qdisc add dev lo root handle 1:0 netem delay 100ms
$ time hg out ssh://localhost//home/catlee/mozilla/try
hg out ssh://localhost//home/catlee/mozilla/try  0.58s user 0.05s system 14% cpu 4.339 total
import pstats
pstats.Stats("/tmp/hg.profile").strip_dirs().sort_stats('time').print_stats(10)
Fri Dec 9 00:42:09 2011 /tmp/hg.profile
38744 function calls (37761 primitive calls) in 2.728 seconds
Ordered by: internal time
List reduced from 476 to 10 due to restriction
ncalls tottime percall cumtime percall filename:lineno(function)
13 2.583 0.199 2.583 0.199 {method 'readline' of 'file' objects}
1 0.054 0.054 0.054 0.054 {mercurial.parsers.parse_index2}
40 0.028 0.001 0.028 0.001 revlog.py:291(rev)
1 0.019 0.019 0.019 0.019 revlog.py:622(headrevs)
177/70 0.010 0.000 0.019 0.000 {__import__}
13 0.006 0.000 0.006 0.000 {method 'read' of 'file' objects}
6326 0.002 0.000 0.004 0.000 cmdutil.py:15(parsealiases)
93 0.002 0.000 0.006 0.000 cmdutil.py:18(findpossible)
392/313 0.002 0.000 0.008 0.000 demandimport.py:92(_demandimport)
7212 0.001 0.000 0.001 0.000 {method 'split' of 'str' objects}
Yep, definitely getting worse with more latency on the network connection.
Oh, and I'm using a recent version of hg:
$ hg --version
Mercurial Distributed SCM (version 2.0)
$ echo hello | ssh localhost hg -R /home/catlee/mozilla/try serve --stdio
145
capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024

This doesn't match what hg.mozilla.org is running:
$ echo hello | ssh hg.mozilla.org hg -R /mozilla-central serve --stdio
67
capabilities: unbundle lookup changegroupsubset branchmap stream=1

So it must be using an older version. Let's see what mercurial 1.6 does:
$ mkvirtualenv hg16
New python executable in hg16/bin/python
Installing setuptools...
(hg16)$ pip install mercurial==1.6
Downloading/unpacking mercurial==1.6
  Downloading mercurial-1.6.tar.gz (2.2Mb): 2.2Mb downloaded
...
(hg16)$ hg --version
Mercurial Distributed SCM (version 1.6)
(hg16)$ echo hello | ssh localhost /home/catlee/.virtualenvs/hg16/bin/hg -R /home/catlee/mozilla/mozilla-central serve --stdio
75
capabilities: unbundle lookup changegroupsubset branchmap pushkey stream=1

That looks pretty close to what hg.mozilla.org claims it supports, so let's time 'hg out' again:
(hg16)$ time hg out ssh://localhost//home/catlee/mozilla/try
hg out ssh://localhost//home/catlee/mozilla/try  0.73s user 0.04s system 3% cpu 24.278 total
tl;dr
Finding missing changesets between two local repositories is 6x slower with hg 1.6 (4 seconds with hg 2.0 vs. 24 seconds with hg 1.6). Add a few hundred people and machines hitting the same repository at the same time, and I imagine things can get bad pretty quickly. Some further searching reveals that mercurial does support a faster method of finding missing changesets in "newer" versions, although I can't figure out exactly when this change was introduced. There's already a bug on file for upgrading mercurial on hg.mozilla.org, so hopefully that improves the situation for pushes to try.

The tools we use every day aren't magical; they're subject to normal debugging and profiling techniques. If a tool you're using is holding you back, find out why!

Signed builds coming soon to a nightly near you!
tl;dr: Starting soon (today I hope!) all nightly windows builds will be authenticode signed. Update MARs will also be signed according to the new MAR format.

The long version: RelEng have turned it up to 11 over the past few weeks to build up and deploy new infrastructure to support the silent update program. One of the requirements for this project is that all binaries, including nightly builds and updates, be signed similar to how we already sign releases. Our current release signing process still requires some manual work, and it's not feasible to manually sign each nightly build every day. We've developed systems for integrating signing into the build process so that nightly builds will be fully signed before they get uploaded to FTP for our nightly users to download. Incremental builds will also be signed, using a different (self-signed) certificate.

Our plan is to enable signing on other platforms as soon as possible, where appropriate. Please let us know if you have any problems with builds or updates as a result of this. Details are mostly in bug 509158.

I'm very proud of my team on this, especially bhearsum and rail. Thanks guys! Now we'll be dialing it back to 10, our normal resting state.
Christmas tree preparations with an Arduino
We usually get a real Christmas tree if we're going to be in town for Christmas. A real tree needs watering though, which is something we've been less than...consistent with over the past years. I decided to do something about this and build something to alert me when the water level gets too low.

Two strips of aluminum foil taped to either side of a piece of plastic make up my water sensor. One strip is connected to an analog input on the arduino, and the other strip is connected to +3.3V. When the sensor is submerged I get a reading of around 300 "units" from the ADC. When it's removed from the water, a 10k pulldown resistor brings the reading down to 0. I've hooked up a tri-colour LED to indicate various states, and I plan to add an audible alert as well.

I'm not sure if the aluminum will end up corroding, nor whether I'll be able to power this off batteries for any length of time. Still, I'm pretty pleased with it so far! Here you can see that the LED is green when the sensor is submerged, and changes colours (like a traffic light, as per Thomas' request) when the sensor is removed.
A small battle won in the war on build times
On November 8th we landed some changes that altered the way we do checkouts from hg. We enabled the hg share extension on android builds and started using internal hg mirrors to pull/clone from instead of hitting the main hg.mozilla.org. The primary goal of this project was to reduce the load on the main hg server, where developers often experience interrupted clones or slow pushes. If things got faster as a result, that would be a bonus.

I'm pretty happy with the impact on checkout times, especially on try android builds! It seems most of the gains came from enabling hg share on builds that weren't previously using it, since the update times for win32 try builds weren't affected; the only change for them was pulling from the dedicated mirrors instead of the main hg server.
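As an aside, enabling the share extension itself is just a config change, since it ships with mercurial. A build machine's hgrc needs something like this (a minimal sketch):

[extensions]
share =

After that, 'hg share /path/to/shared-repo working-dir' creates a lightweight working directory backed by the shared repository's history, so multiple build directories can reuse one local store.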