Parallelizing Unit Tests
Last week we turned on running unit tests on packaged builds for our mozilla-1.9.1, mozilla-central, and tracemonkey branches.
This means that our current unit test builds are uploaded to a web server along with all their unit tests. Another machine then downloads the build and tests, and runs various test suites on them.
Splitting up the tests this way allows us to run the test suites in parallel, so the mochitest suite will run on one machine, and all the other suites will be run on another machine (this group of tests is creatively named 'everythingelse' on Tinderbox).
Splitting up the tests is a critical step towards reducing our end-to-end time, which is the total time elapsed between when a change is pushed into one of the source repositories and when all of the results from that build are available. Up until now, you had to wait for all the test suites to complete in sequence, which could take over an hour in total. Now that we can split the tests up, the wait time is determined by the longest test suite. The mochitest suite is currently the biggest chunk here, taking around 35 minutes to complete, while all of the other tests combined take around 20 minutes. One of our next steps is to look at splitting up the mochitests into smaller pieces.
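To make the arithmetic concrete, here is a minimal sketch in Python that compares the two ways of waiting. It uses only the rough suite times quoted above; the function names are made up for illustration, and the numbers cover test time only, not the build or any upload and download overhead.

```python
# Approximate suite durations in minutes, taken from the rough figures above.
suite_minutes = {"mochitest": 35, "everythingelse": 20}


def sequential_wait(durations):
    """Suites run one after another: the wait is the sum of all suites."""
    return sum(durations.values())


def parallel_wait(durations):
    """Each suite runs on its own machine: the wait is the longest suite."""
    return max(durations.values())


print("in sequence:", sequential_wait(suite_minutes), "minutes")
print("in parallel:", parallel_wait(suite_minutes), "minutes")
```

Since the parallel wait is a maximum rather than a sum, the only way to lower it further is to shrink the largest suite, which is why the mochitests are the next target.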
For the time being, we will continue to run the existing unit tests on the same machine that is creating the build. This lets us make sure that running tests on the packaged builds gives us the same results (there are already some known differences: bug 491675, bug 475383).
Parallelizing the unit tests, and the infrastructure required to run them, is the first step towards achieving a few important goals:
Reducing end-to-end time.
Running unit tests on debug, as well as on optimized builds. Once we've got both of these going, we can turn off the builds that are currently done solely to be able to run tests on them.
Running unit tests on the same build multiple times, to help isolate intermittent test failures.
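The last goal, re-running tests against the same build, can be sketched roughly as follows. This is not how our infrastructure does it; it is a small Python illustration with hypothetical helper names (run_repeatedly, classify, make_flaky_test), and the "flaky" test is simulated with a random number generator. The idea is simply that a test which fails on some runs but not all of them, against an unchanged build, is probably intermittent rather than broken by a change.

```python
import random


def run_repeatedly(test, runs):
    """Call `test` `runs` times and return how many times it failed."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures


def classify(failures, runs):
    """Label a test based on its failure count over repeated runs."""
    if failures == 0:
        return "passes"
    if failures == runs:
        return "fails consistently"
    return "intermittent"


def make_flaky_test(failure_rate, seed=0):
    """Build a simulated test that fails roughly `failure_rate` of the time.

    Purely hypothetical: a stand-in for a real test with a timing problem.
    """
    rng = random.Random(seed)

    def flaky_test():
        assert rng.random() >= failure_rate, "simulated intermittent failure"

    return flaky_test


runs = 20
failures = run_repeatedly(make_flaky_test(failure_rate=0.3), runs)
print(failures, "failures out of", runs, "runs:", classify(failures, runs))
```

Because the build is identical on every run, any variation in the results comes from the test or its environment, not from the code under test.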
All of the gory details can be found in bug 383136.