100 Push Ups: Week 5, Day 1 (take 3...)

I did it! Sort of... Last night I completed all the sets for the first day of week 5, but I rested significantly longer than one minute between sets, mostly because I was trying to watch 24 on my laptop, and Global's website wasn't cooperating. But I'll count it as a success, and move on to day 2 tomorrow. I also hit a milestone in my running training today - I passed the 5 min/km mark! It was just on a 3km run, but I'm really happy about it, especially since I beat it handily, finishing 3km in 14:20 for an average pace of 4:47 min/km.

Release Engineering Sheriffs

Picking up a thread that was being discussed last year about suggested changes to sheriffing... Starting next week (on the 17th), there will be one person from release engineering designated to be the RelEng Sheriff for the week. This person will be responsible for various duties, most important of which will be to be available in #developers to help the developer sheriff track down issues with build machines, test failures, or other infrastructure problems. For more information, and for the current schedule of RelEng Sheriffs, please see the ReleaseEngineering:Sheriffing wiki page.

Clobbering the trees

Today we landed some changes that will give developers self-serve clobber ability on our Mozilla Central / Mozilla 1.9.1 / Tracemonkey infrastructure.

In our current infrastructure, we have a large pool of slave machines for each platform that each build all the various branches. This makes it nice and easy to spin up new project and release branches, and automatically distributes jobs across branches. However, it can sometimes be confusing when tracking down a build or test failure. Sometimes, a particular machine needs to have its build directory cleaned out; and sometimes all the machines for one branch or build type need to be cleaned up. Until now, this could only be done by RelEng by accessing the build machines directly.

But now you can do it too! If you've got a valid LDAP account, head on over to http://build.mozilla.org/clobberer. You'll see a giant table, with lots of checkboxes on it. If you check a box next to one of the slaves on a particular branch / builder, then the next time that slave runs a build on that branch, it will first delete the entire build directory, and then do a fresh checkout, and continue on with the rest of the build. Selecting a builder-level checkbox merely selects all the slaves for that builder, and similarly, selecting the branch-level checkbox selects all the slaves for all the builders in that branch.

In addition, if a slave has not been clobbered in a configurable time period (currently set to 1 week), it will clobber on the next run. Slaves are added to the database as they report in to ask for their clobber data, so it could take a little while for all the slave / builder / branch combinations to show up.

See bug 432236 for more information.
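The two clobber triggers described above (an explicit request via a checkbox, or too much time since the last clobber) boil down to a small decision. Here is a minimal sketch of that decision in Python. It is not the actual clobberer code: the function name `should_clobber` and its parameters are made up for illustration, and it assumes the last clobber time for the slave is known.

```python
from datetime import datetime, timedelta

# The post mentions a configurable period, currently 1 week.
DEFAULT_MAX_AGE = timedelta(weeks=1)


def should_clobber(requested, last_clobber, now, max_age=DEFAULT_MAX_AGE):
    """Decide whether a slave should wipe its build directory before building.

    requested    -- True if someone ticked the box for this slave/builder/branch
    last_clobber -- datetime of the last clobber (hypothetical field)
    now          -- current datetime
    max_age      -- clobber anyway once this much time has passed
    """
    if requested:
        return True
    return (now - last_clobber) > max_age


now = datetime(2009, 2, 10)
print(should_clobber(False, datetime(2009, 2, 9), now))  # recent clobber, no request
print(should_clobber(False, datetime(2009, 2, 1), now))  # more than a week old
```

A builder-level or branch-level checkbox would then just set `requested` for every slave underneath it.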

Upgraded to WordPress 2.7

I just spent a few minutes upgrading my blog to WordPress 2.7. Looks like everything went smoothly! I did this upgrade with Mercurial queues again. WordPress 2.7 is supposed to have better upgrade support built in, so I may not need Mercurial for future upgrades. Please let me know if you notice anything strange or missing since the upgrade.

100 Push Ups: Week 5

I've been a bit remiss in doing updates on my progress for the 100 push up challenge. I've twittered some of my progress, but mostly I've been completely silent about it. I completed week 4 a few weeks ago, pretty handily I think. I managed 46 push ups on my post-week-4 progress test. And now I'm stuck in week 5. Day 1's series is supposed to be 36, 40, 30, 24, >=40. So far I've managed to do 36, 24 with a short break then 16, 30, 24, 27 with a short break then 13...which is technically all the push-ups, but not in the required order. I'll keep working on it until I can complete the day's series properly. This week has been pretty busy at work, so I haven't been out running since Sunday, when I did my 18km long run. My legs are starting to feel a bit...jumpy, like they have a lot of pent up energy! Hopefully I can get a quick run in tonight, then 8km tomorrow and 21km on Sunday. It's a bit scary to think I'll be running a half marathon as part of a training routine.

Automated Talos Analysis

As part of one of our goals in Release Engineering this quarter, I'm investigating whether we can automatically detect variance in Talos performance data. Automatically detecting these changes in performance results would be a great help to developers and tree sheriffs. Imagine if the Tinderbox tree could be made to burn if a performance regression was detected? There are lots of possibilities if we can get this working: regressions could cause the tree to burn, firebot could spam #developers with information, try-talos data could be compared more easily to the baseline data, or we could automatically back out changes that cause regressions! :P This is also exciting, because it allows us to consider moving towards a pool-o'-slaves model for the Talos machines, just like we have for build and unittests right now. Having Talos use a pool-o'-slaves allows us to scale to additional project / release branches much more quickly, and allows us to be more flexible in allocating machines across branches. I've spent some time over the past few weeks playing around with data from graph server, bugging Johnathan, and having fun with flot, and I think I've come up with a workable solution.

How it works

I grab all the data for a test/branch/platform combination, and merge it into a single data series, ordered by buildid (the closest thing we've got right now to being able to sort the data in the same order in which changes landed). Individual data points are classified into one of four buckets:
  • "Good" data. We think these data points are within a certain tolerance of the expected value. Determining the expected value is a bit tricky, so read on!
  • "Spikes". These data points are outside of the specified tolerance, but don't seem to be part of an ongoing problem (yet). Spikes can be caused by having the tolerance set too low, random machine voodoo, or not having enough data to make a definitive call as to if it's a code regression or machine problem.
  • "Regressions". When 3 or more data points are outside of the tolerance in the same direction, we assume this is due to a problem with the code, and flag it as a regression.
  • "Machine problem". When the last 2 data points from the same machine have been outside of the tolerance, then we assume this is due to a problem with the machine.
For the purposes of the algorithm (and this post!), a regression is a deviation from the expected value, regardless of whether it's a performance gain or loss. At this point the tolerance criterion is being set semi-manually. For each test/branch/platform combination, the tolerance is set as a certain number of standard deviations. The expected value is then determined by going back in the performance data history and looking for a certain sized window of data where no point is more than the configured number of standard deviations from the average. This can change over time, so we re-calculate the expected value at each point in the graph.
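To make the description above a bit more concrete, here is a rough sketch in Python of the two pieces: searching backwards for a window that defines the expected value, and sorting points into the four buckets. This is not the talos-grokker code. The function names are made up, and it makes several assumptions: the population standard deviation is used, the "3 or more" points for a regression are consecutive, a regression takes priority over a machine problem, and each point has already been reduced to a sign (0 for within tolerance, +1 or -1 for outside it).

```python
from statistics import mean, pstdev


def find_expected(history, window, threshold):
    """Walk backwards through history (ordered by buildid) for the most recent
    run of `window` values in which no value is more than `threshold` standard
    deviations from the run's average. Returns (average, stdev) or None."""
    for end in range(len(history), window - 1, -1):
        chunk = history[end - window:end]
        avg, sd = mean(chunk), pstdev(chunk)
        if all(abs(v - avg) <= threshold * sd for v in chunk):
            return avg, sd
    return None


def deviation_sign(value, avg, sd, threshold):
    """0 if value is within tolerance of the expected value, else +1 or -1."""
    if abs(value - avg) <= threshold * sd:
        return 0
    return 1 if value > avg else -1


def classify(points):
    """points is a list of (machine, sign) pairs ordered by buildid.
    Returns one label per point: good, spike, regression or machine."""
    labels = []
    run_sign, run_len = 0, 0   # current streak of out-of-tolerance points
    previously_out = {}        # machine -> was its previous point out of tolerance?
    for machine, sign in points:
        if sign == 0:
            run_sign, run_len = 0, 0
            labels.append("good")
        else:
            run_len = run_len + 1 if sign == run_sign else 1
            run_sign = sign
            if run_len >= 3:
                labels.append("regression")   # 3rd point in the same direction
            elif previously_out.get(machine, False):
                labels.append("machine")      # same machine, twice in a row
            else:
                labels.append("spike")
        previously_out[machine] = sign != 0
    return labels


print(find_expected([10, 10, 10, 10, 50], window=4, threshold=1.5))
print(classify([("a", 0), ("b", 1), ("c", 1), ("a", 1), ("b", 0)]))
```

In a full analysis, `find_expected` would be re-run on the history before each point, and `deviation_sign` would turn each new value into the sign that `classify` consumes.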

Initial Results

As an example, here's how data from Linux Tp3 tests on the Mozilla 1.9.2 branch is categorized:

[Figure: Linux Tp3 Data for Mozilla 1.9.2]

Or, if you have a canvas-enabled browser, check out this interactive graph. A window size of 20 and a standard deviation threshold of 2.5 were used for this data set. The green line represents all the good data. The orange line (which is mostly hidden by the green line) represents the raw data from the 3 Linux machines running that test. The orange circles represent spikes in the data, red circles represent regressions, and blue circles represent possible machine problems. For the most part we can ignore the spikes. Too many spikes probably means we need to loosen our tolerance a bit. There are two periods to take notice of on this graph:
  • Jan 12, around noon, a regression was detected. Two orange spike circles are followed by three red regression circles. Recall that we wait for the 3rd data point to confirm an actual regression.
  • Jan 30, around noon, a similar case. Two orange spike circles, followed by regression points.
Although in these cases the regression was actually a win in terms of performance, it shows that the algorithm works. The second regression is due to Alice unthrottling the Talos boxes. In both cases, a new expected value is found after the data levels off again.

The analysis also produces some textual output more suitable for e-mail, Nagios or IRC notification, e.g.:

    Regression: Tp3 decrease from 417.974 to 235.778 (43.59%) on Fri Jan 30 11:34:00 2009. Linux 1.9.2 build 20090130083434
    http://graphs.mozilla.org/#show=395125,395135,395166&sel=1233236074,1233408874
    http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7f5292b5b9e2&tochange=f1493cf102b9

My code can be found on http://hg.mozilla.org/users/catlee_mozilla.com/talos-grokker. Patches or comments welcome!
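For anyone curious about the numbers in that notification, the percentage is the size of the change relative to the old expected value. Here is a small sketch of how the first line could be put together. The function name `describe_change` and its parameters are hypothetical and not taken from talos-grokker.

```python
from datetime import datetime


def describe_change(test, old, new, when, platform, branch, buildid):
    """Build a one-line summary of a detected change in the expected value."""
    direction = "decrease" if new < old else "increase"
    # Percentage change relative to the old expected value.
    percent = abs(new - old) / old * 100
    return (
        f"Regression: {test} {direction} from {old:.3f} to {new:.3f} "
        f"({percent:.2f}%) on {when.ctime()}. "
        f"{platform} {branch} build {buildid}"
    )


print(describe_change("Tp3", 417.974, 235.778,
                      datetime(2009, 1, 30, 11, 34, 0),
                      "Linux", "1.9.2", "20090130083434"))
```

Since a "regression" here is any deviation from the expected value, the direction word is the only thing that tells you whether the change was a win or a loss for that particular test.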

100 Push Ups: Week 3, Day 2

Successfully completed day 2 today: 20, 25, 15, 15, >=25 (I managed 30). Today was supposed to be a 3k easy run for the Around the Bay training. I was feeling pretty energetic, so decided to stick to the 3k distance, but to go a bit harder. I programmed my Garmin to do a 3km, 5:15 min/km workout, and off I went into the -15 weather! My route has a few small hills in it, so I thought it would make for a good workout. I managed to do it in 15:48, which works out to 5:16 min/km. I'll call that close enough to count :) Next week I'll try to hit 5:10.

A whole lot better?

If this article from getfitslowly.com is to be believed, then I shouldn't have put my 100 push up challenge on hold while I had a cold. My dad always said (about exercise, among other things), "It'll either make you feel a whole lot better, or a whole lot worse." Maybe this is a truism, but my experience has been that exercise while sick usually does make me feel better. Especially running. Maybe that's because I'm more accustomed to running than to weight training or push ups? So I guess my month-long break can be chalked up to laziness :P

100 Push Ups - Week 3, Day 1

Ok, back on the wagon for real now! After working my way up to week 4 in December, then falling sick with a cold or two, I haven't been doing push ups regularly for a few weeks now. I decided to start at week 3 again. So today I did my 5 sets of 14, 18, 14, 14, >=20 with a minute rest between sets. I managed to complete 26 on the last set, which gives me some confidence that I haven't lost all progress I had made up to December! In other exercise news, I've registered for the 2009 Around the Bay Race. This is a 30km run around the Hamilton Bay on March 29th. I'm trying to follow their training program, and so far so good. I did my 14km run yesterday instead of on Sunday, and it went pretty well, except for the last 1.5km which was on a trail that was packed snow. Man, I had no idea running on snow was so hard! It's a great workout for sure! I've been using Map My Run to track my progress and routes and such. It's pretty cool, although I wish they would let you graph your pace over time. I want to know if I'm getting faster or slower!