Digging into Firefox update sizes
Update sizes up 20-37% since last year!
Mozilla relies on our automatic update infrastructure to make sure that our users are kept up to date with the latest, most secure and fastest browser.
Having smaller updates means users are able to get the latest version of Firefox more quickly.
Since Firefox 19.0 (released just over a year ago - February 16th, 2013) our complete update size for Windows has grown from 25.6MB to 30.9MB for Firefox 28.0 (released March 15th, 2014). That's a 20% increase in just one year. On OSX it's grown from 37.8MB to 47.8MB, a 26% increase.
Partial updates have similarly grown. For Windows, a user coming form 27.0.1 to 28.0 would receive a 14.3MB update compared to a 10.4MB update going from 18.0.2 to 19.0, an increase of 37.5%
This means it's taking users 20-37% longer to download updates than it did last year. Many of our users don't have fast reliable internet, so an increase in update size makes it even harder for them to stay up to date. In addition, this size increase translates directly into bandwidth costs to Mozilla. All else being equal, we're now serving 20-37% more data from our CDNs for Firefox updates this year compared to last year.
Reducing update size
How can we reduce the update size? There are a few ways:
Make sure we're serving partial updates to as many users as possible, and that these updates are applied properly. More analysis is needed, but it appears that only roughly half of our users are updating via partial updates.
Reduce the amount of code we ship in the update. This could mean more features and content are distributed at runtime as needed This is always a tricky trade-off to make between having features available for all users out of the box, and introducing a delay the first time the user wants to use a feature that requires remote content. It also adds complexity to the product.
Change how we generate updates. This is going to be the focus of the rest of my post.
Smaller updates are more better
There are a few techniques I know of to reduce our update sizes:
Use xz compression instead of bzip2 compression inside the MAR files (bug 641212). xz generally gets better compression ratios at the cost of using more memory.
Use courgette instead of bsdiff for generating the binary diff between .exe and .dll files (bug 504624). Courgette is designed specifically for diffing executables, and generates much smaller patches than bsdiff.
Handle omni.ja files more effectively (bug 772868). omni.ja files are essentially zip files, and are using zip style compression. That means each member of the zip archive is individually compressed. This is problematic for two reasons: it makes generating binary diffs between omni.ja files much less effective, and it makes compressing the omni.ja file with bzip2 or xz less effective. You get far better results packing files into a zip file with 0 compression and using xz to compress it afterwards. Similarly for generating diffs, the diff between two omni.ja files using no zip compression is much smaller than the diff between two omni.ja files using the default zip -9 compression.
Don't use per-file compression inside the MAR file at all, rather compress the entire archive with xz. I simulated this by xz-compressing tar files of the MAR contents.
27% smaller complete updates
We can see that using xz alone saves about 10.9%. There's not a big difference between xz -6 and xz -9e, only a few kb generally. ("xz -6" and "xz -9e" series in the chart)
Disabling zip compression in the omni.ja files and using the standard bzip2 compression saves about 9.7%. ("zip0 .ja" in the chart)
Combining xz compression with the above yields a 24.8% saving, which is 7.6MB. ("zip0 .ja, xz -9e" in the chart)
Finally, disabling zip compression for omni.ja and per-file compression and compressing the entire archive at once yields a 27.2% saving, or 8.4MB.
66% smaller partial updates
Very similar results here for xz, 8.4% savings with xz -9e.
Disabling zip compression in the omni.ja files has a much bigger impact for partial updates because we're able to produce a much smaller binary diff. This method alone is saves 42%, or 6.0MB.
Using courgette for executable files yields a 19.1% savings. ("courgette" in the chart)
Combining courgette for executable files, no zip level compression, and per-file xz compression reduces the partial update by 61%. ("courgette, zip0 .ja, xz -9e" in the chart)
And if we compress the entire archive at once rather than per-file, we can reduce the update by 65.9%. ("courgette, zip0 .ja, tar, xz -9e" in the chart)
Some notes on my methodology: I'm focusing on 32-bit Windows, en-US only. Most of the optimizations, with the exception of courgette, are applicable to other platforms. I'm measuring the total size of the files inside the MAR file, rather than the size of the MAR file itself. The MAR file format overhead is small, only a few kilobytes, so this doesn't significantly impact the results, and significantly simplifies the analysis.
Finally, the code I used for analysis is here.