Posts about hg

Experiments with shared git repositories

I've been working on git support for Mozilla's build infrastructure over the past while.

RelEng's current model for creating our build machines is to create a homogeneous set of machines, each one capable of doing any type of build. This means that build machines can be doing a debug mozilla-central linux64 build one minute, and then an opt mozilla-b2g18 panda build the next. This model has worked well for us for scalability and load balancing: machines run on branches where there's currently activity, and we can focus our efforts on overall capacity. It's a huge improvement over the Firefox 2.x and 3.0 days where we had only one or two dedicated machines per branch!

However, there are challenges when using this model. We need to be reasonably efficient with disk space since the same disk is shared between all types of builds across all branches. We also need to make build setup time as quick as possible.

For mercurial based builds we've developed a utility called hgtool that manages shared local copies of repositories that are then locally cloned into each specific build directory. By using hg's share extension, we are able to save quite a bit of disk space and also have efficient pulling/cloning operations. However, because of the way Mozilla treats branches in hg (where each branch is in a separate repo), we need to keep each local clone of gecko branches separate from each other. We have a separate local clone of mozilla-central and another of mozilla-inbound. (although, maybe we could use bookmarks to keep them distinct in the same local repo...hmmm...)

What about git?

git's branches are quite different from mercurial's. Everything depends on "refs", which are names that refer to specific commit ids. It's actually possible to pull in completely unrelated git repositories into the same local repository. As long as you keep the refs from different repositories separate, all the commits and metadata coexist happily. In contrast, mercurial uses metadata on each commit to distinguish branches from each other. Most commits in the various gecko repositories are on the "default" branch. This makes it harder to combine mozilla-central and mozilla-inbound into the same repository without using some other means to keep each repository's "default" branch separate.

This gave me a crazy idea: what if we kept one giant shared git repository on the build machines that contained ALL THE REPOS.

combine ALL THE REPOS!

We have a script called gittool that is the analog of hgtool. I modified it to support b2g-style xml manifests, and pointed it at a local shared repository and let it loose. It works!


I noticed that the more repositories were cloned into the shared repository, the longer it took to clone new ones. Repositories that normally take 5 seconds to clone were now taking 5 minutes to fetch the first time into the shared repository. What was going on?

One clue was that there are over 144,000 refs across all the b2g repositories. The vast majority of these are tags...there are only 12k non-tag refs.

I created a test script to create new repositories and measure how long it took to fetch them into a shared repository. The most significant factor affecting fetch speed was the number of refs! The new repositories each had 40 commits, and I created 100 of them. If I created one tag per commit then fetching a new repository into the shared repo added 0.0028s. However if I created 10 tags per commit then fetching the new repositories into the shared repo added 0.0113s per repo.

For now I've given up on the idea to have giant local shared repositories, and have moved to a model that keeps each repository separate, like we do with hg. Hopefully we can revisit this later to avoid having multiple copies of related repositories cloned on the same machine.

Exporting MQ patches

I've been trying to use Mercurial Queues to manage my work on different tasks in several repositories. I try to name all my patches with the name of the bug it's related to; so for my recent work on getting Talos not skipping builds, I would call my patch 'bug468731'.

I noticed that I was running this series of steps a lot:

cd ~/mozilla/buildbot-configs

hg qdiff > ~/patches/bug468731-buildbot-configs.patch

cd ~/mozilla/buildbotcustom

hg qdiff > ~/patches/bug468731-buildbotcustom.patch

...and then uploading the resulting patch files as attachments to the bug. There's a lot of repetition and extra mental work in those steps:

  • I have to type the bug number manually twice. This is annoying, and error-prone. I've made a typo on more than one occasion and then wasted a few minutes trying to track down where the file went.
  • I have to type the correct repository name for each patch. Again, I've managed to screw this up in the past. Often I have several terminals open, one for each repository, and I can get mixed up as to which repository I've currently got active.
  • mercurial already knows the bug number, since I've used it in the name of my patch.
  • mercurial already knows which repository I'm in.

I wrote the mercurial extension below to help with this. It will take the current patch name, and the basename of the current repository, and save a patch in ~/patches called [patch_name]-[repo_name].patch. It will also compare the current patch to any previous ones in the patches directory, and save a new file if the patches are different, or tell you that you've already saved this patch.

To enable this extension, save the code below somewhere like ~/.hgext/, and then add "mkpatch = ~/.hgext/" to your .hgrc's extensions section. Then you can run 'hg mkpatch' to automatically create a patch for you in your ~/patches directory!

import os, hashlib

from mercurial import commands, util

from hgext import mq

def mkpatch(ui, repo, *pats, **opts):
    """Saves the current patch to a file called -.patch
    in your patch directory (defaults to ~/patches)
    repo_name = os.path.basename(ui.config('paths', 'default'))
    if opts.get('patchdir'):
        patch_dir = opts.get('patchdir')
        del opts['patchdir']
        patch_dir = os.path.expanduser(ui.config('mkpatch', 'patchdir', "~/patches"))

    ui.pushbuffer(), repo)
    patch_name = ui.popbuffer().strip()

    if not os.path.exists(patch_dir):
    elif not os.path.isdir(patch_dir):
        raise util.Abort("%s is not a directory" % patch_dir)

    mq.diff(ui, repo, *pats, **opts)
    patch_data = ui.popbuffer()
    patch_hash ='sha1', patch_data).digest()

    full_name = os.path.join(patch_dir, "%s-%s.patch" % (patch_name, repo_name))
    i = 0
    while os.path.exists(full_name):
        file_hash ='sha1', open(full_name).read()).digest()
        if file_hash == patch_hash:
            ui.status("Patch is identical to ", full_name, "; not saving")
        full_name = os.path.join(patch_dir, "%s-%s.patch.%i" % (patch_name, repo_name, i))
        i += 1

    open(full_name, "w").write(patch_data)
    ui.status("Patch saved to ", full_name)

mkpatch_options = [
        ("", "patchdir", '', "patch directory"),
cmdtable = {
    "mkpatch": (mkpatch, mkpatch_options + mq.cmdtable['^qdiff'][1], "hg mkpatch [OPTION]... [FILE]...")