I've been working on git support for Mozilla's build infrastructure over the past while.
RelEng's current model for creating our build machines is to create a homogeneous set of machines, each one capable of doing any type of build. This means that build machines can be doing a debug mozilla-central linux64 build one minute, and then an opt mozilla-b2g18 panda build the next. This model has worked well for us for scalability and load balancing: machines run on branches where there's currently activity, and we can focus our efforts on overall capacity. It's a huge improvement over the Firefox 2.x and 3.0 days where we had only one or two dedicated machines per branch!
However, there are challenges when using this model. We need to be reasonably efficient with disk space since the same disk is shared between all types of builds across all branches. We also need to make build setup time as quick as possible.
For mercurial based builds we've developed a utility called hgtool that manages shared local copies of repositories that are then locally cloned into each specific build directory. By using hg's share extension, we are able to save quite a bit of disk space and also have efficient pulling/cloning operations. However, because of the way Mozilla treats branches in hg (where each branch is in a separate repo), we need to keep each local clone of gecko branches separate from each other. We have a separate local clone of mozilla-central and another of mozilla-inbound. (although, maybe we could use bookmarks to keep them distinct in the same local repo...hmmm...)
What about git?
git's branches are quite different from mercurial's. Everything depends on "refs", which are names that refer to specific commit ids. It's actually possible to pull in completely unrelated git repositories into the same local repository. As long as you keep the refs from different repositories separate, all the commits and metadata coexist happily. In contrast, mercurial uses metadata on each commit to distinguish branches from each other. Most commits in the various gecko repositories are on the "default" branch. This makes it harder to combine mozilla-central and mozilla-inbound into the same repository without using some other means to keep each repository's "default" branch separate.
This gave me a crazy idea: what if we kept one giant shared git repository on the build machines that contained ALL THE REPOS.
We have a script called gittool that is the analog of hgtool. I modified it to support b2g-style xml manifests, and pointed it at a local shared repository and let it loose. It works!
I noticed that the more repositories were cloned into the shared repository, the longer it took to clone new ones. Repositories that normally take 5 seconds to clone were now taking 5 minutes to fetch the first time into the shared repository. What was going on?
One clue was that there are over 144,000 refs across all the b2g repositories. The vast majority of these are tags...there are only 12k non-tag refs.
I created a test script to create new repositories and measure how long it took to fetch them into a shared repository. The most significant factor affecting fetch speed was the number of refs! The new repositories each had 40 commits, and I created 100 of them. If I created one tag per commit then fetching a new repository into the shared repo added 0.0028s. However if I created 10 tags per commit then fetching the new repositories into the shared repo added 0.0113s per repo.
For now I've given up on the idea to have giant local shared repositories, and have moved to a model that keeps each repository separate, like we do with hg. Hopefully we can revisit this later to avoid having multiple copies of related repositories cloned on the same machine.