[TDF infra announce/request] reduce downstream traffic by 50× on git clones with `git config protocol.version 2`

Guilhem Moulin guilhem at libreoffice.org
Sat Jul 4 04:26:05 UTC 2020

TL;DR: run `git config protocol.version 2` in your local clones of the
core and online repositories.  That should reduce downstream traffic by
over 50× (from 9MiB to 150kiB) in no-op `git fetch` commands.
Client-side this only applies to git ≥2.18.0.

Dear developers,

We just upgraded https://gerrit.libreoffice.org to Gerrit v3.1.  Thanks
to upstream work by David Ostrovsky and others this version adds support
for version 2 of the git wire protocol [0], which significantly improves
performance of `git fetch` commands on repositories with many references
such as ours.  That's something we had our eyes on for a while [1].

core currently has over 300k references, the overwhelming majority of
which come from changesets (each changeset creates n+1 refs, where n is the
number of its revisions).  In version 1 of the git wire protocol the
entire reference list, along with the commit IDs the references point
to, is sent to the client in the initial server advertisement (as can be
seen by adding ‘GIT_TRACE_PACKET=1’ to the environment of a git-fetch(1)
command).  This causes significant downstream traffic, especially because
the commit IDs make compression moot.  On our large repositories this is
even a noticeable bottleneck.  In contrast, with version 2 only the
requested refs are sent, thereby significantly reducing downstream traffic
when the repository is already up to date.  Here are some metrics for a
fresh core clone:

  * `git fetch` yields an advertisement of size 8666kiB, and takes over
    3s to complete (in the same datacenter); while
  * `git -c protocol.version=2 fetch` yields a response of size 150kiB and
    takes under 0.5s to complete.

That's over 50× smaller!  Of course on an “active” clone the initial
advertisement won't be empty as it'll likely contain refs for the recent
changesets you're working on, but certainly not the 300k (and counting)
references.  So most likely a two-digit factor improvement :-)
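
For the curious, the factor quoted above can be checked with a few lines
of shell arithmetic.  This is only a sketch using the figures from the
list above; the variable names are made up for illustration, and the
timings are the stated bounds ("over 3s", "under 0.5s") rather than
exact measurements.

```shell
# Sizes from the measurements above, in kiB.
v1_kib=8666   # protocol v1 advertisement
v2_kib=150    # protocol v2 response

# Integer division is enough to show the order of magnitude.
ratio=$(( v1_kib / v2_kib ))
echo "size reduction: about ${ratio}x"

# Timing bounds from the measurements above, in milliseconds.
v1_ms=3000    # "over 3s"
v2_ms=500     # "under 0.5s"
speedup=$(( v1_ms / v2_ms ))
echo "speed-up: at least ${speedup}x"
```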

The performance gain is visible for other transports as well, so this
applies to ssh://gerrit.libreoffice.org:29418 too (and even to the now
deprecated git://git.libreoffice.org).

Please run

    git config protocol.version 2

in your local clones to force the client to use version 2 of the wire
protocol.  Or even globally (it will fall back to the original wire
protocol if the server doesn't support it) with

    git config --global protocol.version 2
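
Since this only helps with git ≥2.18.0 on the client, it can be worth
checking the installed version first.  The following is a minimal
sketch; `git_supports_v2` is a hypothetical helper written for this
mail, not something git ships, and it assumes that `git --version`
prints the version number as its third word.

```shell
# Hypothetical helper: succeeds if the given version string is >= 2.18.
git_supports_v2() {
    major=${1%%.*}      # "2.18.0" -> "2"
    rest=${1#*.}        # "2.18.0" -> "18.0"
    minor=${rest%%.*}   # "18.0"   -> "18"
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 18 ]; }
}

# Only query git if it is actually installed.
if command -v git >/dev/null 2>&1; then
    version=$(git --version | awk '{print $3}')
    if git_supports_v2 "$version"; then
        echo "git $version is recent enough for protocol v2"
    else
        echo "git $version is older than 2.18, protocol v2 is not available"
    fi
fi
```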

To check if that worked, `GIT_TRACE_PACKET=1 git fetch 2>&1 | head`
should produce output like the following:

    pkt-line.c:80           packet:          git< version 2
    pkt-line.c:80           packet:          git< ls-refs
    pkt-line.c:80           packet:          git< 0000
    pkt-line.c:80           packet:        fetch< version 2
    pkt-line.c:80           packet:        fetch< ls-refs
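
If you would rather not eyeball the trace, the check can be scripted.
A small sketch, assuming the trace lines look like the ones above;
`trace_uses_v2` is a hypothetical helper name that reads a packet trace
on standard input and succeeds if the server announced version 2.

```shell
# Hypothetical helper: reads a GIT_TRACE_PACKET trace on stdin and
# succeeds if it contains a "version 2" line received from the server.
trace_uses_v2() {
    grep -q 'packet:.*< version 2'
}
```

It would be used from inside a clone as
`GIT_TRACE_PACKET=1 git fetch 2>&1 | trace_uses_v2 && echo "protocol v2 in use"`.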

This is a wire protocol, so the change is only relevant for transport.
No need to re-create existing clones.

I suppose git upstream will change the default protocol.version at some
point, but given the above I see no reason to wait.

Guilhem, for The Document Foundation's infrastructure team.

PS: Please preserve the recipient list when replying.

[0] https://git-scm.com/docs/protocol-v2
[1] https://lists.freedesktop.org/archives/libreoffice/2018-October/081249.html
