[TDF infra] Announcing Gitiles VCS browser (gitweb replacement) and https:// anon git URIs

Guilhem Moulin guilhem at libreoffice.org
Mon Oct 22 14:33:21 UTC 2018


On Mon, 22 Oct 2018 at 11:51:35 +0200, Lionel Elie Mamane wrote:
> On Wed, Oct 17, 2018 at 09:03:45PM +0200, Guilhem Moulin wrote:
>> On Wed, 17 Oct 2018 at 14:05:27 +0200, Eike Rathke wrote:
>>> On Wednesday, 2018-10-17 04:27:54 +0200, Guilhem Moulin wrote:
>>>> Lastly, it's now possible to clone and fetch git repositories over
>>>> https:// .  While git:// URLs will remain supported for the foreseeable
>>>> future, they're intentionally no longer advertised in gerrit, and we
>>>> encourage you to upgrade the scheme of your ‘remote.<name>.url’ to
>>>> secure transports (SSH for authenticated access, or HTTPS for anonymous
>>>> access).  We'll update ‘lode’ and chase remaining git:// URLs shortly.
> 
>>> Why is git:// deprecated? From what I know it's more efficient when
>>> fetching/pulling than https:// (or ssh://?)
> 
>> Since v1.6.6 it's no longer true [0], cf. git-http-backend(1) and
>> https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes
> 
> That webpage doesn't seem to contain a discussion of the efficiency of
> the various protocols.

My bad, I probably copied the URL from the wrong tab.  This is what I
intended to share: https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols .
As you can see, the protocols are essentially equivalent.

For a high-level overview and pros and cons of each protocol, there is
also https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols ,
which reads

    “There is very little advantage that other protocols have over
    Smart HTTP for serving Git content.” :-)

To be fair, it also says that “The Git protocol is often the fastest
network transfer protocol available”, but that's just because no
encryption is always faster than the fastest encryption.  In practice
however, this argument is moot on modern CPUs.

FWIW, GitHub doesn't mention git:// URLs either (even though they're still
supported): https://help.github.com/articles/which-remote-url-should-i-use/ .
 
>> SSH is only used for transport, a git process is exec()'ed on the
>> remote just like for git-daemon(1), so the only overhead is
>> crypto-related.  The handshake is a one-off thing, thus negligible
>> when you're transferring a large amount of data at once; (...) As
>> for symmetric crypto overhead, (...) the overhead should be
>> negligible.
> 
> All I know is that about 1/2/3 years ago ('t was I think in some
> coworking space in Brussels, probably a hackfest) I showed Michael
> Meeks how to have a separate "push" url (with ssh: protocol) and
> "pull" url (with git: protocol) and he was very happy at the
> speed-up.

Might be orthogonal to the git:// vs. https:// vs. ssh:// discussion.
Gerrit uses JGit as Git implementation, while git-daemon(1) spawns
“normal” (C-based) git-upload-pack(1) processes.  I recall Norbert and I
sat down during FOSDEM 2017 to solve perf issues with our JGit
deployment.  Perhaps you configured your ‘remote.<name>.pushurl’ at the
same time :-)
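
For reference, a split fetch/push setup like the one Lionel describes
could look roughly like the sketch below.  It runs in a throwaway
repository and never touches the network.  The remote name `origin` and
the `gerrit_user` variable are assumptions made for illustration; the
URLs are the ones used elsewhere in this message.

```shell
# Sketch: anonymous HTTPS for fetching, authenticated SSH for pushing.
# Everything happens in a scratch repository; no server is contacted.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Hypothetical placeholder for the gerrit account name
gerrit_user=${USER:-username}

# Fetch URL, stored as remote.origin.url
git remote add origin https://git.libreoffice.org/core

# Push URL, stored as remote.origin.pushurl
git remote set-url --push origin "ssh://$gerrit_user@gerrit.libreoffice.org:29418/core"

# Lists one "(fetch)" and one "(push)" line for origin
git remote -v
```

In an existing clone only the two `git remote` commands are needed
(with `set-url` instead of `add` for the fetch URL).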

Anyway, it's easy enough to benchmark no-op `git fetch` on core.  master
is currently at c99732d59bc6, and I'm fetching from the same datacenter
to avoid metrics being polluted with network hiccups.

    $ git config remote.origin.url git://git.libreoffice.org/core && time git fetch
    0:01.62 (0.42 user, 0.64 sys)  142108k maxres
    ## Network usage: up 252kiB (4312 packets), down 10168kiB (7197 packets)

    $ git config remote.origin.url https://git.libreoffice.org/core && time git fetch
    0:01.63 (0.81 user, 0.29 sys)  141688k maxres
    ## Network usage: up 56kiB (924 packets), down 4194kiB (1241 packets)

    $ git config remote.origin.url "ssh://$USER@gerrit.libreoffice.org:29418/core" && time git fetch
    0:01.55 (0.62 user, 0.46 sys)  141588k maxres
    ## Network usage: up 67kiB (993 packets), down 9859kiB (1305 packets)

Pretty much equivalent, aren't they? :-)  (Network usage for https:// is
smaller because the TLS termination proxy is also compressing responses
from the git backend.  For git:// I guess the system time is higher than
the user time because it uses sendfile(2) and friends, since there is no
user-space encryption.)

As one might notice, network usage (~10MiB down, and growing) is really
high for a no-op `git fetch`.  That's caused by the >140k refs/changes/…
in the initial git-upload-pack(1) advertisement:

    $ git ls-remote https://git.libreoffice.org/core | awk '
        $1 ~ /^[0-9a-f]{40}$/ {
            refs++;
            if ($2 ~ /^refs\/changes\//)
                changes++;
        }
        END {
            printf "refs=%d, changes=%d (%.1f%%)\n",
                refs, changes, 100*changes/refs;
        }
    '
    refs=144709, changes=142676 (98.6%)

All remote types are affected.  Since the number of changesets seems to
grow linearly [0], we should try to find a solution if we want the repository
to keep scaling.  Last Friday I tried setting ‘uploadpack.hideRefs’ (and
‘uploadpack.allowTipSHA1InWant’) to exclude refs/changes/… from the
initial advertisement, but that broke CI, so it needs more work.
There is no urgency anyway (it's not a regression), and although it's
getting worse over time, by the time it's unbearable Git protocol v2
[1] might save us :-)
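
To illustrate what ‘uploadpack.hideRefs’ does to the advertisement,
here is a minimal local sketch.  It is not the configuration that was
tried on the server: the scratch repository, the fake change ref
refs/changes/01/1/1 and the demo committer identity are all made up for
the example, and a file:// URL is used so that git-upload-pack(1) is
actually spawned.

```shell
# Sketch: hide refs/changes/* from the ref advertisement of a scratch repo.
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m init

# Fake a gerrit-style change ref pointing at the same commit
git update-ref refs/changes/01/1/1 HEAD

# Count advertised change refs before hiding them
before=$(git ls-remote "file://$work/src" | grep -c 'refs/changes/' || true)

# The two settings mentioned above, applied to the serving repository
git config uploadpack.hideRefs refs/changes
git config uploadpack.allowTipSHA1InWant true

# Count again: the hidden hierarchy should no longer be advertised
after=$(git ls-remote "file://$work/src" | grep -c 'refs/changes/' || true)

echo "advertised refs/changes: before=$before after=$after"
```

Anything that discovers changes by listing refs/changes/… with
ls-remote stops seeing them once the hierarchy is hidden, which might be
the kind of breakage CI ran into; that part is a guess.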

-- 
Guilhem.

[0] https://dashboard.documentfoundation.org/app/kibana#/dashboard/Gerrit
[1] https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html