[TDF infra] Announcing Gitiles VCS browser (gitweb replacement) and https:// anon git URIs

Guilhem Moulin guilhem at libreoffice.org
Tue Oct 23 05:34:54 UTC 2018


On Mon, 22 Oct 2018 at 17:25:11 +0200, Lionel Elie Mamane wrote:
> On Mon, Oct 22, 2018 at 04:33:21PM +0200, Guilhem Moulin wrote:
>> On Mon, 22 Oct 2018 at 11:51:35 +0200, Lionel Elie Mamane wrote:
>>> On Wed, Oct 17, 2018 at 09:03:45PM +0200, Guilhem Moulin wrote:
>>>> SSH is only used for transport, a git process is exec()'ed on the
>>>> remote just like for git-daemon(1), so the only overhead is
>>>> crypto-related.  The handshake is a one-off thing, thus negligible
>>>> when you're transferring a large amount of data at once; (...) As
>>>> for symmetric crypto overhead, (...) the overhead should be
>>>> negligible.
>
>>> All I know is that about 1/2/3 years ago ('t was I think in some
>>> coworking space in Brussels, probably a hackfest) I showed Michael
>>> Meeks how to have a separate "push" url (with ssh: protocol) and
>>> "pull" url (with git: protocol) and he was very happy at the
>>> speed-up.
> 
>> Might be orthogonal to the git:// vs. https:// vs. ssh://
>> discussion.  Gerrit uses JGit as Git implementation, while
>> git-daemon(1) spawns “normal” (C-based) git-upload-pack(1)
>> processes.
> 
> For us developers of LibreOffice, and thus consumers of the Gerrit /
> Git service of freedesktop.org and TDF, whether the difference comes
> from the protocol itself or a different git implementation on the
> server to serve the different protocols is intellectually interesting
> (thanks for that!), but materially inconsequential: if using git: will
> be faster, we will use git:.

Following the same logic, you want gerrit.libreoffice.org to serve
content over plain http:// so you can save the two round-trips when you
launch your browser to submit your reviews? Oo

FWIW, we're stuck with git:// for the years to come because there is no
smooth upgrade path for clients; if I were to deploy the service today I
would most likely skip git-daemon(1).  Things have changed since 2012:
encryption is faster (there are modern stream ciphers and hardware
acceleration is more widespread), and for situations like this one there
is no reason not to encrypt data in transit.
 
>> I recall Norbert and I sat down during FOSDEM 2017 to solve perf
>> issues with our JGit deployment.  Perhaps you configured your
>> ‘remote.<name>.pushurl’ at the same time :-)
> 
> I can easily believe it was earlier.

Then it was before my time, so no idea what the bottleneck was.

>> Anyway, it's easy enough to benchmark no-op `git fetch` on core.  master
>> is currently at c99732d59bc6, and I'm fetching from the same datacenter
>> to avoid metrics being polluted with network hiccups.
> 
> Yes, but no. You also test in an environment where a network RTT is
> probably about one fifth to one third of a millisecond, and bandwidth
> at least 100Mbps if not 1000Mbps? In that case, everything will be
> fast. Time difference will be lost in noise.

I was arguing that C Git and JGit perform indistinguishably on the
current instance.  Transport overhead is the normal batch-mode SSH
(resp. TLS) overhead for ssh:// (resp. https://) remotes.

As mentioned earlier the protocol is essentially the same for git:// and
http:// (on servers supporting smart HTTP).  In both cases there is a
first round-trip (client hello + server git-upload-pack advertisement),
and another if the client is missing some object(s) (client sends list
of missing objects and receives a pack back).  For http:// these are
done in two sequential requests on the same connection (resp. ‘GET
/$REPO/info/refs?service=git-upload-pack’ and ‘POST
/$REPO/git-upload-pack’); for git:// there are equivalent requests in
the Git wire protocol.
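
To make the two HTTP requests above concrete, here is a minimal Python
sketch that only builds them and never touches the network.  The helper
name and the example.org URL are made up for illustration; the
Content-Type and Accept values are, to my knowledge, the ones smart HTTP
uses for upload-pack.

```python
def upload_pack_requests(repo_url):
    """Return the two requests of a smart-HTTP fetch as (method, url, headers).

    Hypothetical helper: it describes the requests, it does not send them.
    """
    base = repo_url.rstrip("/")
    # 1st round-trip: ref discovery (server answers with its advertisement).
    discovery = ("GET", base + "/info/refs?service=git-upload-pack", {})
    # 2nd round-trip: only needed when the client is missing objects.
    negotiation = (
        "POST",
        base + "/git-upload-pack",
        {
            "Content-Type": "application/x-git-upload-pack-request",
            "Accept": "application/x-git-upload-pack-result",
        },
    )
    return [discovery, negotiation]


if __name__ == "__main__":
    # Placeholder repository URL, not a real endpoint.
    for method, url, _headers in upload_pack_requests("https://git.example.org/core"):
        print(method, url)
```

A no-op fetch would stop after the first request, since the client
learns from the advertisement that it has nothing to ask for.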

https:// is just http:// wrapped in TLS, which costs an extra 2 round-trips
(TLS 1.3 brings that down to a single extra round-trip, but we don't offer
it yet).

For ssh://, what happens under the hood (as witnessed when adding
“GIT_TRACE=1” to the environment) is that an ssh process is spawned to
run git-upload-pack on the remote machine:

    ssh -p 29418 gerrit.libreoffice.org git-upload-pack "/core"

Counting round-trips for SSH isn't trivial, but let me try:
  * Client & server greet each other (in parallel)
  * Client & server initiate KEX (in parallel)
  * Key EXchange
  * Client & server send NEWKEYS (in parallel)
  * Client requests service, waits for response
  * Client sends each pubkey one at a time, waits for response; for the
    one that's accepted by the server, it sends the signed response and
    waits for the server to ack
  * Client asks to open a new channel, waits for response
  * Client asks to execute command in said channel, waits for response
  * […]
  * Server sends EOF and closes channel
  * Client acks, closes channels, and disconnects

So if the latency is symmetric and the first key offered is accepted by
the server, that makes a constant overhead of 8.5 round-trips.  (When
using an existing — multiplexed — connection the overhead becomes 2.5
round-trips.)  Additionally, the sending side must wait for the client
to adjust the window size when it's full. (OpenSSH's window size is 2MiB
at compile-time and is adjusted on the fly depending on network
conditions, cf. RFC 4254 sec. 5.2 for details.)
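
The round-trip counts above can be turned into a back-of-the-envelope
latency model.  The Python sketch below is only that: it takes the
per-transport overheads quoted in this message at face value, assumes
symmetric latency, and ignores TCP setup, bandwidth, and SSH window
adjustments.  The transport labels and function names are made up for
the example.

```python
# Extra round-trips on top of the Git protocol itself, as counted in this
# message.  TCP setup is left out because every transport pays for it.
EXTRA_ROUND_TRIPS = {
    "git": 0.0,
    "http": 0.0,
    "https-tls1.2": 2.0,
    "https-tls1.3": 1.0,
    "ssh": 8.5,              # assumes the first offered key is accepted
    "ssh-multiplexed": 2.5,  # reusing an existing connection
}


def fetch_round_trips(transport, missing_objects=False):
    """Round-trips for one fetch: advertisement, plus one more for the pack."""
    base = 2 if missing_objects else 1
    return base + EXTRA_ROUND_TRIPS[transport]


def latency_cost_ms(transport, rtt_ms, missing_objects=False):
    """Time spent purely waiting on round-trips, in milliseconds."""
    return fetch_round_trips(transport, missing_objects) * rtt_ms


if __name__ == "__main__":
    for name in EXTRA_ROUND_TRIPS:
        print("%-16s %6.1f ms" % (name, latency_cost_ms(name, rtt_ms=50)))
```

With a 50 ms RTT, the model puts a no-op fetch at 50 ms of pure
round-trip waiting over git://, 150 ms over https:// with TLS 1.2, and
475 ms over a fresh ssh:// connection.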

> Are these protocols (or the *implementations* of these protocols) more
> sensitive to RTT than another? They do more roundtrips? Or not?

Given the current (and growing) size of the git-upload-pack advertisement,
I doubt latency will be the bottleneck here.  Not until we manage to
shrink it.

FWIW there is another advantage of using HTTP as pull URL, namely that
capable clients can send HTTP headers such as “Accept-Encoding: deflate,
gzip” (on Debian Jessie and later it's compiled in, not sure if that's
an upstream default).  That way the backend can compress responses it
thinks are worth compressing.  As shown in my earlier message, this
halves the size of git-upload-pack advertisement, despite the fact that
it contains 145k random 40-hex-digit strings.  AFAIK compression of
data in transit isn't in the git protocol, hence not available for
ssh:// and git:// URLs.  (For SSH one doesn't want transport-level
compression, as packs are already compressed by the git protocol.)
Saving 5MiB per fetch is certainly interesting in low-bandwidth
networks.
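
Why random hashes still compress by about half: each hex digit carries
4 bits of information but occupies an 8-bit byte.  The Python sketch
below shows this on synthetic data only.  It generates random 40-digit
hex strings (not the real advertisement, which also contains ref names
and capabilities) and gzips them; the function name and the seed are
arbitrary choices for the example.

```python
import gzip
import random


def hex_compression_ratio(count=145_000, seed=0):
    """Return compressed/raw size for `count` random 40-hex-digit lines."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    # 160 random bits formatted as 40 zero-padded hex digits, one per line.
    lines = ["%040x\n" % rng.getrandbits(160) for _ in range(count)]
    raw = "".join(lines).encode("ascii")
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    print("compressed/raw = %.2f" % hex_compression_ratio())
```

The ratio cannot drop much below one half for the hashes alone; the
repetitive ref names in a real advertisement are what could push the
overall figure lower.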

Also, TCP port 443 is less likely to be blocked than 9418 :-)

-- 
Guilhem.