[TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits

Sat May 4 10:05:27 UTC 2024

Hi Eyal, all,

this came up on board-discuss [1], but I believe the best place to
have this discussion is the LibreOffice developer list. Let's
please follow up here.

A few comments inline:

Eyal Rozenberg wrote:
> Heiko wrote:
> > How to connect two or more individuals? It requires routing
> >
> I would opt for a simple protocol which does not take on problems
> more complex than it has to, at least for a preliminary
> implementation. Specifically - the Internet routes for us.
>
Connecting people for peer2peer communication is conventionally called
discovery.

There are plenty of write-ups on the topic; here are two pointers:

* quite an accessible summary: https://jsantell.com/p2p-peer-discovery/
* scientific survey paper: https://www.inets.rwth-aachen.de/wp-content/uploads/2022/07/service_discovery_survey.pdf

> Heiko wrote:
> > given that users do not want to fiddle around with ports and
> > firewalls, nor share IP addresses, I presume this requires a
> > server.
> But then it would not really be P2P, would it?
> 
> Anyway - a server would not necessarily be required. That is, P2P
> connection will happen when two users want to connect; but in our
> case, they have already "connected" in some other way to agree on
> making the P2P connection. So I suggest, in light of my previous
> point, that we assume that the two (or more) users have another,
> independent means of communication over which they can send some
> data for bootstrapping the LibreOffice P2P. And this could be made
> easy, UI-wise, so that the user just needs to press a copy button,
> and paste some string so that the other user can see it. The other
> user copies the string and pastes it into an appropriate area in
> their own running LO instance. Then the connection is set up.
> 
So in a word, piggy-backing on another, existing communication
channel?

> What kind of data? Basically, I assume that would be a tuple of (IP,
> port number, public key). I will admit that this doesn't cater to
> the case of two firewalled users; that's a situation I'm not
> experienced enough in handling, but I do know there are [many
> approaches](https://en.wikipedia.org/wiki/NAT_traversal) (Wikipedia)
> to handling it. Some may require a third-party "switching server",
> some may not. But such a server can probably be very minimal and
> hopefully not even aware of what protocol it's being used to allow
> connections for.
>
Or taking the idea one step further: re-using the other, existing
comms channel, also for all of the collaboration traffic!
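
To make the quoted (IP, port, public key) tuple a bit more concrete,
here is a rough Python sketch of how such a copy-and-paste string
could be packed and unpacked. The Invite class, its field names and
the JSON-inside-base64 encoding are all assumptions made up for
illustration; no such format has been agreed on in this thread.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Invite:
    """Hypothetical bootstrap data: where to connect, which key to expect."""
    host: str
    port: int
    public_key: bytes  # raw key bytes; the key type is left open here


def encode_invite(invite: Invite) -> str:
    """Pack the tuple into one URL-safe token a user can copy and paste."""
    payload = json.dumps(
        {
            "host": invite.host,
            "port": invite.port,
            "key": base64.b64encode(invite.public_key).decode("ascii"),
        },
        separators=(",", ":"),
    ).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_invite(token: str) -> Invite:
    """Reverse encode_invite; raises ValueError on an out-of-range port."""
    # strip() tolerates whitespace picked up while copying from a chat.
    data = json.loads(base64.urlsafe_b64decode(token.strip().encode("ascii")))
    port = int(data["port"])
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return Invite(str(data["host"]), port, base64.b64decode(data["key"]))
```

Note this only covers the easy part: it does nothing for the
two-firewalled-users case mentioned in the quote.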

> Heiko wrote:
> > What could be achievable on TDF infrastructure?
> >
> Given what I've said above - let's try to make this completely
> independent of TDF infrastructure. Either with no switching-server
> at all or with something minimal that hopefully might not even need
> TDF continuously maintaining a server. Note that maintenance by us
> also has privacy implications, much more so than third-party-less
> P2P.
>
Yup. At any rate, requiring any kind of centralized server
infrastructure has inevitable scalability challenges. It would still
be useful if TDF could help with bootstrapping whatever server
infrastructure will be needed, though.

> Heiko wrote:
> > Isn’t it better to share UNO commands and parameters?
> >
> Mmm... maybe... but - what about showing the other party's cursor
> and mouse movements? You can't do that with UNO commands.
>
Starting off from Collabora Online - which is a production-ready
implementation of LibreOffice collaboration, that uses both low-level
key & mouse events, as well as UNO commands - I guess the answer is
'both'? ;)
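
As a rough illustration of what 'both' could look like on the wire,
here is a Python sketch of one message envelope carrying either a
low-level input event or a UNO command. The class names, the JSON
layout and the "type" tags are hypothetical; this is not Collabora
Online's actual protocol.

```python
import json
from dataclasses import asdict, dataclass
from typing import Union


@dataclass(frozen=True)
class InputEvent:
    """Low-level event, e.g. a pointer move, with no UNO equivalent."""
    kind: str          # hypothetical values: "mouse_move", "key_press"
    x: int = 0
    y: int = 0
    key: str = ""


@dataclass(frozen=True)
class UnoCommand:
    """High-level edit: a UNO command name plus its arguments."""
    name: str          # e.g. ".uno:Bold"
    args: dict


Message = Union[InputEvent, UnoCommand]


def to_wire(sender: str, msg: Message) -> str:
    """Serialize one message, tagging it so the receiver can dispatch."""
    type_tag = "input" if isinstance(msg, InputEvent) else "uno"
    return json.dumps({"from": sender, "type": type_tag, "body": asdict(msg)})


def from_wire(line: str):
    """Return (sender, message) rebuilt from a line made by to_wire."""
    env = json.loads(line)
    cls = {"input": InputEvent, "uno": UnoCommand}[env["type"]]
    return env["from"], cls(**env["body"])
```

A receiver would then replay UnoCommand messages against the document
and use InputEvent messages for things like remote cursors.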

> Heiko wrote:
> > How do we solve the situation when one participant enters text and
> > another deletes the same paragraph?
>
> It doesn't have to be a great solution, as long as it is
> consistent. i.e. if users know that two people on a laggy connection
> editing the same sentence is likely to get them making changes in
> wrong positions etc., they will naturally limit the extent to which
> they do this - like we know from Etherpad. Consistency of behavior
> and "principle of least astonishment" would be more important than
> perfect coordination/synchronization of inputs.
>
With a dedicated server, you don't even have that problem. All input
gets serialized through this one instance, so there's a strict
temporal ordering for all edits. Whichever packet reaches the server
first will 'win' an edit war. A fully distributed solution (which
is way harder to implement!) has no such strict global ordering per
se, but there are algorithms such as CRDTs [2] that guarantee eventual
consistency across all peers. But you're right, the Etherpad experience
shows that under bad network connectivity, user experience will start
to suffer. For example, all CRDTs I've looked at would always have a
'delete' operation win over other edits on the same span of text.
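
To illustrate eventual consistency and the 'delete wins' behaviour,
here is a minimal Python sketch of one of the simplest CRDTs, a
state-based two-phase set (2P-Set). Treating paragraphs as set
elements with made-up IDs is an assumption for illustration only; real
text editing needs a sequence CRDT, which is considerably more
involved.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseSet:
    """State-based 2P-Set: present means added and never removed."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only remove what this replica has seen; the tombstone is kept
        # forever, which is why a remove beats any concurrent add.
        if item in self.added:
            self.removed.add(item)

    def merge(self, other):
        # Set union is commutative, associative and idempotent, so
        # replicas converge regardless of the order merges arrive in.
        self.added |= other.added
        self.removed |= other.removed

    def value(self):
        return self.added - self.removed


alice, bob = TwoPhaseSet(), TwoPhaseSet()
alice.add("para-1")
bob.merge(alice)            # both replicas now contain para-1

alice.remove("para-1")      # Alice deletes the paragraph ...
bob.add("para-1")           # ... while Bob concurrently touches it
bob.add("para-2")

alice.merge(bob)
bob.merge(alice)
assert alice.value() == bob.value() == {"para-2"}
```

Both peers end up in the same state without any server deciding the
order, and the deleted paragraph stays deleted.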

> Heiko wrote:
> > Encryption and data integrity is key.
> >
> Perhaps TLS if it's a TCP-based protocol, and DTLS if it's UDP?
> Using the exchanged public keys I mentioned before?
>
Evidently. Or further re-use of existing p2p/chat solutions.

Of course, integrity is somewhat relative here - if you invite the
world to co-edit your document. ;)
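
One way the exchanged public keys could be put to use is pinning:
after the (D)TLS handshake, compare a fingerprint of whatever key the
peer presented against the one received out of band. A minimal Python
sketch of just that comparison follows; the function names and the
choice of SHA-256 are assumptions, and obtaining the presented key
bytes from the handshake is left out.

```python
import hashlib
import hmac


def key_fingerprint(public_key: bytes) -> str:
    """Hex SHA-256 digest of the raw key bytes."""
    return hashlib.sha256(public_key).hexdigest()


def peer_key_is_trusted(presented_key: bytes, expected_fingerprint: str) -> bool:
    """True if the presented key matches the out-of-band fingerprint."""
    actual = key_fingerprint(presented_key)
    # compare_digest avoids leaking, via timing, how many characters match.
    return hmac.compare_digest(actual, expected_fingerprint.lower())
```

With Python's ssl module, for instance, the presented bytes could be
the DER certificate returned by getpeercert(binary_form=True), though
whether to pin the certificate or the bare key is a design choice.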

> PS - I've mostly focused on the case of two users. More users makes
> things trickier.
>
On that, I believe it's a mistake to discuss solutions that only
really work for two users - since it will lead to a lot of
re-architecting down the road.

[1] https://community.documentfoundation.org/t/libreoffice-peer2peer-collaboration-bits/11900/6
[2] https://en.wikipedia.org/wiki/CRDT

Cheers,

-- Thorsten