Is anyone using p2p ostree support?

Fri Feb 21 09:32:29 UTC 2020

On Thu, Feb 20, 2020 at 9:11 PM Matthew Leeds
<matthew.leeds at endlessm.com> wrote:
>
> On Mon, Feb 17, 2020 at 11:56 PM Alexander Larsson
> <alexander.larsson at gmail.com> wrote:
> >
> > I'm at a point where I'm starting to reconsider the support for the
> > p2p ostree setup in flatpak. We keep running into issues with it,
> > which means we haven't been able to enable it by default on flathub.
> > And it vastly complicates the code as well as the behaviour of flatpak
> > on the network, turning a simple natural workflow into something that
> > is super-hard to reason about because information comes from several,
> > partial and untrustworthy sources.
> >
>
> I certainly agree that the P2P (LAN/USB) support in Flatpak adds complexity and
> technical debt, and it would be great to find ways to reduce that. But I'm not
> aware of any current issues that prevent us from pushing it out to everyone on
> Flathub; what did you have in mind?

I don't actually know of any outstanding issue with p2p right now, but
many times when we've been about to switch it on by default, some issue
has arrived to block it. I also know that not many people run with p2p
enabled and nobody does so with a non-trivial setup (i.e. actually
using multiple peers), so I'm sure there are issues hiding in there.

However, if it were just for simple bugs I think they would just get
fixed over time. My problem with p2p is deeper than that. I can
explain to anyone how a regular ostree repository works and how
flatpak uses it, and it seems natural and understandable. However, once
p2p gets added to the mix even I don't *really* understand it; I have
to constantly refer to the sources to see exactly what is going on.

In addition, I think the design of the ostree-metadata branch and how
it is used on the client side is quite bad. We have N potential peers
with different versions of this branch, which may or may not
correspond with the commits of the refs they actually carry. Somehow
we pick one version of this and pull it to a single local repo, that
can be updated at any time, even in the middle of an operation (since
flatpak instances can run in parallel).

I just find it impossible to reason about the behaviour of this.  In
comparison, in the regular case we have simple one-file summary updates
that are atomic on the server, locally on disk, and in memory as
part of a transaction (i.e. you load one summary file and use that for
an entire operation).

Every time I add a feature in flatpak the regular version takes a
short time, but then I have to spend a much longer time making the p2p
code work. Take the authentication callback stuff for instance: I had
to mangle the API in weird ways because of the complex p2p
handling. We had to emit the signals multiple times in unnatural ways
(whereas the regular case would have a natural emission point at the
beginning of the op) and there are even some cases that the p2p setup
cannot handle.

I want to keep flatpak extensible and maintainable going forward, and
I think the current p2p approach is going to make that harder. So,
while p2p is not yet widely deployed we should take the time to
simplify it to what is actually used in practice in order to ensure
the longevity of the flatpak design.

So, the question then becomes "What is actually used in practice?", and
from reading your mail and other responses, what I've understood is that
it is two things:

 * Sideloading apps/runtimes from a local drive, when the network is
   unavailable or slow. For example, Endless preloading software or
   installing from a USB stick, or Fedora wanting to store flatpaks
   on the install media.

 * Having a local network server that mirrors part of a flatpak
   repository to avoid issues with slow or unreliable upstream network
   connections. This would be maintained by some kind of sysadmin,
   not just a random client machine.

The first case is actually in use; the second is not actually done
atm, but seems a reasonable use case (in comparison with the blue sky
full peer-to-peer with multiple untrusted sources).

In the first case, we have a locally available, partial repo, with
collection id and ostree-metadata branch, and we can assume that the
metadata branch is in sync for all the refs in it. Sideloading from
this seems like a much easier problem than the p2p case. We don't
have to pull any ostree-metadata branches into the local flatpak
repo, we can just access it as is.

Here is a proposal for how this could be handled:

We have a remote "flatpak". It does not have a regular collection id
defined (so we use the normal pull codepaths etc for it). However, it
has a custom config option "xa.sideload-collection-id" which specifies
the collection id that all the commits in the remote repo are created
with. Additionally we have a separate global configuration option
"sideload-paths" which has a list of pathnames, where each path
contains an ostree repo in p2p style. These repos are assumed to
be internally consistent (i.e. each repo has an ostree-metadata branch
and summary file where the commits match in time), but not necessarily
consistent with each other or the upstream repo.
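
To make the two options a bit more concrete, here is a rough sketch of
how they might be read. Only the option names come from the proposal
above. The "[core]" section for "sideload-paths", the ";" separator, the
collection id and the paths are all made up for illustration, and
Python's configparser just stands in for the real key-file parser.

```python
import configparser

# Hypothetical config text. Only the two option names are from the
# proposal; section placement, separator and values are assumptions.
EXAMPLE_CONFIG = """
[remote "flatpak"]
xa.sideload-collection-id = org.example.Repo

[core]
sideload-paths = /run/media/usb/repo;/var/lib/preloaded/repo
"""


def load_sideload_config(text, remote_name):
    """Return (sideload collection id or None, list of sideload repo paths)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    # Per-remote option: which collection id the remote's commits carry.
    section = f'remote "{remote_name}"'
    collection_id = parser.get(section, "xa.sideload-collection-id",
                               fallback=None)

    # Global option: where to look for p2p-style repos to sideload from.
    raw_paths = parser.get("core", "sideload-paths", fallback="")
    paths = [p for p in raw_paths.split(";") if p]
    return collection_id, paths
```

A remote without the option simply yields None for the collection id,
which would mean "never sideload for this remote".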

All the action now happens during the FlatpakTransaction
resolve_all_ops() call, where we figure out what exactly to download
and from where, given a set of ref+remote pairs.

In the non-p2p case what happens (for a particular ref+remote pair) is:

We ensure the local copy of the remote summary is updated, then we load
it and use it for the entire operation. The ref is looked up in the
summary to see what the current commit is, and if it is newer than
what we have locally we "resolve" it by getting the additional
metadata which is also in the summary file.

For the p2p case what happens is:

Create a temporary local ostree repo, child of the main flatpak
one. In parallel, for each possible source of information for the
collection-id that is configured for the remote, download the summary
file and look up the requested ref. From all the summaries that had
the ref, pick the one that is most recent; this is the commit we're
getting, but we also need the metadata. Now we do an ostree pull
operation with the COMMIT_ONLY flag from that peer, then load the
resulting commit object from disk and get the metadata from that. Then
throw away the temporary repo, but we remember which actual source
we're pulling the ref from when doing the install.

The above is simplified because it only discusses a single ref/remote
pair. In practice the code is much more complex because it tries to
resolve all the refs at the same time, and even more so since we added
the token-callouts for authentication which happens in the middle of
this. But even with this simplification it's obvious how much more
fiddly the p2p case is compared to the "read a single file and do some
metadata lookups" of the non-p2p case.

What I propose instead is:

1) Ensure the local copy of the summary is updated (if possible, i.e.
   online) and load it into memory
2) For each configured local side-load repo, load its summary and
   ostree-metadata branch into memory
3) Look up the ref in all the summaries, disregarding any lookups from
   side-loaded summaries that don't match version with the
   corresponding ostree-metadata.
4) Choose which source to resolve from, based on which commit is
   latest (preferring sideloads to remote if the same), and taking into
   consideration whether we're online or not.
5) Get the metadata for that commit either from the summary (for upstream)
   or from the ostree-metadata (for sideloads).
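
The five steps above could be sketched roughly like this. It is an
in-memory model only, not flatpak code: RefInfo, SideloadRepo, Resolved
and resolve_ref are hypothetical names, summaries are modelled as plain
dicts, "latest" is decided by commit timestamp, and "taking online-ness
into consideration" is interpreted as "don't pick the remote when
offline". All of those are assumptions on top of the description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RefInfo:
    commit: str      # commit checksum the ref points at
    timestamp: int   # commit time, used to decide which commit is latest
    metadata: dict = field(default_factory=dict)


@dataclass
class SideloadRepo:
    path: str
    summary: dict          # ref -> commit checksum (from the summary file)
    ostree_metadata: dict  # ref -> RefInfo (from the ostree-metadata branch)


@dataclass
class Resolved:
    source: str        # "remote" or the path of a sideload repo
    is_sideload: bool
    info: RefInfo      # step 5: metadata travels with the chosen entry


def resolve_ref(ref, remote_summary, sideloads, online):
    """Pick where to get `ref` from, or None if no usable source has it.

    remote_summary: dict ref -> RefInfo loaded in step 1 (None if there
    is no local copy). sideloads: repos already loaded in step 2.
    """
    candidates = []

    # Step 3: look the ref up in every summary.
    if remote_summary is not None and ref in remote_summary:
        candidates.append(Resolved("remote", False, remote_summary[ref]))
    for repo in sideloads:
        commit = repo.summary.get(ref)
        info = repo.ostree_metadata.get(ref)
        # Disregard sideloads whose summary and ostree-metadata disagree.
        if commit is None or info is None or info.commit != commit:
            continue
        candidates.append(Resolved(repo.path, True, info))

    # Assumption: when offline we cannot pull from the remote at all.
    if not online:
        candidates = [c for c in candidates if c.is_sideload]
    if not candidates:
        return None

    # Step 4: latest commit wins; on a tie a sideload beats the remote.
    return max(candidates, key=lambda c: (c.info.timestamp, c.is_sideload))
```

Everything here is a lookup in data that was loaded once up front, which
is the whole point: nothing can change underneath us mid-operation.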

This is slightly trickier than the original non-p2p case, but it is
still just loading a few local files and doing in-memory lookups from
them. It's nowhere near the complexity of the full p2p case.

Additionally, when using p2p we use the ostree-metadata instead of the
summary for operations that need a list of refs (like e.g. tab
completion), because the summary files from p2p peers are typically
partial. In an offline, initial sideload case we could end up in a
situation where we have no local copy of the complete summary
file. To fix this, I propose that we also let the sideload repos carry
a copy of the complete upstream (signed) summary file, and we use this
to update the local copy of the summary file in case we're
offline. This means we can completely avoid ever pulling any
ostree-metadata branches into the flatpak ostree repo.

This solves the side-load use case. For the local mirror setup I think
a more natural solution is to avoid the complexities of collection-ids
and just set up a proxy-like server that mirrors the entire upstream
repo. It doesn't have to *actually* keep a copy of everything, instead
it would work like a CDN doing pulls from upstream when needed. In
case of a spotty network this means that sometimes apps look like they
are available but fail during installation. However, if some
particular apps are needed (e.g. for a school) the sysadmin could
choose to manually pre-seed those. Then you just point your remote at
this (or even pick it up from the http proxy config).

This kind of mirroring setup is *much* easier to do in a stateful,
managed server like this than in the client library.
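
To illustrate what I mean by "work like a CDN", here is a minimal
pull-through cache sketch. PullThroughMirror and fetch_upstream are
hypothetical names, the cache is just an in-memory dict, and it ignores
that mutable files such as the summary would need revalidation (ostree
objects are content-addressed, so caching those forever is fine).

```python
class PullThroughMirror:
    """Serve repo files from a local cache, pulling from upstream on a miss."""

    def __init__(self, fetch_upstream):
        # fetch_upstream(path) -> bytes; assumed to raise OSError when
        # the upstream connection is down.
        self._fetch_upstream = fetch_upstream
        self._cache = {}

    def get(self, path):
        """Return the file at `path`, fetching and caching it if needed."""
        if path in self._cache:
            return self._cache[path]
        # With a spotty network this is where an install can fail even
        # though the app looked available.
        data = self._fetch_upstream(path)
        self._cache[path] = data
        return data

    def preseed(self, paths):
        """Fetch files ahead of time, e.g. for apps a school needs."""
        for path in paths:
            self.get(path)
```

Anything the sysadmin has pre-seeded keeps working when upstream is
unreachable; everything else is fetched on first use.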