Some flathub thoughts

Alexander Larsson alexl at redhat.com
Wed Sep 21 04:50:47 UTC 2016


So, I've done a bit more thinking about FlatHub and how it would work.
In particular, how the workflow would look for building apps.

So, flathub is fundamentally about flatpak/ostree repositories,
similarly to how github is about git repos. This means that the
operations you do in the web UI are mainly creating/naming/editing
flatpak repos. Each user can create as many repos as they want. Once
a repo is created, the primary operations on it are building apps into
it and installing apps from it. Exactly what goes into each repo is
completely up to the user to decide. They may have one repo per app, or
multiple apps in one repo, depending on their use case.

Installing from a repo is pretty simple. Each repo is defined by a url
and a gpg key, and once a user adds the remote to his local
configuration everything should "just work" (i.e. flatpak can install
apps, and gnome-software can use the appstream data to create a nice
UI).
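As a concrete sketch, adding such a repo and installing from it could
look like this (the remote name, URL, and key file are made-up
examples, not real flathub endpoints; the flatpak commands themselves
are the standard ones):

```shell
# Add the repo as a remote, importing its GPG key:
flatpak remote-add --gpg-import=example.gpg example-repo \
    https://flathub.example/repos/alexl/example-repo

# Install an app from it; the appstream data in the repo is what lets
# gnome-software build a nice UI around the same remote:
flatpak install example-repo org.my.App
```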

So, the remaining thing is to figure out how building should work. Each
build originates from a json manifest and a bunch of files (patches,
etc) that define the build of a particular app-id and branch. Building
starts with somehow getting a copy of these to flathub, and once it is
there we need to store it forever so builds can be reproduced.

Since we already have an ostree repo for the builds, I propose we
reuse this for storing the manifest+files. These files would be
commited to an ostree branch named something like:

  "source/org.my.App/stable" (i.e. source/$app/$branch)

Inside this branch the build system would look for a file named
org.my.App.json. Flathub would verify that the json actually matches
the app-id/branch specified by the branch name, and fail otherwise.
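The naming convention and the verification step could be sketched like
this (pure string handling; the ostree commit in the comment is the
part that would actually run on flathub, with an example repo path):

```shell
# Example values from this mail; source/$app/$branch is the proposed
# branch layout.
app=org.my.App
branch=stable
ref="source/$app/$branch"

# The sources would be committed with something like:
#   ostree --repo=/srv/flathub/repo commit --branch="$ref" sources-dir/

# Verification: the manifest inside the branch must be named after the
# app id encoded in the ref, otherwise the build fails.
manifest="$app.json"
case "$ref" in
  source/*/*) echo "ref ok: $ref" ;;
  *) echo "bad ref: $ref" >&2; exit 1 ;;
esac
echo "expecting manifest: $manifest"
```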

In practice the upload is probably best done by having a small CLI
tool you run on your machine which creates a tarball (using
"flatpak-builder --show-deps org.my.App.json" to figure out which
files are needed), which is then uploaded to flathub using some HTTP
API.
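A minimal sketch of such an upload tool, using a stand-in manifest and
an invented endpoint (the --show-deps call is shown as a comment since
it needs the app's real checkout):

```shell
set -e
# Work in a scratch dir with a stand-in manifest; a real run would use
# the app's actual manifest plus the files --show-deps reports.
workdir=$(mktemp -d)
cd "$workdir"
printf '{"app-id": "org.my.App", "branch": "stable"}\n' > org.my.App.json

# In the real tool the file list would come from:
#   flatpak-builder --show-deps org.my.App.json
files="org.my.App.json"

tar -czf org.my.App-sources.tar.gz $files

# ...then upload via some HTTP API (endpoint invented for illustration):
#   curl -T org.my.App-sources.tar.gz \
#       https://flathub.example/api/upload/source/org.my.App/stable
echo "created $workdir/org.my.App-sources.tar.gz"
```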

Once we have the sources in ostree a build is triggered pointing to
this repo+commit. This would put the build on a queue waiting for a
free build machine. Once there is a free build machine it would pull
the sources from the ostree repo and run flatpak-builder. The status
of the queue would be visible on the website, including build logs.
Once the build is finished and succeeded we generate a bundle of the
results and send these back to the main machine where it would be
signed and merged into the repo.
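On a build machine that pipeline might look roughly like this (the
repo paths are illustrative; ostree checkout, flatpak-builder, and
flatpak build-bundle are the real commands, though exact flags would
depend on the setup):

```shell
# Check the sources out of the ostree repo at the recorded commit:
ostree --repo=/srv/flathub/repo checkout source/org.my.App/stable sources/

# Build into a local, unsigned scratch repo:
cd sources/
flatpak-builder --repo=/tmp/build-repo builddir org.my.App.json

# Bundle the result to send back to the main machine, where it gets
# signed and merged into the published repo:
flatpak build-bundle /tmp/build-repo org.my.App.flatpak org.my.App
```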

There are some complexities here that need solving:

flatpak-builder needs to download all the sources, but if we
downloaded a source before we should reuse it. Also, if we do
download something new we should save it for the future (both to
avoid multiple downloads and for GPL reasons). This is not so hard for
tarballs, but it's somewhat more complicated for git/bzr mirroring.
Partly because we do some complex git submodule handling, but also
because git mirrors change over time, so we need to store extra
information to know what exact commit a build used. Maybe we need
some kind of support for this in flatpak-builder.
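One piece that already exists: flatpak-builder caches what it fetches
in its .flatpak-builder/ state directory and can separate fetching
from building, so a per-machine shared cache is a plausible starting
point (this doesn't solve the exact-commit-pinning problem above):

```shell
# Fetch all sources listed in the manifest without building; this
# populates .flatpak-builder/ with downloads and git/bzr mirrors:
flatpak-builder --download-only builddir org.my.App.json

# A later build can then be forbidden from touching the network, so it
# only uses what was already mirrored:
flatpak-builder --disable-download --repo=/tmp/build-repo builddir org.my.App.json
```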

GPG key storage and signing is hard. To sign a build you need one of:
 1) A cleartext gpg private key on flathub
 2) An encrypted gpg key and the passphrase on flathub
 3) Sending the to-sign data to the client machine and signing it
    there.

None of these approaches is very good: they are either very insecure
or painful to use. I'm not sure what the best approach is. Maybe we
need to support
both. For instance I can imagine that a common setup is to have two
flatpak repos, one for test/devel builds and another for "release"
builds. You might want to use the "easy" approach for the devel repo
and the "painful" approach for the release repo.
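For the server-side variants (1) and (2), the actual signing step
would be the usual repo commands, pointed at whatever key material
flathub holds (the key id, homedir, and repo path are placeholders):

```shell
# Sign the app's new commits, then re-sign the repo summary/metadata:
flatpak build-sign --gpg-sign=KEYID --gpg-homedir=/srv/flathub/gpg \
    /srv/flathub/repo org.my.App
flatpak build-update-repo --gpg-sign=KEYID --gpg-homedir=/srv/flathub/gpg \
    /srv/flathub/repo
```

For variant (3), flathub would instead ship the detached data to be
signed to the client and merge the returned signature, which is the
painful-but-safer workflow described above.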

Another question is how isolated we want the builds to be. Do we run
each build in a fresh VM, or do we rely on the sandboxing that
flatpak-builder uses? I added a "--sandbox" argument to
flatpak-builder which disallows you from specifying
break-out-of-sandbox arguments. The container approach is always
less isolated than a full VM, but a full VM is slower.
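For reference, the container route is just the normal build with the
flag added (repo path illustrative):

```shell
# --sandbox refuses manifests that try to grant the build extra,
# sandbox-weakening permissions via build-args:
flatpak-builder --sandbox --repo=/tmp/build-repo builddir org.my.App.json
```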

That was a bit rambling, but some food for thought.



More information about the xdg-app mailing list