Sizes data in flatpaks

Dan Nicholson dbn at endlessos.org
Thu Oct 15 20:43:17 UTC 2020


On Mon, Oct 12, 2020 at 8:11 AM Alexander Larsson <alexl at redhat.com> wrote:
>
> On Mon, 2020-10-12 at 06:55 -0600, Dan Nicholson wrote:
> > On Mon, Oct 12, 2020 at 3:04 AM Alexander Larsson <alexl at redhat.com>
> > wrote:
> > >
> >
> > In both of these cases, the client would fetch the commit object list
> > to do accurate progress reporting or fall back to the existing
> > progress reporting if it doesn't exist. If it does exist, then it
> > could do accurate progress, but you could also do 2 more clever
> > things. You could queue everything for pulling immediately rather
> > than the current scheme of fetching dirtree objects and scanning them
> > to find more objects to pull. But it would also allow better
> > decisions to be made in the non-scratch delta case by calculating the
> > total size of each vs the number of fetches needed. I.e., if you have
> > 20% of the objects the size of an object pull might be smaller but if
> > it's going to take 1000 HTTP requests to get there instead of 10, it
> > might take less time to get the delta at the expense of some wasted
> > bandwidth.
>
> What about this approach:
>
> If instead of just having this list of the reachable objects from the
> commit we make a new (optional) object type, the "mega dirtree" object.
> This would contain all the dirtree objects reachable from a dirtree
> (typically the root of the commit). In terms of size this would
> probably be similar to the list of reachable object ids you've
> experimented with. But then we can immediately write all these objects
> out and do a smarter pull operation.

Yeah, that's interesting. Since the dirtree objects contain the
checksums for the children as well as the names, it would be bigger
than the flat list of objects and wouldn't have deduplication of
objects in the listing. You could also put that in the commit object
and save yourself a roundtrip. It would be nice to get all the
checksums and paths at once instead of traversing the commit, though.
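
To make the size comparison concrete, here is a rough sketch in Python.
The in-memory layout below is a simplified, hypothetical stand-in for
dirtree objects (names and checksums are made up, and this is not the
on-disk format). It only shows why a listing that keeps names repeats
checksums, while a flat list can deduplicate them.

```python
# Hypothetical, simplified dirtree model: checksum -> (files, dirs)
#   files: list of (name, content_checksum)
#   dirs:  list of (name, dirtree_checksum, dirmeta_checksum)
DIRTREES = {
    "tree-root": (
        [("README", "file-a")],
        [("bin", "tree-bin", "meta-1"), ("lib", "tree-lib", "meta-1")],
    ),
    "tree-bin": ([("tool", "file-b"), ("tool-copy", "file-b")], []),
    "tree-lib": ([("libfoo.so", "file-c"), ("LICENSE", "file-a")], []),
}


def walk(root, prefix=""):
    """Yield (path, checksum) for every entry reachable from root.

    This is roughly what a listing with names would have to carry:
    one record per path, so shared checksums show up repeatedly.
    """
    files, dirs = DIRTREES[root]
    for name, checksum in files:
        yield prefix + "/" + name, checksum
    for name, tree, meta in dirs:
        path = prefix + "/" + name
        yield path, tree
        yield path, meta
        yield from walk(tree, path)


def flat_object_list(root):
    """Return the deduplicated set of checksums reachable from root."""
    return sorted({root} | {checksum for _, checksum in walk(root)})


entries = list(walk("tree-root"))
flat = flat_object_list("tree-root")
print(len(entries), "named entries vs", len(flat), "unique objects")
```

The named listing also carries the path strings, so in this toy model it
is larger on both counts. The upside is that paths and checksums arrive
together without any traversal.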

I realized that my table from the other day was wrong because I was
using an older ostree from before I fixed several bugs in the sizes
generation. Here's an updated version with current ostree. I added a
couple of things this time. You were concerned about the size of the
commit object since Flatpak sizes are already a concern for people, so
I wanted to see the size of the commit object, with and without the
sizes data, relative to both the download and install size of the
objects. I was also curious how much you could save by compressing the
commit object over the network, as most HTTP servers offer. I used zlib
level 1 as that's what nginx does by default.

Ref                                            Objects  Download   Install    Current  With Sizes  Cur Comp   Sizes Comp
---------------------------------------------  -------  ---------  ---------  -------  ----------  ---------  ----------
runtime/org.freedesktop.Platform/x86_64/19.08    12822  214.2 MiB  602.7 MiB  2.6 KiB  515.6 KiB   1.1 KiB    481.0 KiB
runtime/org.gnome.Platform/x86_64/3.38           21740  313.4 MiB  835.1 MiB  2.4 KiB  872.9 KiB   1.0 KiB    813.2 KiB
app/org.gimp.GIMP/x86_64/stable                  10406  109.4 MiB  313.7 MiB  1.5 KiB  419.5 KiB   933 bytes  392.6 KiB
app/org.mozilla.firefox/x86_64/stable              229  76.2 MiB   208.1 MiB  1.5 KiB  10.1 KiB    908 bytes  9.4 KiB
app/com.spotify.Client/x86_64/stable              1010  11.4 MiB   32.0 MiB   1.8 KiB  40.0 KiB    1.1 KiB    39.1 KiB
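
To put the "With Sizes" column in proportion, the arithmetic can be
sketched as below. The numbers are copied from the table; the function
and variable names are just for illustration.

```python
# (ref, download MiB, commit with sizes KiB, same commit compressed KiB)
ROWS = [
    ("runtime/org.freedesktop.Platform/x86_64/19.08", 214.2, 515.6, 481.0),
    ("runtime/org.gnome.Platform/x86_64/3.38", 313.4, 872.9, 813.2),
    ("app/org.gimp.GIMP/x86_64/stable", 109.4, 419.5, 392.6),
    ("app/org.mozilla.firefox/x86_64/stable", 76.2, 10.1, 9.4),
    ("app/com.spotify.Client/x86_64/stable", 11.4, 40.0, 39.1),
]


def overhead_percent(download_mib, commit_kib):
    """Commit object size as a percentage of the download size."""
    return 100.0 * commit_kib / (download_mib * 1024.0)


def compression_saving_percent(raw_kib, compressed_kib):
    """Percentage of the commit object size removed by compression."""
    return 100.0 * (1.0 - compressed_kib / raw_kib)


report = {
    ref: (overhead_percent(dl, raw), compression_saving_percent(raw, comp))
    for ref, dl, raw, comp in ROWS
}

for ref, (overhead, saved) in report.items():
    print(f"{ref:45}  {overhead:5.2f}% of download  {saved:4.1f}% saved")
```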

One of the bugs I had fixed was that the sizes entries were being
reused between commits since they're stored in the repo struct. The
sizes data still makes the commit objects quite a bit bigger, but only
the GNOME platform is approaching 1 MB, and in every case above that is
well under 1% of the download size. The compression helps a bit but not
much: it only saves a few percent.
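
For anyone who wants to try the compression measurement, here is a
minimal sketch using Python's zlib at level 1. The payload is a made-up
stand-in (a 32-byte SHA-256 digest plus two 4-byte sizes per object),
not the real commit metadata layout. My guess, and it is only a guess,
is that the checksum bytes dominate and they look random to the
compressor.

```python
import hashlib
import struct
import zlib


def fake_sizes_payload(n_objects):
    """Build a stand-in payload: checksum plus two sizes per object.

    Hypothetical layout for illustration only.
    """
    chunks = []
    for i in range(n_objects):
        checksum = hashlib.sha256(str(i).encode()).digest()  # 32 bytes
        compressed_size = 1000 + i
        unpacked_size = 3000 + 2 * i
        chunks.append(checksum + struct.pack(">II", compressed_size, unpacked_size))
    return b"".join(chunks)


payload = fake_sizes_payload(1000)
compressed = zlib.compress(payload, 1)  # level 1, as used for the table
ratio = len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")
```

Running the same thing over a real commit object would give the actual
figure. The stand-in only shows how to measure it.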
-------------- next part --------------
A non-text attachment was scrubbed...
Name: add-commit-sizes
Type: application/octet-stream
Size: 7719 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/flatpak/attachments/20201015/65cf1d55/attachment.obj>

