Sizes data in flatpaks

Mon Oct 12 14:11:06 UTC 2020

On Mon, 2020-10-12 at 06:55 -0600, Dan Nicholson wrote:
> On Mon, Oct 12, 2020 at 3:04 AM Alexander Larsson <alexl at redhat.com>
> wrote:
> > 
> 
> In both of these cases, the client would fetch the commit object list
> to do accurate progress reporting or fall back to the existing
> progress reporting if it doesn't exist. If it does exist, then it
> could do accurate progress, but you could also do two more clever
> things. You could queue everything for pulling immediately rather
> than the current scheme of fetching dirtree objects and scanning them
> to find more objects to pull. But it would also allow better
> decisions to be made in the non-scratch delta case by calculating the
> total size of each vs the number of fetches needed. I.e., if you have
> 20% of the objects, the size of an object pull might be smaller, but
> if it's going to take 1000 HTTP requests to get there instead of 10,
> it might take less time to get the delta at the expense of some
> wasted bandwidth.
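To make the delta-vs-objects trade-off above concrete, here is a minimal cost-model sketch. It assumes the client knows, for each candidate plan, the number of HTTP requests and the total bytes to download (which is what a per-commit object list with sizes would provide). The per-request latency, bandwidth, and example sizes are invented for illustration, and requests are treated as sequential, which overstates the request cost of a real, parallel pull. `PullPlan`, `estimated_seconds`, and `choose_plan` are hypothetical names, not flatpak or OSTree APIs.

```python
from dataclasses import dataclass


@dataclass
class PullPlan:
    # Hypothetical description of one way to fetch a commit.
    name: str
    num_requests: int  # HTTP fetches needed
    total_bytes: int   # bytes transferred


def estimated_seconds(plan: PullPlan, latency_s: float, bytes_per_s: float) -> float:
    # Assumed model: each request pays a fixed round-trip cost (requests are
    # treated as sequential), and every byte pays a throughput cost.
    return plan.num_requests * latency_s + plan.total_bytes / bytes_per_s


def choose_plan(plans, latency_s: float = 0.05, bytes_per_s: float = 5_000_000) -> PullPlan:
    # Pick the plan with the lowest estimated wall-clock time.
    return min(plans, key=lambda p: estimated_seconds(p, latency_s, bytes_per_s))


# Invented numbers: the client already has 20% of the objects, so the object
# pull moves fewer bytes but needs far more requests than the delta.
delta = PullPlan("delta", num_requests=10, total_bytes=50_000_000)
objects = PullPlan("objects", num_requests=1000, total_bytes=40_000_000)
print(choose_plan([delta, objects]).name)  # prints "delta"
```

With these numbers the delta costs about 10.5 s and the object pull about 58 s, so the delta wins despite moving 10 MB more. With near-zero per-request latency, the smaller object pull would win instead.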

What about this approach:

Instead of just having this list of the objects reachable from the
commit, we could make a new (optional) object type, the "mega dirtree"
object. This would contain all the dirtree objects reachable from a
dirtree (typically the root of the commit). In terms of size, this
would probably be similar to the list of reachable object IDs you've
experimented with. But then we could immediately write all these
objects out and do a smarter pull operation.
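A minimal sketch of the idea, assuming a simplified in-memory model rather than OSTree's real GVariant serialization: a dirtree maps file names to content checksums and subdirectory names to (dirtree checksum, dirmeta checksum) pairs. Once every reachable dirtree is bundled into one object, the client can enumerate all content and dirmeta objects up front and queue them at once, instead of fetching and scanning dirtrees level by level. `Dirtree`, `build_mega_dirtree`, `objects_to_queue`, and the placeholder checksums are hypothetical; the root's own dirmeta lives in the commit and is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class Dirtree:
    # Hypothetical, simplified dirtree: file name -> content checksum,
    # subdirectory name -> (dirtree checksum, dirmeta checksum).
    files: dict = field(default_factory=dict)
    dirs: dict = field(default_factory=dict)


def build_mega_dirtree(root: str, dirtrees: dict) -> dict:
    """Collect every dirtree reachable from `root` into one mapping."""
    mega = {}
    stack = [root]
    while stack:
        csum = stack.pop()
        if csum in mega:
            continue  # identical subtrees share a single dirtree object
        tree = dirtrees[csum]
        mega[csum] = tree
        for subtree_csum, _dirmeta in tree.dirs.values():
            stack.append(subtree_csum)
    return mega


def objects_to_queue(mega: dict):
    """With all dirtrees in hand, list every content and dirmeta object at once."""
    content, dirmeta = set(), set()
    for tree in mega.values():
        content.update(tree.files.values())
        dirmeta.update(meta for _, meta in tree.dirs.values())
    return content, dirmeta


# Placeholder checksums for a tiny tree: /README, /bin/app, /lib/{libfoo.so,README}
dirtrees = {
    "t-root": Dirtree(files={"README": "c1"},
                      dirs={"bin": ("t-bin", "m1"), "lib": ("t-lib", "m1")}),
    "t-bin": Dirtree(files={"app": "c2"}),
    "t-lib": Dirtree(files={"libfoo.so": "c3", "README": "c1"}),
}
mega = build_mega_dirtree("t-root", dirtrees)
content, dirmeta = objects_to_queue(mega)
print(sorted(content), sorted(dirmeta))  # prints ['c1', 'c2', 'c3'] ['m1']
```

Deduplicating by checksum keeps the bundle no larger than the set of distinct dirtrees, which is why its size should track the reachable-object list rather than the directory count.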