[packagekit] Cache up-to-date-ness and bad user interaction

Thu Oct 28 02:37:05 PDT 2010

At the moment we have the following mechanisms:

RefreshCache == download all the different cache files from the remote repo
GetUpdates == get the list of updates from the packaging system, and
/sometimes/ get the latest list from the server
SearchFile == search local or remote stores, and maybe get new repodata
if the metadata doesn't match the filelist

I say sometimes because yum has a built-in "this is fresh enough"
value, and so doesn't get the repodata from the server if the age of
the updates file is less than the "fresh enough" value. The fudge
value is sub-optimal.
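
To make that comparison concrete, here is a rough sketch in Python of
the kind of check I mean. This is not yum's actual code; the name
cache_is_fresh and the max_age parameter are made up for illustration.

```python
import os
import time


def cache_is_fresh(path, max_age, now=None):
    """Return True if the file at `path` is younger than `max_age` seconds.

    Hypothetical helper, not part of yum or PackageKit. A missing file
    is never fresh, so the caller would have to download it.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # No cached copy at all: treat it as stale.
        return False
    return (now - mtime) < max_age
```

With one fixed max_age, the same GetUpdates call is fast while the
file is younger than the threshold and slow as soon as it is older,
and the caller has no say in which one it gets.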

Of course, this has the effect of some update checks taking a few
hundred ms, and some update checks taking a few minutes. This blows
goats from a user interface point of view: when we do the scheduled
update check in the background we want to get the latest data from
the repo, and when we do it in the GUI updater program we want to
return results as quickly as possible. Ideally we would want to
specify the "this is fresh enough" value to the GetUpdates method.

Now of course, other methods are affected too. yum also downloads new
primary and filelists data if the user does a command-not-found type
action. Of course, we can give the backend hints about whether this is
interactive or non-interactive (using SetHints), but I'm not sure if
that's enough. All the backends seem to treat cache age slightly
differently, which makes it even harder to design a GUI that's
predictable.

So, 4 ideas:

1. No transaction refreshes data from the network; they all work from
a cache, no matter how old. If the cache is not available, then
they can download data. The only time new metadata is downloaded is
when the RefreshCache method is called, and this refreshes everything.
Advantages: Allows us to get rid of the different "Check for updates:
daily" and "Refresh package lists: weekly" UI entries.
Disadvantages: All the metadata is downloaded every day, which is
significant if you're on a mobile broadband connection, or a modem.

2. A new parameter to SetHints() which would be "cache-age=<time in
seconds>". This would allow the frontend to encode how fresh it wants
the data.
Advantages: Allows fine control of the age of the returned results.
Disadvantages: Means the frontend probably has to query the update
check frequency and pass that value for GetUpdates, and choose
something sane otherwise. Most clients will set this to "intmax", which
will mean "never". Backends will have a complicated set of behaviours
where they have to juggle policy for cache-age, background and
interactive.

3. A new formal policy, which says "interactive=TRUE always has to
work from a cache where possible (except for RefreshCache, of
course)".
Advantages: allows GUI tools to get the results quickly, and
introduces no new complexity.
Disadvantages: We still have to hardcode a value of cache-age
for the non-interactive case. The user has no way to get the latest
updates from the GUI update viewer tool until the next scheduled
background update has run. This is probably okay, as the interaction
designers at Red Hat hate waiting for progress bars.

4. A combination of "interactive=TRUE always has to work from a
cache" and adding "cache-age=<time in seconds>" for !interactive.
Advantages: Allows us to get the minimum amount of data, as we can use
the update check frequency for GetUpdates.
Disadvantages: lots of encoded policy which is getting pretty hard for
backends to understand.
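
To show how much policy idea 4 actually asks a backend to encode,
here is a rough sketch in Python. Everything in it is an assumption
for illustration: the helper names, the value used for "intmax",
treating a missing interactive hint as non-interactive, and
defaulting cache-age to "never" are all things we would still have
to agree on.

```python
# Stand-in for the "intmax" value meaning "never refresh" (assumed width).
NEVER = 2**32 - 1


def parse_hints(hints):
    """Turn SetHints-style "key=value" strings into a dict.

    "interactive" becomes a bool and the proposed "cache-age" hint
    becomes an int (seconds). Other hints are kept as strings.
    """
    parsed = {}
    for hint in hints:
        key, sep, value = hint.partition("=")
        if not sep:
            continue  # ignore malformed entries rather than failing
        if key == "interactive":
            parsed[key] = value.lower() == "true"
        elif key == "cache-age":
            parsed[key] = int(value)
        else:
            parsed[key] = value
    return parsed


def should_refresh(role, hints, current_age, cache_exists=True):
    """Decide whether a transaction may fetch new metadata (idea 4).

    `current_age` is the age of the cached metadata in seconds.
    """
    if role == "RefreshCache":
        return True  # the one method that always downloads
    if not cache_exists:
        return True  # nothing to work from, so we have to download
    if hints.get("interactive", False):
        return False  # interactive callers always work from the cache
    max_age = hints.get("cache-age", NEVER)
    if max_age >= NEVER:
        return False  # "never": use the cache however old it is
    return current_age > max_age
```

A background update check on a daily schedule would then send
something like ["interactive=false", "cache-age=86400"], and only hit
the network if the cached data is more than a day old. Dropping the
cache-age lookup gives roughly idea 3, and dropping the interactive
test gives roughly idea 2.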

At the moment I'm not sure which is my favourite, but this mail is
designed for people to tear apart my ideas, and suggest better ones.
I've got a feeling apt already works like idea 1.

Suggestions?

Richard.