[packagekit] Caching for APT backend (or how to get queries down to 0.5s)

Richard Hughes hughsient at gmail.com
Wed Sep 26 11:42:41 PDT 2007


On Wed, 2007-09-26 at 20:27 +0200, Tom Parker wrote:
> Haven't had much time to play with PackageKit recently (little things 
> like a PhD thesis to complete keep getting in my way)

Dude, joking aside, complete the thesis first and hack on PackageKit
second. In the grand scheme of things the thesis is 5000% more
important. But I do understand where you are coming from; during my final
dissertation I needed some "geeky time" every now and then. Just be
careful...

>  but one of the 
> things I've now finally managed to get a chance to look at is the issue 
> of slow queries in apt. I'd love to eventually build a package manager 
> that does things like "find as you type", and if queries are taking too 
> long then this starts to be less usable.

Sure, I was thinking of doing this with pk-application not two weeks
ago. If the last query took < 1 second to complete, then do the query
live, else wait for the user to press enter. We can keep using Cancel()
if the user types quickly, or just trigger after 50ms. There's no code
in git yet; it's not even in the TODO. :-)
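
A minimal sketch of that heuristic follows. It is in Python for brevity; PackageKit itself is C. It reads "trigger after 50ms" as a debounce, which is an assumption. The class name and the injectable clock are hypothetical and not part of pk-application. Cancelling an in-flight query with Cancel() is not shown.

```python
import time

class FindAsYouType:
    """Hypothetical helper: should a keystroke trigger a live search?"""

    LIVE_THRESHOLD = 1.0  # seconds: last query must have finished faster than this
    DEBOUNCE = 0.050      # seconds of typing pause before firing (assumed reading)

    def __init__(self, clock=time.monotonic):
        self.clock = clock                # injectable so the logic can be tested
        self.last_query_duration = None   # unknown until a query has run
        self.last_keystroke = None

    def on_keystroke(self):
        self.last_keystroke = self.clock()

    def record_query(self, duration):
        self.last_query_duration = duration

    def live_search_allowed(self):
        # No timing data yet, or the last query was slow: wait for Enter instead.
        return (self.last_query_duration is not None
                and self.last_query_duration < self.LIVE_THRESHOLD)

    def should_fire(self):
        """True once live search is allowed and the user has paused long enough."""
        if not self.live_search_allowed() or self.last_keystroke is None:
            return False
        return self.clock() - self.last_keystroke >= self.DEBOUNCE
```

If a query is slow, the widget falls back to waiting for Enter. Once a query comes back fast again, live search is re-enabled automatically.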

> Specifically, this dates back to the threading changes, and my 
> discovering that I can't really keep around a pointer to the apt 
> in-memory stuff, because that will screw with the dynamic module loading 
> stuff. The problem is that the libapt stuff is pretty slow to do its 
> standard setup, mainly 'cause it wants to do things like work out chunks 
> of dependency stuff and read in lots (10s to 100s of MB) of text files 
> and manually parse them. This would be up there under things that newer 
> package management systems do better.

Sure, it looked pretty icky last time I looked at it too.

> I've been playing around with some stuff locally, and I've got 3 options:
> 1) Use libapt.  ~3-5 second delay before we even get to really start 
> doing queries, and really ugly C++ interfaces. OTOH, may have to use 
> this for installation if only for being certain about sanity issues, but 
> for queries this is too slow. Also, it stays slow even with a warm 
> file-cache. It *always* takes 3-5 seconds even if you've just made 
> another query. Hence why I wanted to cache this before, but I can't 
> figure out an easy way to cache this without actually keeping the 
> pointers around in memory.

Hmm. I really want to avoid having PK "locking" the apt database for
long periods of time. That would get us a bad reputation really, really
quickly.

> 2) Do my own limited parsing of the files, and avoid full db reading for 
> just simple searches. Attempts at this hit about 7-8 seconds/query, 
> which can probably be optimised, but the sheer quantity of plaintext to 
> be parsed limits what can be done here. Gets worse with a cold 
> file-cache (I've seen 10-15s).

Sure, that's acceptable, but certainly not ideal.

> 3) Do step 2 (with more complete parsing), but then dump the results 
> somewhere and do queries off of that. This is what I've got locally now. 
> The idea is that every time apt does a refresh-cache (or at least one 
> that loads new data) build an Sqlite db of the packages. Also do this if 
> the cache has been wiped or if someone else has updated the primary 
> system cache behind our back. We then do queries off of this.
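
Option 3 in outline: parse once, load the results into SQLite, and store a fingerprint of the index files so that a wiped or out-of-date cache is detected when it is opened. The schema, the meta table and the size/mtime fingerprint are assumptions for illustration, not what the patch does. load_stanzas stands in for any parser that yields one dict per package stanza.

```python
import os
import sqlite3
from typing import Callable, Dict, Iterable, List

# Hypothetical schema: one row per package, plus a key/value table that
# records the fingerprint of the index files the rows were built from.
SCHEMA = """
CREATE TABLE IF NOT EXISTS packages (
    name    TEXT NOT NULL,
    version TEXT,
    arch    TEXT,
    summary TEXT
);
CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT);
"""

def index_signature(index_paths: List[str]) -> str:
    """Fingerprint the index files by path, size and modification time."""
    parts = []
    for path in sorted(index_paths):
        st = os.stat(path)
        parts.append(f"{path}:{st.st_size}:{st.st_mtime_ns}")
    return "|".join(parts)

def needs_rebuild(conn: sqlite3.Connection, signature: str) -> bool:
    row = conn.execute("SELECT value FROM meta WHERE key = 'signature'").fetchone()
    # No row: fresh or wiped cache. Different value: indexes changed behind our back.
    return row is None or row[0] != signature

def rebuild(conn: sqlite3.Connection, stanzas: Iterable[Dict[str, str]],
            signature: str) -> None:
    """Replace all cached rows inside a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM packages")
        conn.executemany(
            "INSERT INTO packages (name, version, arch, summary) VALUES (?, ?, ?, ?)",
            ((s["Package"], s.get("Version"), s.get("Architecture"),
              s.get("Description", "").split("\n", 1)[0])  # first line only
             for s in stanzas if "Package" in s))
        conn.execute(
            "INSERT OR REPLACE INTO meta (key, value) VALUES ('signature', ?)",
            (signature,))

def open_cache(db_path: str, index_paths: List[str],
               load_stanzas: Callable[[List[str]], Iterable[Dict[str, str]]]
               ) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    signature = index_signature(index_paths)
    if needs_rebuild(conn, signature):
        rebuild(conn, load_stanzas(index_paths), signature)
    return conn
```

Doing the delete and the reload in one transaction means other readers keep seeing the old rows until the new set is committed. They never see a half-built table.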

Yes, this is a very good idea.

> Current numbers for queries with type 3 are down to about 0.5s (i.e. 
> fast enough for good "find as you type") with a warm file-cache, hitting 
> about 3-ish with a cold-cache. Rebuild times are in the 20-30s region 
> for my system (3 GHz P4, random single HDD, with a *massive* collection of 
> sources). If this is folded into the refresh-cache task, and the user 
> doesn't refresh the cache outside of PackageKit too often, then the 
> amount of times this actually affects the user should be minimal.  Db on 
> my system is 27 MB currently, and I'm for the moment storing it in the 
> same directory as the transactions db but as "apt.db". I went with 
> Sqlite mainly because of the existing PackageKit dependency on it 
> (probably would have used it anyways)
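
Once the cache exists, SearchName reduces to a single parametrised query. The sketch below assumes a packages table with name and version columns. That schema is hypothetical and not necessarily what apt.db uses.

```python
import sqlite3
from typing import List, Tuple

def search_name_cached(conn: sqlite3.Connection, term: str,
                       limit: int = 100) -> List[Tuple[str, str]]:
    """Substring match on package name against the hypothetical cache table."""
    # Escape LIKE wildcards so that a search for "lib_" matches the underscore literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name, version FROM packages "
        "WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (f"%{escaped}%", limit))
    return rows.fetchall()
```

SQLite's LIKE is case-insensitive for ASCII by default, which suits find-as-you-type. With a leading "%", SQLite cannot use an ordinary index on name and scans the table instead. That seems tolerable at the sizes quoted above, but it is worth measuring. Query-only callers could also open the file read-only, e.g. sqlite3.connect("file:apt.db?mode=ro", uri=True).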

Very sane. Feel free to add more .c or .cpp files to the apt backend
directory if that's easier. As long as they all compile to one .so
object, it's fine with me.

> Haven't pushed this yet, but can post the patch if any direct comments 
> on the code as opposed to comments on the idea come up.

The idea is very sound. Could you please post the patch so I can cast
my eye over it?

> Haven't 
> re-written all the apt methods for this yet (only SearchName). At the 
> moment this is all apt internal, but might be an idea to split some of 
> it out eventually in case any other backend writer wants to do similar 
> and doesn't want to have to duplicate the work too much.

Again, a very good idea. I want to try to avoid backend writers doing
this sort of thing, but where it has to be done, it has to be done.

> So, thoughts? Cracktastic points out of 10? Votes for me getting flamed 
> like crap by suggesting this sort of idea to the apt developers?

Err, I wouldn't suggest this to the apt guys. If you do, it was nice
knowing you :-)

As I said earlier, please post the patch; the idea sounds great.

Richard.
