simple search api (was Re: mimetype standardisation by testsets)

Jos van den Oever jvdoever at gmail.com
Thu Nov 30 00:11:38 EET 2006


2006/11/29, Jamie McCracken <jamiemcc at blueyonder.co.uk>:
> Jos van den Oever wrote:
> > DBus activation does not solve the problem of finding the right search
> > engine for a particular query. The vision of having many different
> > search providers, where the disk indexers are the most important ones,
> > is one that requires this.
>
> not sure - I just presume that only one indexing service will ever be
> used by most users. I don't think there is a requirement for running
> multiple indexers (but feel free to make tracker another backend for
> Strigi!)
This presumption could easily become false. Just look at the services
Gnome Deskbar implements. The search interface could cover more than
just indexed documents: there might be an office-wide search, a
system-wide search, and a search over personal documents. These use
quite different technologies, and it's unlikely that the desktop
search engines in their current form will combine them all. So a
common API over which other apps could merge these searches would be
useful.
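The merging idea can be sketched in a few lines of Python. The provider
names and callables here are illustrative assumptions, not part of any
spec; the point is only that one front end can fan a query out to several
search providers and merge the hits:

```python
# Hypothetical sketch: fan one query out to several search providers
# and merge their hits. Each provider is a callable taking a query
# string and returning a list of result URIs.
def merged_search(providers, query):
    hits = []
    for name, search in providers.items():
        for uri in search(query):
            # tag each hit with the provider it came from
            hits.append((name, uri))
    return hits
```

An office-wide provider and a personal-documents provider would then look
identical to the caller, which is the common ground the API is after.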

> >> Also I agree with most of what Joe Shaw has said on this thread
> >>
> >> We should punt query language and metadata names to a later spec and
> >> concentrate on getting a very simple implementation going first.
> > True.
> >
> >> As for live queries, I dont like dynamic interfaces and in tracker we
> >> will simply take a live_query_id as a param and use dbus signal
> >> filtering to listen for changes to that specific ID
> > What do you mean with 'dynamic interfaces'? We are not talking about
> > interfaces that change, but about busses that will have a dynamic
> > number of objects.
> >
> >> I would suggest having a PrepareQuery method which returns a unique
> >> integer for the query then you can use that to :
> >>
> >> 1) Execute the query
> >> 2) Get the hit count
> >> 3) listen (using dbus match rules) for specific changes (hit
> >> added/removed)
> >>
> >> It's simple and avoids bad practices like dynamically changing
> >> interfaces (and is no less efficient)
> >>
> >> Heres my suggested spec for Wasabi:
> >>
> >> ServiceTypes is an array of service names like "Files", "Emails" etc
> >> (need to define full list)
> >>
> >>
> >> method PrepareQuery (ServiceTypes as, query s) return ID i
> >>
> >> method ExecuteQuery (ID i, offset i, limit i) return as (array of URIs)
> >>
> >> method QueryHitCount (ID i) return i
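For illustration, the quoted methods could be mocked up as a plain Python
class. The in-memory hit store and the substring matching below are
stand-in assumptions, not Tracker's or Strigi's actual implementation;
only the method names come from the quoted spec:

```python
# Toy sketch of the proposed Wasabi query methods.
class SearchService:
    def __init__(self, hits_by_service):
        # hits_by_service: dict mapping a service type ("Files",
        # "Emails", ...) to a list of result URIs (assumed test data)
        self.hits_by_service = hits_by_service
        self.queries = {}   # integer ID -> (service_types, query string)
        self.next_id = 0

    def prepare_query(self, service_types, query):
        """method PrepareQuery (ServiceTypes as, query s) return ID i"""
        self.next_id += 1
        self.queries[self.next_id] = (service_types, query)
        return self.next_id

    def _hits(self, query_id):
        service_types, query = self.queries[query_id]
        return [uri
                for st in service_types
                for uri in self.hits_by_service.get(st, [])
                if query in uri]   # toy matching, not a query language

    def execute_query(self, query_id, offset, limit):
        """method ExecuteQuery (ID i, offset i, limit i) return as"""
        return self._hits(query_id)[offset:offset + limit]

    def query_hit_count(self, query_id):
        """method QueryHitCount (ID i) return i"""
        return len(self._hits(query_id))
```

Over D-Bus the same three calls would be exported on a well-known bus
name rather than invoked directly, but the call shapes are the same.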
> >
> > I see no value in a PrepareQuery method. If the search engine has a
> > need for caching, it can use the query as an ID.
>
> in tracker we have many types of query, like "get all files of certain
> mime types", which only use sqlite and not our indexer, and there would
> be no search term as such. Likewise with RDF queries, so we can't
> really use the query itself as the unique ID.
>
> We store live query results in a temporary sqlite DB so we need a unique
> (integer) ID for those.
So store the query along with the ID and match on that.
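A sketch of that suggestion, where the query string keys the cache and
the integer ID is just an internal alias for it. The `run_query`
callable is a hypothetical stand-in for the real engine:

```python
# Hypothetical sketch: key cached results by the query string itself,
# handing out an integer ID only as an internal alias. Re-preparing the
# same query returns the same ID instead of re-running it.
class QueryCache:
    def __init__(self, run_query):
        self.run_query = run_query
        self.ids = {}      # query string -> integer ID
        self.results = {}  # integer ID -> cached result list

    def lookup(self, query):
        if query not in self.ids:
            qid = len(self.ids) + 1
            self.ids[query] = qid
            self.results[qid] = self.run_query(query)
        return self.ids[query]
```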

> Of course I could code around this but it would be more hacky for us to
> do it that way
What you describe is an implementation detail. Of course you can
optimize the handling of the query by analyzing it. But you shouldn't
let the user determine which mechanism to use, or expose a different
API for each optimization scenario.

> (The above api is the one we will shortly be using in tracker to support
> live query but of course it may not be suitable for others - thats the
> hard part of standardising!)
Exactly, so unless there are real performance or design issues that
are near insurmountable, we should try to stick to the common ground
we found. The interface I proposed initially is also quite different
from what Strigi currently has, but the nice thing about rethinking
all this for the sake of standardization is that we can start afresh.

> > The functions
> > ExecuteQuery and QueryHitCount are contradictory. The first one is
> > asynchronous and the second one synchronous.
>
> not really they can both be async or sync
What I meant is that the count query has to go through the entire list
of results and count them, while the other query function can start
returning results earlier and send them in packets.
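The difference can be illustrated with a toy Python sketch (the
substring match is an assumption standing in for a real query engine):

```python
# A streaming query can yield hits in packets as soon as the first
# matches are found; a hit count must consume every candidate before
# it can answer. The matching here is a toy assumption.
def stream_hits(uris, term, packet_size=2):
    packet = []
    for uri in uris:
        if term in uri:
            packet.append(uri)
            if len(packet) == packet_size:
                yield packet   # caller sees results before the scan ends
                packet = []
    if packet:
        yield packet

def hit_count(uris, term):
    # Has to walk the whole result set to produce a single integer.
    return sum(1 for uri in uris if term in uri)
```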

> ExecuteQuery returns "limit" hits starting from offset - signals are not
> used here to get results
>
> >
> >> signal QueryHitAdd (ID i, uri s)
> >> signal QueryHitRemove (ID i, uri s)
> > For performance reasons, these queries should return multiple URIs at once.
>
> they do - these signals are only emitted in response to file
> notifications, such as a file being deleted or added. Ergo this makes
> them "live".
It's better for performance to put multiple URIs in one signal.
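A minimal sketch of such batching, where `emit_signal` is a hypothetical
callback standing in for the actual D-Bus signal emission and the signal
name is illustrative:

```python
# Instead of one signal per URI, emit hit notifications in batches to
# cut per-message D-Bus overhead. emit_signal and "QueryHitsAdded" are
# assumptions for illustration.
def notify_hits_added(emit_signal, query_id, uris, batch_size=50):
    for start in range(0, len(uris), batch_size):
        emit_signal("QueryHitsAdded", query_id,
                    uris[start:start + batch_size])
```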

> No way would we (in tracker) use them for fetching stuff incrementally
> (I may be wrong but I think only beagle currently does that?).
Eh, it looks incremental to me.
As I see it, there are two types of query: one that returns all the
results in one go (possibly with an offset and a maximum) and one that
returns the results in packets.

> >
> >> The above is easy to implement and should cover the simple ground
> > True, but I don't think it is radically different from the interface
> > that has been discussed before. Please have a look at what's there now
> > and help to build on that.
> >
> >> For nautilus also needed is extra methods for searching files by mime
> >> types and/or location - these can be separate methods as tracker
> >> implements them (mime and location dont have a meaning with non-file
> >> entities)
> >
> > They could be separate methods, but the requirements are also
> > adequately covered by the current proposal.
>
> I find the current proposal (the spec on freedesktop) a bit too much to
> start with.
>
> Let's concentrate on meeting the requirements of existing apps rather
> than trying to create an open-ended system, which will make agreement
> more difficult to reach (at this stage).
>
> If third party apps start using the freedesktop spec then we have a
> platform to build more requirements on and the impetus for supporting
> them becomes greater.
>
>
> --
> Mr Jamie McCracken
> http://jamiemcc.livejournal.com/
>
>



More information about the xdg mailing list