Simple search API proposal, take 2

Mikkel Kamstrup Erlandsen mikkel.kamstrup at
Thu Jan 11 13:32:12 PST 2007

2007/1/11, Joe Shaw <joeshaw at>:
> Hi,
> On Thu, 2007-01-11 at 11:48 +0100, Mikkel Kamstrup Erlandsen wrote:
> > Query (in s query_string, in as requested_properties, out s
> Is the idea here that the query will actually be run when Query is
> called, or is there just some server-side preparation for it?

Just to be clear - we're talking about the simple api, not the live one...

It can be either. If the server wants, it can start the query right away, or it
can be "sloppy" (lazy), where the actual query is only executed on the
GetHitProperties call. Either way, the only thing Query returns is a
query_handle.
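
To make the two options concrete, here is a minimal in-memory sketch in Python. It is not a real D-Bus service. The class name SimpleSearchSketch, the eager flag, the runs counter and the naive substring matching are all assumptions made for illustration; only the Query and GetHitProperties method names and their arguments come from the proposal.

```python
import uuid


class SimpleSearchSketch:
    """Hypothetical stand-in for a search server (illustration only)."""

    def __init__(self, index, eager=False):
        self.index = index      # hit_id -> {property: [values]}
        self.eager = eager      # True: run the query inside Query()
        self.queries = {}       # query_handle -> query state
        self.runs = 0           # how often the index was actually searched

    def _run(self, query_string):
        # Naive substring match over all property values (assumption).
        self.runs += 1
        return sorted(
            hit_id
            for hit_id, props in self.index.items()
            if any(query_string in v for values in props.values() for v in values)
        )

    def Query(self, query_string, requested_properties):
        handle = uuid.uuid4().hex
        state = {"query": query_string, "props": requested_properties, "hits": None}
        if self.eager:
            state["hits"] = self._run(query_string)
        self.queries[handle] = state
        return handle           # only the handle is returned either way

    def GetHitProperties(self, query_handle, offset, limit):
        state = self.queries[query_handle]
        if state["hits"] is None:   # the "sloppy" server runs the query here
            state["hits"] = self._run(state["query"])
        page = state["hits"][offset:offset + limit]
        return {
            hit_id: {p: self.index[hit_id].get(p, []) for p in state["props"]}
            for hit_id in page
        }
```

From the client's side both variants look the same. The difference is only in when the server does the work and how long it has to hold on to the hits.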

> If the query is actually run, that means the server has to keep all
> information about all the hits in memory until the query handle is
> somehow released.  (An API call which I think is missing.)

Hmmm, there could be an api call for releasing the query. An alternative
solution could be to have a session object like in the live interface.
Issuing a new query with the SimpleSession object invalidates the previous
query. The session would probably need a .close() method or something (like
the live api).
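
A rough Python sketch of the session idea, under the assumption that a session tracks exactly one current query. The class SimpleSessionSketch, the is_valid helper and the handle format are hypothetical; the proposal only mentions a SimpleSession object and a .close() method.

```python
class SimpleSessionSketch:
    """Hypothetical session: a new query invalidates the previous one."""

    def __init__(self):
        self.current = None     # the only handle that is currently valid
        self.counter = 0
        self.closed = False

    def Query(self, query_string):
        if self.closed:
            raise RuntimeError("session is closed")
        self.counter += 1
        # Overwriting self.current drops the previous query's handle.
        self.current = "query-%d" % self.counter
        return self.current

    def is_valid(self, query_handle):
        return not self.closed and query_handle == self.current

    def close(self):
        # Releases whatever the server still holds for this session.
        self.closed = True
        self.current = None
```

With this shape the server never holds more than one result set per session, which bounds memory use without a separate release call per query.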

A sane timeout could also be settled upon. Since queries in this simple api
are invalidated quickly, quite low timeouts should be possible. It would not
be possible to call GetHitProperties on a timed-out handle.
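
The timeout idea could look roughly like this in Python. HandleTable, HandleExpired and the injectable clock are hypothetical names chosen for the sketch, and the timeout value is left to the caller since no concrete number has been agreed on.

```python
import time


class HandleExpired(Exception):
    """Raised when a query handle is unknown or has timed out."""


class HandleTable:
    """Hypothetical bookkeeping for query handle lifetimes."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock      # injectable so the sketch can be tested
        self.created = {}       # query_handle -> creation time

    def register(self, query_handle):
        self.created[query_handle] = self.clock()

    def check(self, query_handle):
        # A server would call this at the top of GetHitProperties.
        started = self.created.get(query_handle)
        if started is None or self.clock() - started > self.timeout:
            self.created.pop(query_handle, None)   # free the stale entry
            raise HandleExpired(query_handle)
```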

> My Beagle index contains over 1 million documents, obviously this could
> quite easily get out of hand with a bunch of queries running.
> If not, and the query is only run on demand,

Well in the live api there is a Query.Start(), but I think the current
simple api doesn't need it.

> > CountHits (in s query_handle, out i count)
> >
> > GetHitProperties (in s query_handle, in i offset, in i limit, out
> > a{sa{sas}} response )
> Then these calls would be racy, because the index could change between
> calls.  I believe you want the query to run on demand, however, because
> of the limit passed in.

The simple api is racy in its very nature as far as I can see. Admittedly I
haven't given great thought to this, but I don't think there exists an
elegant non-racy solution (for the simple case).

> Could you define what the a{sa{sas}} map is?

Sure. It is a map from hit identifiers to maps from property names to lists
of values. For example:


  hit_id_1 {
    "group" : ["email"]
    "message.title" : ["How are you these days?"]
    "uri" : ["email://blah/foobar"]
    "" : ["mommy at", "daddy at"]
    "keywords" : []
  }

  hit_id_2 {
    "group" : ["email"]
    "message.title" : ["The indian says HOW!"]
    "uri" : ["email://blah/foobaz"]
    "" : []
    "keywords" : ["indians", "culture", "lasers"]
  }

Some properties for scoring/sorting purposes could be included too.
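
For readers less used to D-Bus signatures, a{sa{sas}} maps naturally onto nested dictionaries. The following Python sketch spells that out. The type aliases and the is_valid_response helper are hypothetical, and the sample data reuses only a few of the properties from the example above.

```python
from typing import Dict, List

HitProperties = Dict[str, List[str]]    # a{sas}: property name -> list of values
Response = Dict[str, HitProperties]     # a{sa{sas}}: hit id -> its properties


def is_valid_response(obj) -> bool:
    """Check that obj has the nested shape described by a{sa{sas}}."""
    if not isinstance(obj, dict):
        return False
    for hit_id, props in obj.items():
        if not isinstance(hit_id, str) or not isinstance(props, dict):
            return False
        for name, values in props.items():
            if not isinstance(name, str) or not isinstance(values, list):
                return False
            if not all(isinstance(v, str) for v in values):
                return False
    return True


sample: Response = {
    "hit_id_1": {
        "group": ["email"],
        "uri": ["email://blah/foobar"],
        "keywords": [],                 # an empty value list is allowed
    },
    "hit_id_2": {
        "group": ["email"],
        "uri": ["email://blah/foobaz"],
        "keywords": ["indians", "culture", "lasers"],
    },
}
```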

> I think you probably also want an API where you just get a list of URIs;
> sometimes that's all you care about.  (This is a weakness in the Beagle
> API, and one which we're going to be fixing.)

Initially it was proposed that we use uris for hit_identifiers. We dropped
this idea in favor of opaque hit handles. Magnus suggested integers and I
suggest strings. I don't think uris are good as identifiers; if you want me
to elaborate, just ask.

I realise that you are not necessarily suggesting to use uris as hit_ids.
While I think it is actually quite easy to get a list of uris with the
current proposal, I am more than willing to meet a demand for even easier
uri retrieval. Currently you have to do:

query_handle = Query (query_string)
uris = GetHitProperties (query_handle, 0, 100, ["uri"])

with my recent proposal it would be this instead:

query_handle = Query (query_string, ["uri"])
uris = GetHitProperties (query_handle, 0, 100)

Can it get any simpler?
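
Strictly speaking, GetHitProperties returns the a{sa{sas}} map rather than a bare list, so the client needs one small step to flatten it into uris. A minimal Python sketch follows; the helper name uris_from_response is made up for illustration.

```python
def uris_from_response(response):
    """Flatten an a{sa{sas}}-shaped dict into a plain list of uris."""
    uris = []
    for hit_id, props in response.items():
        # Each property holds a list of values; hits without "uri" are skipped.
        uris.extend(props.get("uri", []))
    return uris


response = {
    "hit_id_1": {"uri": ["email://blah/foobar"]},
    "hit_id_2": {"uri": ["email://blah/foobaz"]},
}
uris = uris_from_response(response)
```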

> > GetSnippets (in s query_handle, in as hit_handles)
> >
> > here hit_handles (in GetSnippets) is a list of keys from the a{sa{sas}}
> We would probably just use the URI as the key; the rename race is no
> worse than the removal race.

Well, that's true. However, with uris you have both races. With ids you only
have the removal race.
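
A toy Python illustration of the difference, assuming the server keeps a stable opaque id per document. TinyIndex and its methods are hypothetical and only model the bookkeeping, not a real index.

```python
class TinyIndex:
    """Hypothetical index that maps opaque hit ids to uris."""

    def __init__(self):
        self.uri_by_id = {}
        self._next = 1

    def add(self, uri):
        hit_id = "hit_id_%d" % self._next
        self._next += 1
        self.uri_by_id[hit_id] = uri
        return hit_id

    def rename(self, hit_id, new_uri):
        # The uri changes, the opaque id stays the same.
        self.uri_by_id[hit_id] = new_uri

    def remove(self, hit_id):
        del self.uri_by_id[hit_id]

    def has_id(self, hit_id):
        return hit_id in self.uri_by_id

    def has_uri(self, uri):
        return uri in self.uri_by_id.values()


index = TinyIndex()
hit = index.add("email://blah/foobar")
old_uri = "email://blah/foobar"

index.rename(hit, "email://blah/moved")
id_survives_rename = index.has_id(hit)        # id key still resolves
uri_survives_rename = index.has_uri(old_uri)  # uri key is now stale

index.remove(hit)
id_survives_removal = index.has_id(hit)       # both kinds of key fail here
```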
