simple search api (was Re: mimetype standardisation by testsets)

Mon Nov 27 17:21:00 EET 2006

2006/11/27, Kevin Krammer <kevin.krammer at gmx.at>:
>
> On Monday 27 November 2006 12:08, Mikkel Kamstrup Erlandsen wrote:
>
> I am not a searching or indexing expert, merely wanted to input some
> information regarding D-Bus sync/async calls :)
>
> > I think you raise a really good question, Kevin. Let me first introduce
> > some terminology to ease the communication.
> >
> > Page Query: All results for a given query are returned in one chunk. This
> > call is still *async* since it is over dbus. This is how it is suggested
> > on the WasabiDraft wiki page.
> >
> > Async Query: Query results trickle in as the search engine picks them up.
> > I.e., not all query results are returned in one batch.
>
> I'd rather call it "Full" and "Partial" Query or Query with "Full"
> or "Partial" delivery.

I was not trying to establish a convention; I just needed some words for
it. For what it's worth, I think Full and Partial Delivery are the best
terms. However, for method names I actually think my names make more sense.

> > In the page query the client can simulate an async query by requesting
> > several blocking queries with the same query string, but different
> > page-ranges. This gives a small problem with page ordering, but nothing
> > that the client app could not work around. The big benefit for page queries
> > is that server side sorting (score, relevance, date, whatever) is a
> > no-brainer for the client. Just append the "sort:<sorttype>" switch to the
> > query string.
>
> How long does a search service have to cache such a query - result
> combination?

That's up to the implementation.

> Or is searching so fast that the same query can be re-done on every call?

Again, some backends will have native caching capabilities, others won't. I
think we should focus on keeping the interface easy to use for application
developers, and leave the headaches to the search engine devs... Sorry guys
:-)
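
As a rough illustration of the page-range approach discussed above, here is a
minimal sketch in Python. It is not the WasabiDraft interface: `make_backend`,
`paged_results`, the offset/count parameters and the handling of a single
"sort:name" switch are all assumptions made up for this example, and the
backend is a plain in-process function rather than a real dbus call. Note that
this naive backend re-runs the search on every page request, which is exactly
the caching question raised above.

```python
from typing import Callable, Iterator, List

QueryFn = Callable[[str, int, int], List[str]]


def make_backend(index: List[str]) -> QueryFn:
    """Build a hypothetical stand-in for the service's paged Query method."""

    def query(query_string: str, offset: int, count: int) -> List[str]:
        words = query_string.split()
        # Everything that is not a "sort:<sorttype>" switch is a search term.
        terms = [w for w in words if not w.startswith("sort:")]
        hits = [uri for uri in index if all(t in uri for t in terms)]
        # Only "sort:name" is understood in this sketch.
        if "sort:name" in words:
            hits.sort()
        # No caching: the full search is redone for every page.
        return hits[offset:offset + count]

    return query


def paged_results(query: QueryFn, query_string: str,
                  page_size: int) -> Iterator[str]:
    """Simulate partial delivery by requesting successive page ranges."""
    offset = 0
    while True:
        page = query(query_string, offset, page_size)
        if not page:
            return
        yield from page
        offset += page_size
```

A client would iterate over `paged_results(query, "report sort:name", 10)` and
display hits as each page arrives; the server-side ordering comes for free
because every page is cut from the same sorted result list.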

> > In the async query you have a sorting problem. The client cannot sort the
> > hits, unless each returned URI also has metadata associated with it (it
> > looks this stuff up with another dbus call). I see a huge benefit in
> > allowing the results to trickle in (and allows for canceling queries as
> > Kevin points out). The async query is also much more suitable for live
> > queries (in the sense of updating the query when the on-disk files change -
> > or are deleted/created).
>
> Would it be possible to associate a sorting key with each match?
> If so it could be part of the returned data, i.e. the result being an
> array of tuples of URI and key.

I don't know if this would make sense actually... How would the backend know
what the final sort order would be if it hasn't collected all hits? - I'm
not ruling it out, I'm just not able to see how it would work out...
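
One possible reading of the tuple idea, sketched in Python under loud
assumptions: the key is something the backend can compute per hit without
seeing the other hits (a modification time, a name), and the client keeps its
own list ordered as batches arrive. The `SortedHits` class and the batch
format are hypothetical and not part of any proposed interface. This would not
cover keys that only make sense once all hits are known, such as a relevance
score normalised over the full result set.

```python
import bisect
from typing import List, Tuple


class SortedHits:
    """Client-side result list kept in key order while batches trickle in."""

    def __init__(self) -> None:
        # Stored as (key, uri) so tuple comparison orders by key first.
        self._hits: List[Tuple[float, str]] = []

    def add_batch(self, batch: List[Tuple[str, float]]) -> None:
        """Merge one delivered batch of (uri, key) tuples into the list."""
        for uri, key in batch:
            bisect.insort(self._hits, (key, uri))

    def uris(self) -> List[str]:
        """Return the URIs received so far, ordered by their sort key."""
        return [uri for _, uri in self._hits]
```

With this shape the backend never needs to know the final order; it only has
to attach a comparable key to each hit, and the client's view is correctly
sorted after every batch.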

> > So what do I think? I see 2 options:
> >
> > 1) Change the Query method name to PageQuery and add another AsyncQuery
> > with a signature and behavior we need to think a bit about.
> >
> > 2) Don't change the org.freedesktop.search.simple interface, but create
> > another interface generally aimed at live queries - or maybe include this
> > in the "advanced" search interface when we get to defining that.
>
> A more advanced interface could be based on query objects, i.e. the client
> requests a remote peer object for a specific query and the service creates
> a handler object and returns the object path.

Yeah, that could be an idea. It would not be a good fit for apps spawning
tons of searches though. And I actually think we should pay close attention
to catering for massive search requests. I can easily picture a future where
some client or other does a bunch of searches in the background, showing
information relevant to your current context... (just one example).

> The client can then call this object's methods and listen to this object's
> signals, without needing to reference it with the query string at each call
> or on each signal. The object path will be the reference.

Again, I like the idea, but I see some problems with it (as mentioned
above). Maybe it should rather be a server side client proxy or something
(that sounds like an oxymoron :-)), where the remote object does not
represent a query, but rather a dedicated connection. I know that this
is possible with dbus, but I have never played around with it...
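
To make the query-object idea a bit more concrete, here is a small in-process
model in Python. It uses no dbus at all; `SearchService`, `new_query`,
`connect_hits_added`, `emit_hits`, `cancel` and the object path layout are
invented for illustration and do not correspond to any existing interface. It
only shows the bookkeeping: the service hands out one path per query, and the
client uses that path instead of repeating the query string.

```python
import itertools
from typing import Callable, Dict, List

HitsCallback = Callable[[List[str]], None]


class SearchService:
    """In-process stand-in for the search service; no real D-Bus involved."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._query_strings: Dict[str, str] = {}
        self._listeners: Dict[str, List[HitsCallback]] = {}

    def new_query(self, query_string: str) -> str:
        """Create a handler for the query and return its object path."""
        # Hypothetical path layout, chosen only for this example.
        path = f"/org/freedesktop/search/query/{next(self._ids)}"
        self._query_strings[path] = query_string
        self._listeners[path] = []
        return path

    def connect_hits_added(self, path: str, callback: HitsCallback) -> None:
        """Subscribe to the per-object 'hits added' signal."""
        self._listeners[path].append(callback)

    def emit_hits(self, path: str, uris: List[str]) -> None:
        """Backend side: deliver hits; cancelled queries receive nothing."""
        for callback in self._listeners.get(path, []):
            callback(uris)

    def cancel(self, path: str) -> None:
        """Drop the handler object so no further hits are delivered."""
        self._query_strings.pop(path, None)
        self._listeners.pop(path, None)
```

The cost raised above is visible here too: the service keeps one entry per
live query, so a client firing off many background searches would need to
cancel them promptly.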

Cheers,
Mikkel