simple search api (was Re: mimetype standardisation by testsets)

Mon Nov 27 13:08:20 EET 2006

2006/11/26, Kevin Krammer <kevin.krammer at gmx.at>:
>
> On Sunday 26 November 2006 18:17, Jean-Francois Dockes wrote:
> > Jos van den Oever writes:
> >  > 2006/11/26, Jean-Francois Dockes <jean-francois.dockes at wanadoo.fr>:
> >  > >  1- A need for trivial enabling of text search in any (non-search)
> >  > >     application, with minimal fuss, (better described by Fabrice in
> >  > > the quoted message).
> >  >
> >  > For this, we also need a way to search in documents that have not been
> >  > indexed. Indexes can take up a lot of space and the user might not
> >  > want to have an index of all her data, but still want to search that
> >  > data now and then.
> >  > Since searching in this way is a lot slower, there would need to be a
> >  > more asynchronous method of reporting the search results.
> >
> > I'm not a d-bus expert, but at least with the qt4 bindings, it seems that
> > you have a choice of waiting for the reply to a d-bus message, or be called
> > later when it arrives. There doesn't seem to be anything inherently
> > synchronous in d-bus, so I would imagine that other bindings or adaptors
> > have similar capabilities.
>
> Technically correct, this is a feature of the low-level D-Bus library.
>
> However this is a different use case.
>
> The asynchronous D-Bus call is for getting _the_ result later.
>
> The use case discussed here is slightly different (unless I am
> misunderstanding) it is about returning _some_ results later.
>
> Example: a user searches through a lot of emails. The program should be able
> to display results as soon as possible. At this point the results do not need
> to be complete, matches can trickle in when found.
>
> An asynchronous call would still have to wait for all results, i.e. a
> completed query. The user would have to wait for the slowest match.
>
> An option would be to have the initial query call return a query identifier
> instead of results and results would be transported by D-Bus signals using
> this identifier as a reference.
>
> A bonus would be to have the possibility of cancelling a query using this
> identifier. The user might already have found what they were looking for and
> cancel the search operation in their program. An ongoing searching operation
> would not be a problem for the program (it can just ignore any further
> results), but it could be hard to explain to a user why their harddisk keeps
> accessing files like mad.

I think you raise a really good question, Kevin. Let me first introduce
some terminology to ease the communication.

Page Query: All results for a given query are returned in one chunk. This
call is still *async* since it goes over dbus. This is how it is suggested
on the WasabiDraft wiki page.

Async Query: Query results trickle in as the search engine picks them up,
i.e. the query results are not all returned in one batch.

With the page query, the client can simulate an async query by issuing
several blocking queries with the same query string but different
page ranges. This creates a small problem with page ordering, but nothing
that the client app could not work around. The big benefit of page queries
is that server-side sorting (score, relevance, date, whatever) is a
no-brainer for the client: just append the "sort:<sorttype>" switch to the
query string.
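
To make the paging idea concrete, here is a rough sketch in Python. It is
only an illustration: FakeSearchService, its query(query_string, offset,
limit) signature, and the "sort:name" sort type are all made up for this
example and are not taken from the WasabiDraft page. A real client would
make the call over dbus instead of calling a local object.

```python
from typing import Iterator, List


class FakeSearchService:
    """Hypothetical in-process stand-in for the search service."""

    def __init__(self, uris: List[str]) -> None:
        self._uris = uris

    def query(self, query_string: str, offset: int, limit: int) -> List[str]:
        """Return one page of matching URIs.

        The 'sort:name' switch is an assumed sort type; the server sorts
        the complete hit list before cutting out the requested page.
        """
        terms = query_string.split()
        words = [t for t in terms if not t.startswith("sort:")]
        hits = [u for u in self._uris if all(w in u for w in words)]
        if "sort:name" in terms:
            hits.sort()
        return hits[offset:offset + limit]


def paged_results(service: FakeSearchService, query_string: str,
                  page_size: int = 2) -> Iterator[List[str]]:
    """Simulate trickling results by requesting consecutive page ranges."""
    offset = 0
    while True:
        page = service.query(query_string, offset, page_size)
        if not page:
            return
        yield page  # the client can display this page right away
        if len(page) < page_size:
            return  # a short page means there is nothing left
        offset += len(page)


service = FakeSearchService([
    "file:///c/report.txt",
    "file:///a/report.txt",
    "file:///b/notes.txt",
    "file:///b/report.txt",
])
pages = list(paged_results(service, "report sort:name"))
```

Each page arrives already sorted by the server, so the client only has to
append pages in the order it requested them.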

With the async query you have a sorting problem. The client cannot sort the
hits unless each returned URI also has metadata associated with it (or the
client looks this stuff up with another dbus call). Still, I see a huge
benefit in letting the results trickle in, and it also allows for canceling
queries, as Kevin points out. The async query is also much more suitable for
live queries (in the sense of updating the query when the on-disk files
change, or are deleted/created).
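
A rough sketch of the query-identifier scheme Kevin describes, again in
Python and purely illustrative. FakeAsyncSearch, start_query, connect_hit,
cancel and step are invented names, and plain callbacks stand in for dbus
signals; nothing here is an agreed interface.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Tuple

HitHandler = Callable[[int, str], None]


class FakeAsyncSearch:
    """Hypothetical in-process stand-in; callbacks play the role of signals."""

    def __init__(self, uris: List[str]) -> None:
        self._uris = list(uris)
        self._ids = itertools.count(1)
        self._active: Dict[int, Iterator[str]] = {}  # query id -> pending hits
        self._handlers: List[HitHandler] = []

    def connect_hit(self, handler: HitHandler) -> None:
        self._handlers.append(handler)

    def start_query(self, text: str) -> int:
        """Return a query identifier immediately instead of results."""
        query_id = next(self._ids)
        self._active[query_id] = (u for u in self._uris if text in u)
        return query_id

    def cancel(self, query_id: int) -> None:
        """Stop the search so the disk is no longer scanned for this query."""
        self._active.pop(query_id, None)

    def step(self) -> bool:
        """Emit at most one hit per active query; True while work remains."""
        for query_id, matches in list(self._active.items()):
            if query_id not in self._active:
                continue  # cancelled by a handler during this step
            uri = next(matches, None)
            if uri is None:
                del self._active[query_id]  # query finished
                continue
            for handler in self._handlers:
                handler(query_id, uri)
        return bool(self._active)


search = FakeAsyncSearch(["mail/1", "doc/a", "mail/2", "mail/3"])
received: List[Tuple[int, str]] = []


def on_hit(query_id: int, uri: str) -> None:
    received.append((query_id, uri))
    if len(received) == 2:
        search.cancel(query_id)  # the user found what they wanted


search.connect_hit(on_hit)
qid = search.start_query("mail")
while search.step():
    pass
```

Note that the hits arrive in discovery order with no score or date attached,
which is exactly the sorting problem mentioned above.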

So what do I think? I see 2 options:

1) Change the Query method name to PageQuery and add another method,
AsyncQuery, whose signature and behavior we still need to think a bit about.

2) Don't change the org.freedesktop.search.simple interface, but create
another interface generally aimed at live queries - or maybe include this in
the "advanced" search interface when we get to defining that.

I don't really know what I like/dislike the most... Any other ideas, or
comments, will be highly appreciated :-)

Cheers,
Mikkel