[XESAM] Spec update proposals

Sat Jun 23 04:09:09 PDT 2007

2007/6/23, Arun Raghavan <arunisgod at gmail.com>:
> Hello,
>
> On 6/23/07, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com> wrote:
> > 2007/6/22, Jos van den Oever <jvdoever at gmail.com>:
> > > As you can see, the time stays about constant until the query becomes
> > > longer than 1000 characters. At 3000 characters we see 10% loss in
> > > speed. 3000 characters of query is huge. Still only at about 20.000
> > > characters does the dbus performance halve. Using StartQuery() always
> > > halves the dbus performance!
> > >
> > > Using the query as key is a bit slower for huge queries. It takes a
> > > bit more memory on the server, but in general it will be faster and
> > > most importantly will be simpler for the user.
> > >
> > > It's unintuitive for us hackers to do this in such a simple way,
> > > because it feels like wasting resources. But in fact this is the most
> > > efficient solution.
> <snip>
>
> The memory impact will probably not be significant -- just one copy of
> the query. The server will probably just have a map of the (string
> searchId, SearchObject obj) (well, mine does at any rate), and in most
> implementations the map will just use a hash of the string searchId
> key.
>
> However, as you say, using the query string does not "feel right". The
> cost of using StartSearch() might be double that of not, but from your
> numbers it looks like we'll be moving from O(0.3 ms) to O(0.6 ms).
> Perhaps that might be an acceptable tradeoff?
>
> A not so great alternative might be to just use a hash of the query as
> the searchId (potentially introducing a dependency on some library to
> provide a MD5/SHA1 implementation).
>
> <snip>
> > Historical Note:
> > Using the query string as search handle was in fact one of the first
> > proposals for the xesam search spec. I think we better dig out why it
> > was rejected then...
>
> Some digging turned up this --
> http://article.gmane.org/gmane.comp.freedesktop.xdg/8016. I dug a
> little further back too, but that looked too preliminary to cover
> this.

Thanks for the link. For the lazy among us let me quote Magnus Bergman:

>   I think it's a bad idea to use a query-string to identify a search for
>   the following reasons:
>   * It is inefficient to send a (possibly quite long) string for every
>     call.
>   * It isn't logical for the search engine to use the query string to
>     look up the search because a query might generate a different result
>     depending on when the search is started.
>   * An application might create different searches from the same query
>     (string) with different results ("all files created this minute").

I 100% agree with Magnus here, and I think these points demonstrate that we
cannot use the query string as the search handle. Even (session, query_string)
cannot be used as a key, given these arguments.
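
To make points 2 and 3 concrete, here is a minimal Python sketch. It is not
taken from any xesam implementation: both server classes, the new_search()
method and the counter-based handle format are assumptions made purely for
illustration. It shows that two searches started from the same query string
at different times collapse into one entry when the query is the key, while
an opaque handle keeps them apart.

```python
import itertools


class QueryKeyedServer:
    """Hypothetical server that uses the query string itself as the handle."""

    def __init__(self):
        self.searches = {}

    def new_search(self, query, started_at):
        # A second search with the same query overwrites the first entry.
        self.searches[query] = {"query": query, "started_at": started_at}
        return query


class HandleKeyedServer:
    """Hypothetical server that hands out an opaque handle per search."""

    def __init__(self):
        self.searches = {}
        self._ids = itertools.count(1)

    def new_search(self, query, started_at):
        # The handle format is an assumption; any unique token would do.
        handle = "search-%d" % next(self._ids)
        self.searches[handle] = {"query": query, "started_at": started_at}
        return handle


query = "all files created this minute"

by_query = QueryKeyedServer()
a = by_query.new_search(query, started_at=0)
b = by_query.new_search(query, started_at=60)  # one minute later

by_handle = HandleKeyedServer()
c = by_handle.new_search(query, started_at=0)
d = by_handle.new_search(query, started_at=60)

print(a == b, len(by_query.searches))   # same key, one surviving search
print(c == d, len(by_handle.searches))  # distinct handles, two searches
```

Adding the session to the key does not help in this sketch either, since
both searches could come from the same session.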

Let me elaborate a bit on Magnus' point 1.

If the query string were the handle, we would have to send the whole query
string for each and every interaction with the search engine: NewSearch,
CountHits, GetHits, GetHitData, and CloseSearch. If you create a
context-analyzer daemon that constantly queries the search engine based on
user behavior, possibly analyzing *the whole* hit set, the query string can
be significant overhead.
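
As a rough back-of-the-envelope sketch of that overhead, the Python snippet
below counts only the bytes spent on identifying the search. The call counts
(100 GetHits and 100 GetHitData calls per search) and the handle
"search-42" are assumptions I made up for the example, and D-Bus framing
and the actual hit data are ignored entirely.

```python
def key_bytes(key, calls):
    """Bytes spent on the search key alone over a number of calls."""
    return len(key.encode("utf-8")) * calls


query = "x" * 3000        # a 3000-character query, as in Jos' measurements
handle = "search-42"      # hypothetical short handle returned by the server

# Assumed session: NewSearch, CountHits, 100 GetHits, 100 GetHitData,
# CloseSearch.
calls = 1 + 1 + 100 + 100 + 1

# Query as handle: the full query travels with every call.
query_cost = key_bytes(query, calls)

# Opaque handle: the query is sent once in NewSearch, and the handle is
# sent with each of the remaining calls.
handle_cost = key_bytes(query, 1) + key_bytes(handle, calls - 1)

print(query_cost, handle_cost)
```

Under these assumptions the query-as-handle variant spends two orders of
magnitude more bytes on identifying the search, and the gap grows with the
number of calls per search.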

Cheers,
Mikkel