[XESAM] Spec update proposals

Sat Jun 23 04:18:43 PDT 2007

2007/6/23, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> > However, as you say, using the query string does not "feel right". The
> > cost of using StartSearch() might be double that of not, but from your
> > numbers it looks like we'll be moving from O(0.3 ms) to O(0.6 ms).
> > Perhaps that might be an acceptable tradeoff?
> >
> > A not so great alternative might be to just use a hash of the query as
> > the searchId (potentially introducing a dependency on some library to
> > provide an MD5/SHA1 implementation).
> >
> > <snip>
> > > Historical Note:
> > > Using the query string as search handle was in fact one of the first
> > > proposals for the xesam search spec. I think we better dig out why it
> > > was rejected then...
> >
> > Some digging turned up this --
> > http://article.gmane.org/gmane.comp.freedesktop.xdg/8016. I dug a
> > little further back too, but that looked too preliminary to cover
> > this.
>
> Thanks for the link. For the lazy among us let me quote Magnus Bergman:
>
> > I think it's a bad idea to use a query-string to identify a search for
> >   the following reasons:
> >   * It is inefficient to send a (possibly quite long) string for every
> >     call.
> >   * It isn't logical for the search engine to use the query string to
> >     lookup the search because a query might generate a different result
> >     depending on when the search is started.
> >   * An application might create different searches from the same query
> >     (string) with different result ("all files created this minute").
>
> I 100% agree with Magnus here, and I think these points demonstrate that we
> cannot use the query-string as search handle. Even (session,query_string)
> cannot be used as key based on these arguments.
>
>
> Let me elaborate a bit on Magnus' point 1.
>
> We have to send the whole query string for each and every interaction with
> the search engine. These are NewSearch, CountHits, GetHits, GetHitData,
> CloseSearch. If you create a context-analyzer-daemon which constantly
> queries the search engine based on user behavior - possibly analyzing *the
> whole* hit set, the query string can be a significant overhead.
In this case, parsing the query might be the larger overhead. We
cannot know that without measuring it. Still, because we would send the
query with every message, the cost does add up.
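To put rough numbers on the "it adds up" part, here is a minimal sketch that counts only the bytes of query or handle payload over one search's lifetime. It assumes a hypothetical 16-byte handle and a made-up 2000-byte query; nothing here was measured, and it deliberately ignores parse cost, which (as said above) might dominate and needs real measurement.

```python
# Hypothetical model for illustration only; no numbers here were measured.
CALLS = ("NewSearch", "CountHits", "GetHits", "GetHitData", "CloseSearch")


def query_bytes_sent(query: str, calls: int, use_handle: bool,
                     handle_len: int = 16) -> int:
    """Bytes of query/handle payload sent over one search's lifetime.

    With the query string as key, every call carries the full query.
    With a handle (hypothetical `handle_len` bytes), only the first call
    carries the query; every later call carries just the handle.
    """
    q = len(query.encode("utf-8"))
    if not use_handle:
        return q * calls
    return q + handle_len * (calls - 1)


long_query = "x" * 2000  # stand-in for a long query string
print(query_bytes_sent(long_query, len(CALLS), use_handle=False))  # 10000
print(query_bytes_sent(long_query, len(CALLS), use_handle=True))   # 2064
```

The gap grows linearly with the number of calls, so a daemon that pages through a whole hit set with many GetHits calls would feel it most.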

For the sake of clarity, I hereby retract the idea of using the query
as the key and support the StartSearch() call.
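For illustration, a minimal sketch of why an opaque handle works where a query hash does not. `SearchRegistry`, `start_search`, `count_hits`, `close_search` and `hash_search_id` are hypothetical names, not the Xesam D-Bus API; the `snapshot` argument stands in for whatever hits the engine matched when the search started.

```python
import hashlib
import itertools
from dataclasses import dataclass, field


@dataclass
class SearchRegistry:
    """Hypothetical server-side table mapping opaque handles to searches."""
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))
    _searches: dict = field(default_factory=dict)

    def start_search(self, query: str, snapshot: list) -> str:
        # Every call gets a fresh handle, even for an identical query string.
        handle = f"search-{next(self._ids)}"
        # Keep the hits found at start time; re-running the query later
        # could give a different result.
        self._searches[handle] = list(snapshot)
        return handle

    def count_hits(self, handle: str) -> int:
        return len(self._searches[handle])

    def close_search(self, handle: str) -> None:
        del self._searches[handle]


def hash_search_id(query: str) -> str:
    # The rejected alternative: derive the search id from the query text.
    return hashlib.sha1(query.encode("utf-8")).hexdigest()


registry = SearchRegistry()
query = "all files created this minute"
first = registry.start_search(query, ["a.txt"])
second = registry.start_search(query, ["a.txt", "b.txt"])
# Same query text, two independent searches with different hit counts.
print(first, registry.count_hits(first))    # search-1 1
print(second, registry.count_hits(second))  # search-2 2
# A query hash cannot tell the two searches apart.
print(hash_search_id(query) == hash_search_id(query))  # True
```

The handle costs one extra round trip at start, but it gives each search its own identity, which is exactly what Magnus' second and third points require.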

Cheers,
Jos