2007/6/23, Arun Raghavan <<a href="mailto:arunisgod@gmail.com">arunisgod@gmail.com</a>>:<br>> Hello,<br>> <br>> On 6/23/07, Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com
</a>> wrote:<br>> > 2007/6/22, Jos van den Oever <<a href="mailto:jvdoever@gmail.com">jvdoever@gmail.com</a>>:<br>> > > As you can see, the time stays about constant until the query becomes<br>> > > longer than 1000 characters. At 3000 characters we see 10% loss in
<br>> > > speed. 3000 characters of query is huge. Still only at about 20.000<br>> > > characters does the dbus performance halve. Using StartQuery() always<br>> > > halves the dbus performance!
<br>> > ><br>> > > Using the query as key is a bit slower for huge queries. It takes a<br>> > > bit more memory on the server, but in general it will be faster and<br>> > > most importantly will be simpler for the user.
<br>> > ><br>> > > It's unintuitive for us hackers to do this in such a simple way,<br>> > > because it feels like wasting resources. But in fact this is the most<br>> > > efficient solution.
<br>> <snip><br>> <br>> The memory impact will probably not be significant -- just one copy of<br>> the query. The server will probably just have a map of the (string<br>> searchId, SearchObject obj) (well, mine does at any rate), and in most
<br>> implementations the map will just use a hash of the string searchId<br>> key.<br>> <br>> However, as you say, using the query string does not "feel right". The<br>> cost of using StartSearch() might be double that of not, but from your
<br>> numbers it looks like we'll be moving from O(0.3 ms) to O(0.6 ms).<br>> Perhaps that might be an acceptable tradeoff?<br>> <br>> A not so great alternative might be to just use a hash of the query as
<br>> the searchId (potentially introducing a dependency on some library to<br>> provide a MD5/SHA1 implementation).<br>> <br>> <snip><br>> > Historical Note:<br>> > Using the query string as search handle was in fact one of the first
<br>> > proposals for the xesam search spec. I think we better dig out why it<br>> > was rejected then...<br>> <br>> Some digging turned up this --<br>> <a href="http://article.gmane.org/gmane.comp.freedesktop.xdg/8016">
http://article.gmane.org/gmane.comp.freedesktop.xdg/8016</a>. I dug a<br>> little further back too, but that looked too preliminary to cover<br>> this.<br><br>Thanks for the link. For the lazy among us let me quote Magnus Bergman:
<br><br><blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;" class="gmail_quote">I think it's a bad idea to use a query-string to identify a search for<br> the following reasons:
<br> * It is inefficient to sent a (possibly quite long) string for every<br> call.<br> * It isn't logical for the search engine to use the query string to<br> lookup the search because a query might generate a different result
<br> depending on then the search is started.<br> * An application might create different searches from the same query<br> (string) with different result ("all files created this minute").</blockquote><div>
<br>I 100% agree with Magnus here, and I think these points demonstrate that we cannot use the query string as the search handle. Even (session, query_string) cannot be used as a key, given these arguments. <br></div><br><br>Let me elaborate a bit on Magnus' point 1.
<br><br>With the query string as handle, we would have to send the whole query string for each and every interaction with the search engine: NewSearch, CountHits, GetHits, GetHitData, and CloseSearch. If you create a context-analyzer daemon which constantly queries the search engine based on user behavior, possibly analyzing *the whole* hit set, the query string can be a significant overhead.
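<br><br>To make Magnus' points 2 and 3 concrete, here is a rough Python sketch. It is not taken from any xesam implementation: the class names, the `new_search` method and the `started_at` parameter are all hypothetical, and the servers are plain in-memory maps. It only shows that two searches started from the same query string stay distinct when the server hands out a handle, but collapse into one entry when the query string itself is the key.

```python
import itertools


class HandleKeyedServer:
    """Hypothetical server that returns an opaque handle per search."""

    def __init__(self):
        self._searches = {}
        self._ids = itertools.count(1)

    def new_search(self, query, started_at):
        # Each call gets a fresh handle, even for an identical query.
        handle = f"search-{next(self._ids)}"
        self._searches[handle] = (query, started_at)
        return handle

    def active_count(self):
        return len(self._searches)


class QueryKeyedServer:
    """Hypothetical server that uses the query string itself as the key."""

    def __init__(self):
        self._searches = {}

    def new_search(self, query, started_at):
        # A second search with the same query overwrites the first one.
        self._searches[query] = (query, started_at)
        return query

    def active_count(self):
        return len(self._searches)


query = "all files created this minute"

by_handle = HandleKeyedServer()
h1 = by_handle.new_search(query, started_at=0)
h2 = by_handle.new_search(query, started_at=60)

by_query = QueryKeyedServer()
k1 = by_query.new_search(query, started_at=0)
k2 = by_query.new_search(query, started_at=60)

print(h1 != h2, by_handle.active_count())  # two independent searches
print(k1 == k2, by_query.active_count())   # both calls share one entry
```

In the handle-keyed variant the identifier sent with every later call is also short and of bounded length, whereas in the query-keyed variant it grows with the query.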
<br><br>Cheers,<br>Mikkel<br>