2007/7/19, Jos van den Oever <<a href="mailto:jvdoever@gmail.com">jvdoever@gmail.com</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
2007/7/16, Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>>:<br>> I have a few suggestions for updates to the xesam search spec.<br>><br>> * API:<br>> Remove the session properties
search.blocking and search.live. These seemed<br>> to cause more confusion than I anticipated. These can be emulated in the<br>> client side lib as far as my scribblings can tell. Another solution might<br>> just be better documentation of course...
<br>><br>> Some of you now have actual experience with these, what is your feel?<br>><br>> The reason for having these properties in the first place was to allow<br>> easier usage of the dbus interface directly - ie not via a client lib.
<br>><br>> What this would mean for the api methods:<br>> * GetHits should always block until the requested number of hits has been<br>> found or the entire index has been searched (in which case the SearchDone
<br>> signal will be emitted too).<br>> * CountHits should always block until the entire index has been searched<br>> * No other methods should block<br>><br>> * Query Language:<br>> I suggest we remove the "type" attribute on the query element. You can just
<br>> specify the Category- or StoredAs fields in your selectors.<br><br>I completely agree on all suggestions.<br>One more suggestion: the minimal interval between result signals<br>should be sane or settable.</blockquote>
<div><br>Valid point. To avoid signal spamming, I take it. How about a session property hit.batch.size, an integer determining how many hits the server should collect before emitting HitsAdded? If the entire index has been searched but fewer than
hit.batch.size hits have been found, HitsAdded(num_hits) should be emitted right before SearchDone.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On the topic of remembering the hits.<br>In an ideal world, the server could be clever and get the right file from<br>the hit number. In reality, this is quite hard. Atm the server should<br>keep a vector with uris internally. I think we should allow the server
<br>to have a sane maximum of hits that are retrievable. E.g. CountHits<br>might return 1 million, but you would only be able to retrieve the<br>first 100k.</blockquote><div><br>This makes sense, provided that the scoring algorithms on the servers are good enough. But judging by the extraordinary amount of talent we have in the server-side dev camp, this is no problem of course :-)
<br><br>How about a read-only session property search.maxhits? We could specify that in order to be xesam compliant this value must be > 1000 or something - just so that apps won't have to do sanity checks galore.<br></div>
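<div><br>Coming back to hit.batch.size for a moment: here is a rough Python sketch of the batching behaviour I have in mind. The HitBatcher class and its two callbacks are made up for illustration and are not part of the spec; the callbacks merely stand in for the HitsAdded and SearchDone signals.<br></div>

```python
class HitBatcher:
    """Hypothetical helper, not part of the xesam spec.

    Collects hits and reports them in batches so that clients are not
    flooded with one signal per hit.
    """

    def __init__(self, batch_size, emit_hits_added, emit_search_done):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size              # stands in for hit.batch.size
        self.emit_hits_added = emit_hits_added    # stands in for the HitsAdded signal
        self.emit_search_done = emit_search_done  # stands in for the SearchDone signal
        self.pending = 0                          # hits found but not yet announced

    def hit_found(self):
        """Called by the (imaginary) search engine for every new hit."""
        self.pending += 1
        if self.pending >= self.batch_size:
            self._flush()

    def index_exhausted(self):
        """Called once the entire index has been searched."""
        # Announce a partial batch right before SearchDone.
        if self.pending > 0:
            self._flush()
        self.emit_search_done()

    def _flush(self):
        self.emit_hits_added(self.pending)
        self.pending = 0


# Usage: 7 hits with a batch size of 3.
signals = []
batcher = HitBatcher(
    3,
    lambda num_hits: signals.append(("HitsAdded", num_hits)),
    lambda: signals.append(("SearchDone",)),
)
for _ in range(7):
    batcher.hit_found()
batcher.index_exhausted()
print(signals)
```

<div>With a batch size of 3 and 7 hits in total, the client sees two full batches, then the leftover hit, then SearchDone.<br></div>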
<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">This is actually a scalability issue. We should allow the search to<br>modify the vector when the hit has not yet been retrieved and only
<br>guarantee reproducibility for hits that were retrieved already. In<br>combination with a maximum history size this would handle most<br>performance problems.</blockquote><div><br>Yeah, we are handling the exact same problems at work :-) I think we have solved it here (at least up to 100M or so), but it is not exactly client-side software...
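<br><br>To make sure we mean the same thing, here is a minimal Python sketch of such a hit vector as I understand the proposal. The HitList class and its method names are invented for this mail; max_hits plays the role of the suggested search.maxhits, and the rank argument is an assumed position supplied by the server's scoring.

```python
class HitList:
    """Hypothetical server-side hit vector; names are illustrative only."""

    def __init__(self, max_hits):
        self.max_hits = max_hits  # stands in for the proposed search.maxhits
        self.uris = []            # ranked URIs, never longer than max_hits
        self.total = 0            # what CountHits would report
        self.retrieved = 0        # hits [0, retrieved) were handed out and are frozen

    def add(self, uri, rank):
        """Insert a hit at position `rank`, but never inside the frozen prefix."""
        self.total += 1
        pos = max(rank, self.retrieved)
        if pos >= self.max_hits:
            return  # counted, but beyond the retrievable history
        self.uris.insert(pos, uri)
        del self.uris[self.max_hits:]  # drop whatever fell off the end

    def get_hits(self, n):
        """Return the next n hits; everything returned becomes reproducible."""
        start = self.retrieved
        batch = self.uris[start:start + n]
        self.retrieved += len(batch)
        return batch


# Usage: a tiny history of 3 retrievable hits.
hits = HitList(max_hits=3)
hits.add("file:///a", 0)
hits.add("file:///b", 0)       # outranks a, nothing retrieved yet, so it goes first
first = hits.get_hits(1)       # hands out b and freezes position 0
hits.add("file:///c", 0)       # would outrank b, but lands right behind the frozen prefix
hits.add("file:///d", 5)       # past max_hits: counted only
rest = hits.get_hits(10)
print(first, rest, hits.total)
```

The point is only that hits may be reshuffled until a client has fetched them, while CountHits can keep growing past what is actually retrievable.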
<br></div><br>Cheers,<br>Mikkel<br></div><br>