2007/7/20, Jos van den Oever <<a href="mailto:jvdoever@gmail.com">jvdoever@gmail.com</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
2007/7/20, Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>>:<br>> > I completely agree on all suggestions.<br>> > One more suggestion: the minimal interval between result signals
<br>> > should be sane or settable.<br>><br>> Valid point. To avoid signal spamming, I take it. How about a session<br>> property hit.batch.size, an integer determining how many hits the<br>> server should collect before emitting HitsAdded? In case the entire index
<br>> has been searched but fewer than hit.batch.size hits have been found, HitsAdded(num_hits) should<br>> be emitted right before SearchDone.<br><br>I would prefer setting this in terms of milliseconds, not number of<br>hits. Imagine you have the batch size at 100 and hits 1-99 arrive
<br>in 1 ms and hit #100 takes 20 seconds. That would not be so nice. If<br>you say that the time between signals must be at least 100 ms, you<br>solve the problem more elegantly.</blockquote><div><br>How about not setting it at all and just letting the server-side implementation decide the best strategy? I guess the value of such a property (whether in milliseconds or hit count) is highly implementation dependent.
<br><br>You bring up the slowness problem, but there is also the flooding problem, for example searches with 1,000,000 hits...<br><br>I think only the server has a reasonable chance of guessing the right strategy. The client is basically in the dark here. I have reversed my opinion - I say "keep the logic server side".
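The time-based strategy suggested above could be sketched roughly as follows. This is only an illustration, not part of any Xesam spec or implementation; the HitBatcher name, the emit callback, and the 100 ms default are all made up for the example:

```python
import time

class HitBatcher:
    """Sketch of a server-side strategy that rate-limits HitsAdded
    signals by elapsed time rather than by hit count."""

    def __init__(self, emit_hits_added, min_interval=0.1):
        self.emit = emit_hits_added        # callback: emit(number_of_new_hits)
        self.min_interval = min_interval   # minimum seconds between signals
        self.pending = 0                   # hits collected since the last signal
        self.last_emit = time.monotonic()

    def add_hit(self):
        """Called by the search backend for every new hit."""
        self.pending += 1
        now = time.monotonic()
        if now - self.last_emit >= self.min_interval:
            self.flush(now)

    def flush(self, now=None):
        """Emit any pending hits. Also called once right before
        SearchDone, so hits found after the last interval are not lost."""
        if self.pending:
            self.emit(self.pending)
            self.pending = 0
        self.last_emit = now if now is not None else time.monotonic()
```

This avoids both problems at once: a single slow hit #100 cannot delay hits 1-99, and a million fast hits collapse into one signal per interval.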
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> > On the topic of remembering the hits.<br>> > In ideal world, the server could be clever and get the right file from
<br>> > the hit number. In reality, this is quite hard. At the moment the server would<br>> > keep a vector of uris internally. I think we should allow the server<br>> > to have a sane maximum of hits that are retrievable.
E.g. CountHits<br>> > might return 1 million, but you would only be able to retrieve the<br>> > first 100k.<br>><br>> This makes sense given that the scoring algorithms on servers are good<br>> enough. But judging by the extraordinary amount of talent we have in the
<br>> server-side dev camp this is no problem of course :-)<br>The problem is not in the scoring algorithms, but in the data changing on<br>disk. If you do not get the list of uris at once, it may change due to<br>changes on the disk. I say we should ignore this problem as long as
<br>the uri has not yet been requested and say that the result list is not<br>fixed until it is actually requested.</blockquote><div><br>Yes, that sounds sane.<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> How about a read-only session property search.maxhits? We could specify that<br>> in order to be xesam compliant this value must be at least 1000 or something - just<br>> so that apps won't have to do sanity checks galore.
<br>Sounds good if used in addition to my suggestion above.</blockquote><div><br>What number of hits should be reported? The real number or just search.maxhits?<br><br>This also affects the signaling policy for the HitsAdded signal. Should HitsAdded be emitted for hits beyond the cut-off?
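One way the counting question could shake out is for the server to keep the full match count for CountHits while only storing the first search.maxhits URIs for retrieval, along the lines of the "sane maximum of retrievable hits" idea above. A sketch, with all class and method names invented for illustration:

```python
class HitStore:
    """Sketch: CountHits reports every match, but only the first
    `max_retrievable` URIs are kept for later retrieval."""

    def __init__(self, max_retrievable=100_000):
        self.max_retrievable = max_retrievable
        self.uris = []   # URIs a client can actually page through
        self.total = 0   # full match count, reported by CountHits

    def add(self, uri):
        self.total += 1
        if len(self.uris) < self.max_retrievable:
            self.uris.append(uri)

    def count_hits(self):
        return self.total

    def get_hits(self, offset, count):
        # As agreed above, the result list is only fixed once it is
        # actually requested; until then the index may still change.
        return self.uris[offset:offset + count]
```

Under this sketch the real number is reported, and HitsAdded would naturally stop once the store is full, since nothing beyond the cut-off is retrievable anyway.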
<br><br></div>Anyway, it is not a huge issue. First, it mainly covers the case where a client submits a query and is slow to retrieve the hits - which seems like a really odd client policy. Also, the typical search would not score millions of hits.
<br><br><br>Cheers,<br>Mikkel<br></div>