2007/7/20, Jos van den Oever <<a href="mailto:jvdoever@gmail.com">jvdoever@gmail.com</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
2007/7/20, Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>>:<br>> > I completely agree on all suggestions.<br>> > One more suggestion: the minimal interval between result signals
<br>> > should be sane or settable.<br>><br>> Valid point. To avoid signal spamming, I take it. How about a session<br>> property hit.batch.size, an integer determining how many hits the<br>> server should collect before emitting HitsAdded? In case the entire index
<br>> has been searched but fewer than hit.batch.size hits have been found, HitsAdded(num_hits) should<br>> be emitted right before SearchDone.<br><br>I would prefer setting this in terms of milliseconds, not number of<br>hits. Imagine you have the batch size at 100 and hits 1-99 arrive
<br>in 1 ms and hit #100 takes 20 seconds. That would not be so nice. If<br>you say that the time between signals must be at least 100 ms, you<br>solve the problem more elegantly.</blockquote><div><br>How about not setting it at all and just letting the server-side implementation decide the best strategy? I guess the value of such a property (whether in milliseconds or hit count) is highly implementation dependent.
<br><br>You bring up the slowness problem, but there is also the flooding problem, for example searches with 1,000,000 hits...<br><br>I think only the server has a reasonable chance of guessing the right strategy. The client is basically in the dark here. I have reversed my opinion - I say "keep the logic server side".
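The time-based strategy suggested above could be sketched roughly as follows. This is only an illustration, not part of any Xesam spec or implementation; the HitBatcher name, the emit callback, and the 100 ms default are all made up for the example:

```python
import time

class HitBatcher:
    """Sketch of a server-side strategy that rate-limits HitsAdded
    signals by elapsed time rather than by hit count."""

    def __init__(self, emit_hits_added, min_interval=0.1):
        self.emit = emit_hits_added        # callback: emit(number_of_new_hits)
        self.min_interval = min_interval   # minimum seconds between signals
        self.pending = 0                   # hits collected since the last signal
        self.last_emit = time.monotonic()

    def add_hit(self):
        """Called by the search backend for every new hit."""
        self.pending += 1
        now = time.monotonic()
        if now - self.last_emit >= self.min_interval:
            self.flush(now)

    def flush(self, now=None):
        """Emit any pending hits. Also called once right before
        SearchDone, so hits found after the last interval are not lost."""
        if self.pending:
            self.emit(self.pending)
            self.pending = 0
        self.last_emit = now if now is not None else time.monotonic()
```

This avoids both problems at once: a single slow hit #100 cannot delay hits 1-99, and a million fast hits collapse into one signal per interval.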
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> > On the topic of remembering the hits.<br>> > In ideal world, the server could be clever and get the right file from
<br>> > the hit number. In reality, this is quite hard. At the moment the server would<br>> > keep a vector of uris internally. I think we should allow the server<br>> > to have a sane maximum of hits that are retrievable.
E.g. CountHits<br>> > might return 1 million, but you would only be able to retrieve the<br>> > first 100k.<br>><br>> This makes sense given that the scoring algorithms on servers are good<br>> enough. But judging by the extraordinary amount of talent we have in the
<br>> server-side dev camp this is no problem of course :-)<br>The problem is not in the scoring algorithms, but in the data changing on<br>disk. If you do not get the list of uris at once, it may change due to<br>changes on the disk. I say we should ignore this problem as long as
<br>the uri has not yet been requested and say that the result list is not<br>fixed until it is actually requested.</blockquote><div><br>Yes, that sounds sane.<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> How about a read-only session property search.maxhits? We could specify that<br>> in order to be xesam compliant this value must be at least 1000 or something - just<br>> so that apps won't have to do sanity checks galore.
<br>Sounds good if used in addition to my suggestion above.</blockquote><div><br>What number of hits should be reported? The real number or just search.maxhits?<br><br>This also affects the signaling policy for the HitsAdded signal. Should HitsAdded be emitted for hits beyond the cut-off?
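One way the counting question could shake out is for the server to keep the full match count for CountHits while only storing the first search.maxhits URIs for retrieval, along the lines of the "sane maximum of retrievable hits" idea above. A sketch, with all class and method names invented for illustration:

```python
class HitStore:
    """Sketch: CountHits reports every match, but only the first
    `max_retrievable` URIs are kept for later retrieval."""

    def __init__(self, max_retrievable=100_000):
        self.max_retrievable = max_retrievable
        self.uris = []   # URIs a client can actually page through
        self.total = 0   # full match count, reported by CountHits

    def add(self, uri):
        self.total += 1
        if len(self.uris) < self.max_retrievable:
            self.uris.append(uri)

    def count_hits(self):
        return self.total

    def get_hits(self, offset, count):
        # As agreed above, the result list is only fixed once it is
        # actually requested; until then the index may still change.
        return self.uris[offset:offset + count]
```

Under this sketch the real number is reported, and HitsAdded would naturally stop once the store is full, since nothing beyond the cut-off is retrievable anyway.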
<br><br></div>Anyway, it is not a huge issue. First, it mainly covers the case where a client submits a query and is slow to retrieve the hits - which seems like a really odd client policy. Also, the typical search would not score millions of hits.
<br><br><br>Cheers,<br>Mikkel<br></div>