[XESAM] API simplification?
Mikkel Kamstrup Erlandsen
mikkel.kamstrup at gmail.com
Sat Jul 21 16:50:33 PDT 2007
2007/7/20, Jos van den Oever <jvdoever at gmail.com>:
> 2007/7/20, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> > > I completely agree on all suggestions.
> > > One more suggestion: the minimal interval between result signals
> > > should be sane or settable.
> > Valid point. To avoid signal spamming I take it. How about a session
> > property hit.batch.size, an integer determining how many hits the
> > server should collect before emitting HitsAdded. In case the entire
> > index has been searched but fewer than hit.batch.size hits have been
> > found, HitsAdded will be emitted(num_hits) right before SearchDone.
> I would prefer setting this in terms of milliseconds, not number of
> hits. Imagine you have the batch size at 100 and hits 1-99 are there
> in 1 ms and hit #100 takes 20 seconds. That would not be so nice. If
> you say that the time between signals must be at least 100 ms, you
> solve the problem more elegantly.
How about not setting it at all and just let the server side implementation
decide the best strategy? I guess the value of such property (whether in
millis or hit count) is highly implementation dependent.
You bring up the slowness problem, but there is also the flooding
problem, e.g. searches with 1,000,000 hits...
I think only the server has a reasonable chance of guessing the right
strategy. The client is basically in the dark here. I have reversed my
opinion - I say "keep the logic server side".
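For what it's worth, the "keep the logic server side" strategy is easy to sketch. Below is a minimal Python example (all names - HitBatcher, emit_signal - are illustrative, not part of the XESAM spec): the server buffers hits and emits HitsAdded at most once per interval, with one final flush before SearchDone so trailing hits below the threshold are still reported:

```python
import time

class HitBatcher:
    """Illustrative server-side strategy: buffer incoming hits and emit
    a HitsAdded signal at most once per min_interval seconds."""

    def __init__(self, emit_signal, min_interval=0.1):
        self.emit_signal = emit_signal   # callable: emit_signal(num_hits)
        self.min_interval = min_interval
        self.pending = 0                 # hits collected since last signal
        self.last_emit = 0.0

    def add_hit(self):
        self.pending += 1
        now = time.monotonic()
        if now - self.last_emit >= self.min_interval:
            self.flush(now)

    def flush(self, now=None):
        # Called on interval boundaries and once right before SearchDone,
        # so hits that never filled a full interval are not lost.
        if self.pending:
            self.emit_signal(self.pending)
            self.pending = 0
        self.last_emit = now if now is not None else time.monotonic()
```

The interval value stays a server-side implementation detail; a client never sees or sets it.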
> > On the topic of remembering the hits.
> > > In an ideal world, the server could be clever and get the right file from
> > > the hit number. In reality, this is quite hard. Atm the server should
> > > keep a vector with uris internally. I think we should allow the server
> > > to have a sane maximum of hits that are retrievable. E.g. CountHits
> > > might return 1 million, but you would only be able to retrieve the
> > > first 100k.
> > This makes sense given that the scoring algorithms on servers are good
> > enough. But judging by the extraordinary amount of talent we have in the
> > server-side dev camp this is no problem of course :-)
> The problem is not in the scoring algos, but in the changing data on
> disk. If you do not get the list of uris at once, it may change due to
> changes on the disk. I say we should ignore this problem as long as
> the uri has not yet been requested and say that the result list is not
> fixed until it is actually requested.
Yes, that sounds sane.
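As a sketch of what "not fixed until requested" could mean on the server side (Python, names purely illustrative, not spec): the server keeps a live view of the hit list, which may change as the index changes on disk, and pins each hit only the first time a client actually retrieves it:

```python
class SearchSession:
    """Illustrative sketch: the hit list stays mutable until a hit is
    retrieved; once returned to a client, that hit never changes."""

    def __init__(self, raw_hits):
        self.raw_hits = raw_hits   # live view, may change with the index
        self.pinned = {}           # hit number -> uri, fixed on first read

    def get_hit(self, n):
        if n not in self.pinned:
            # Resolve and fix the result on demand, not at query time.
            self.pinned[n] = self.raw_hits[n]
        return self.pinned[n]
```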
> > How about a read-only session property search.maxhits? We could
> > specify that in order to be xesam compliant this value must be >
> > 1000 or something - so that apps won't have to do sanity checks
> > galore.
> Sounds good if used in addition to my suggestion above.
What number of hits should be reported? The real number or just
search.maxhits? This also affects the signaling policy for the
HitsAdded signal. Should HitsAdded be emitted for items currently
below the cut-off?
Anyway, it is not a huge issue. Firstly, it mainly covers the case
where a client submits a query and is slow to retrieve the hits -
which seems like a really odd client policy. Also, the typical search
would not score millions of hits anyway.