[Xesam] Why is vendor.maxhits read-only?

Tue Dec 18 01:42:38 PST 2007

On Dec 17, 2007 10:47 PM, Mikkel Kamstrup Erlandsen
<mikkel.kamstrup at gmail.com> wrote:
> On 17/12/2007, Anders Rune Jensen <anders at iola.dk> wrote:
> > On Dec 17, 2007 6:56 PM, Mikkel Kamstrup Erlandsen
> > <mikkel.kamstrup at gmail.com> wrote:
> > > On 17/12/2007, Anders Rune Jensen <anders at iola.dk> wrote:
> > > > Hi
> > > >
> > > > I was wondering why vendor.maxhits is read-only? Beagle can natively
> > > > set this, so it would be really nice to be able to set this using
> > > > Xesam as well.
> > > >
> > > > Thanks
> > >
> > > No, it does not make sense for this to be writable. Perhaps that is
> > > because the explanation is bad. Here's another try:
> > >
> > > vendor.maxhits is a hard, implementation-level limit on the maximum
> > > number of hits returnable. If you write a Lucene-based indexer this
> > > will be your JVM's Integer.MAX_VALUE; other indexing frameworks might
> > > set other limits (or none).
> > >
> > > An example at hand is a Google query. You can maximally retrieve
> > > 10.000 docs from a Google query - try it yourself (this has to do with
> > > the distributed nature of the Google search engine - it is hard to get
> > > rankings correct if you allow arbitrarily many docs to be fetched).
> > >
> > > The functionality you describe is also easily implemented on the
> > > client side. I did this in xesam-tools' xesam.ui.HitPagerModel, for
> > > example.
> >
> > Ok, maybe I misunderstand this completely but it isn't always easy to
> > get all the details from source in a jiffy (even if it's Python :-)).
>
> No, not necessarily. I think you are thinking about databases; in
> (most) database systems it is almost free to look up the values of the
> fields. This is not necessarily so in a Lucene index, for instance.
>
> > So what you suggest is that the cap is set very high on the number of
> > results returned from the xesam backend and then you just disregard
> > what you don't need on the client side? Is this what you meant or am I
> > misunderstanding something?
>
> Hmmm, sounds like it :-) Here's a sketch of a session as it could very
> well transpire:
>
> 1) A client starts a search 'sh' in a session 'ses'.
>
> 2) The server performs a query over a few indexes and finds the first
> batch of hits with N hits and emits HitsAdded(sh,N). In a Lucene world
> the server would now hold doc-ids which are just integer handles for
> each hit.
>
> 3) Client detects the HitsAdded(sh, N) signal and requests data for M <
> N hits, by calling GetHits(sh, M).
>
> 4) The server collects the hit data for hits 0..M and returns it.
>
> 5) Client receives hit data for hits 0..M and displays it to the user.
>
> 6) Server finds Q more hits and emits HitsAdded(sh, Q).
>
> 7) Client does not need more hits and does nothing
>
> 8) The server waits for more requests for hits, or until the search or
> session is closed.
>
> So the client does not "disregard what it doesn't need", but instead
> only requests data for what it needs.
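
The eight steps above can be sketched as a toy, in-process model in
Python. This is only an illustration under stated assumptions: no D-Bus
is involved, and ToySearchServer, step, get_hits and run_session are
hypothetical names; only HitsAdded and GetHits come from the thread. It
shows that the expensive hit-data lookup happens just for the hits the
client asks for.

```python
# Toy, in-process model of the session above. No D-Bus is involved; the class
# and its internals are hypothetical. Only the names HitsAdded and GetHits
# come from the thread.

class ToySearchServer:
    def __init__(self, batches):
        self._batches = list(batches)   # doc-id batches the "index" will yield
        self._pending = []              # cheap integer handles not yet fetched
        self.lookups = 0                # number of expensive hit-data lookups
        self.on_hits_added = None       # callback standing in for the signal

    def step(self, sh):
        """Find the next batch of hits and announce it (steps 2 and 6)."""
        if not self._batches:
            return False
        batch = self._batches.pop(0)
        self._pending.extend(batch)
        if self.on_hits_added:
            self.on_hits_added(sh, len(batch))
        return True

    def get_hits(self, sh, count):
        """Return hit data for up to `count` pending hits (step 4)."""
        wanted, self._pending = self._pending[:count], self._pending[count:]
        self.lookups += len(wanted)
        return [{"doc_id": d, "title": "doc %d" % d} for d in wanted]


def run_session(wanted=3):
    server = ToySearchServer([range(0, 10), range(10, 15)])
    shown = []

    def hits_added(sh, n):
        # Steps 3, 5 and 7: request only what is still needed, ignore the rest.
        need = min(wanted - len(shown), n)
        if need > 0:
            shown.extend(server.get_hits(sh, need))

    server.on_hits_added = hits_added
    while server.step("sh"):
        pass
    return shown, server.lookups
```

With two batches of 10 and 5 hits and a client that wants 3,
run_session(3) fetches data for doc-ids 0..2 only; the other 12 handles
are never resolved.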

Ok. I see now. :-)

Thanks for the explanation.

>> I could really see the usefulness of a flag to tell the backend how
>> many results I'm interested in.
>
> Yes, I can certainly see the use. We already discussed this a good
> while back on xdg, and it was turned down[1].
>
> Anyways it is a more complicated matter than it might appear - as
> pointed out in [1] what if I request a batch size of 100 and the
> server finds 99 hits in the first go. It might very well be impossible
> for the server to tell if more hits are inbound, or it should just
> fire HitsAdded(99)...

Right, but you just move that headache to the user instead of the
server. The user has no way of knowing this, while the server might.
Maybe a NoMoreHits() signal could do this and we just keep the
interface? Together with this, a signal for suggesting the number of
results one wants to receive would be very nice to have. In this case
the server could make the optimization of just returning all the
results in one dbus message instead of several to improve the
performance. Another thing that would be very nice.

So far I can work around all of this, but it really would be much
nicer to have those two interfaces.
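
A rough sketch, in Python, of how a client might use such a signal.
NoMoreHits is only the proposal made here and not part of the existing
interface, and HitCollector with its methods is a hypothetical helper.
It covers the case from [1]: the client wants 100 results, the server
has announced 99, and only the server can say whether more are coming.

```python
# Hypothetical client-side state for the proposal above. NoMoreHits is the
# signal suggested in this mail; it is not part of the current interface.

class HitCollector:
    def __init__(self, wanted):
        self.wanted = wanted      # how many results the UI wants to show
        self.available = 0        # hits announced by the server so far
        self.exhausted = False    # set once the server says it is finished

    def on_hits_added(self, count):
        self.available += count

    def on_no_more_hits(self):
        self.exhausted = True

    def ready(self):
        """True when the UI can stop waiting and fetch its results."""
        return self.available >= self.wanted or self.exhausted

    def fetch_count(self):
        """Number of hits to ask for in a single GetHits call."""
        return min(self.wanted, self.available)
```

Without on_no_more_hits the collector would stay "not ready" forever at
99 of 100 hits, which is exactly the headache the client cannot resolve
on its own.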

Btw. I've been implementing xesam support in nemo using the
beagle-xesam-adaptor and I must say that so far xesam has been
wonderful to work with. Its syntax is very close to lists in lisp (it
is after all xml) so it's really easy to grok.

-- 
Anders Rune Jensen
http://people.iola.dk/anders/