[Xesam] Why is vendor.maxhits read-only?

Tue Dec 18 02:48:27 PST 2007

On 18/12/2007, Anders Rune Jensen <anders at iola.dk> wrote:
> On Dec 17, 2007 10:47 PM, Mikkel Kamstrup Erlandsen
> <mikkel.kamstrup at gmail.com> wrote:
> > On 17/12/2007, Anders Rune Jensen <anders at iola.dk> wrote:
> > > On Dec 17, 2007 6:56 PM, Mikkel Kamstrup Erlandsen
> > > <mikkel.kamstrup at gmail.com> wrote:
> > > > On 17/12/2007, Anders Rune Jensen <anders at iola.dk> wrote:
> > > > > Hi
> > > > >
> > > > > I was wondering why vendor.maxhits is read-only? Beagle can natively
> > > > > set this, so it would be really nice to be able to set this using
> > > > > Xesam as well.
> > > > >
> > > > > Thanks
> > > >
> > > > No, it does not make sense for this to be writable. Perhaps that is
> > > > because the explanation is bad. Here's another try:
> > > >
> > > > vendor.maxhits is a hard, implementation-level limit on the maximum
> > > > number of hits returnable. If you write a Lucene based indexer this
> > > > will be your JVM's Integer.MAX_VALUE; other indexing frameworks might
> > > > set other limits (or none).
> > > >
> > > > An example at hand is a Google query. You can retrieve at most
> > > > 10,000 docs from a Google query - try it yourself (this has to do with
> > > > the distributed nature of the Google search engine - it is hard to get
> > > > rankings correct if you allow arbitrarily many docs to be fetched).
> > > >
> > > > The functionality you describe is also easily implemented on the
> > > > client side. I did this in xesam-tools' xesam.ui.HitPagerModel, for
> > > > example.
> > >
> > > Ok, maybe I misunderstand this completely but it isn't always easy to
> > > get all the details from source in a jiffy (even if it's Python :-)).
> >
> > No, not necessarily. I think you are thinking about databases. In
> > (most) database systems it is almost free to look up the values of the
> > fields. This is not necessarily so in a Lucene index, for instance.
> >
> > > So what you suggest is that the cap is set very high on the number of
> > > results returned from the xesam backend and then you just disregard
> > > what you don't need on the client side? Is this what you meant or am I
> > > misunderstanding something?
> >
> > Hmmm, sounds like it :-) Here's an outline of a session as it could
> > very well transpire:
> >
> > 1) A client starts a search 'sh' in a session 'ses'.
> >
> > 2) The server performs a query over a few indexes and finds the first
> > batch of hits with N hits and emits HitsAdded(sh,N). In a Lucene world
> > the server would now hold doc-ids which are just integer handles for
> > each hit.
> >
> > 3) Client detects the HitsAdded(sh, N) signal and requests data for M <
> > N hits, by calling GetHits(sh, M)
> >
> > 4) The server collects the hit data for hits 0..M and returns it
> >
> > 5) Client receives hit data for hits 0..M and displays it to the user
> >
> > 6) Server finds Q more hits and emits HitsAdded(sh, Q)
> >
> > 7) Client does not need more hits and does nothing
> >
> > 8) The server waits for more requests for hits, or until the search or
> > session is closed
> >
> > So the client does not "disregard what it doesn't need", but instead
> > only requests data for what it needs.
>
> Ok. I see now. :-)
>
> Thanks for the explanation.
>
> >> I could really see the usefulness of a flag to tell the backend how
> >> many results I'm interested in.
> >
> > Yes, I can certainly see the use. We already discussed this a good
> > while back on xdg, and it was turned down[1].
> >
> > Anyway, it is a more complicated matter than it might appear - as
> > pointed out in [1], what if I request a batch size of 100 and the
> > server finds 99 hits in the first go? It might very well be impossible
> > for the server to tell whether more hits are inbound, or whether it
> > should just fire HitsAdded(99)...
>
> Right, but you just move that headache to the user instead of the
> server. The user has no way of knowing this, while the server might.
> Maybe a NoMoreHits() signal could do this and we just keep the
> interface? Together with this, a signal for suggesting the number of
> results one wants to receive would be very nice to have. In this case
> the server could make the optimization of just returning all the
> results in one dbus message instead of several to improve the
> performance. Another thing that would be very nice.

Indeed. It is really just a weighing of benefits/drawbacks. The reason
for putting it in a client lib is based on the assumption that there
will be more server implementations than client lib implementations.
The reason I think this will be the case is that we might see several
applications spawning xesam APIs over the coming years - e.g. the
Mugshot/GnomeOnline client could expose a xesam interface to search
users/feeds/groups and so on; your imagination sets the limit.

> So far I can work around all of this, but it really would be much
> nicer to have those two interfaces.

Would it not be just as good if you had a xesam client lib (exposing
these features) with GObjects for C#?

> Btw. I've been implementing xesam support in nemo using the
> beagle-xesam-adaptor and I must say that so far xesam has been
> wonderful to work with. Its syntax is very close to lists in lisp (it
> is after all xml) so it's really easy to grok.

Good to hear! Although it terrifies me that you say it is close to Lisp! ;-P

Cheers,
Mikkel
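
For illustration only, the eight-step session quoted above can be
sketched as a small in-memory simulation. This is a rough sketch, not
the xesam-tools code and not real D-Bus: ToySearchServer, find_more,
get_hits and run_session are hypothetical names, and only the
HitsAdded/GetHits idea is taken from the thread. It assumes the server
keeps cheap integer handles for found hits and loads hit data only when
the client asks for it.

```python
class ToySearchServer:
    """In-memory stand-in for a Xesam-like server (hypothetical, no D-Bus)."""

    def __init__(self, batches):
        self._pending = [list(b) for b in batches]  # doc ids not yet "found"
        self._found = []    # cheap integer handles for hits found so far
        self._cursor = 0    # index of the next hit not yet handed out
        self.loads = 0      # number of hits whose data was actually loaded

    def find_more(self, on_hits_added):
        """Find the next batch and emit HitsAdded(count); False when done."""
        if not self._pending:
            return False
        batch = self._pending.pop(0)
        self._found.extend(batch)
        on_hits_added(len(batch))
        return True

    def get_hits(self, count):
        """Load data only for the next `count` hits, in the spirit of GetHits."""
        ids = self._found[self._cursor:self._cursor + count]
        self._cursor += len(ids)
        self.loads += len(ids)
        return [{"id": i, "title": "doc-%d" % i} for i in ids]


def run_session(page_size=3):
    """Run one search where the client only wants a single page of hits."""
    server = ToySearchServer([range(0, 5), range(5, 12)])
    shown = []

    def on_hits_added(n):
        # The client requests only what it still needs to fill its page.
        wanted = min(n, page_size - len(shown))
        if wanted > 0:
            shown.extend(server.get_hits(wanted))

    while server.find_more(on_hits_added):
        pass
    return shown, server.loads
```

Calling run_session(page_size=3) lets the toy server find 12 hits in
two batches while loading data for only the 3 hits the client requested,
which is the point of steps 3 to 7: the client never receives data it
would have to throw away.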