[XESAM] Spec update proposals

Fri Jun 22 11:37:49 PDT 2007

2007/6/22, Jos van den Oever <jvdoever at gmail.com>:
> 2007/6/17, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> > 2007/6/17, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> >
> > > 2007/6/16, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com>:
> > >
> > > > Hi all,
> > > >
> > > > While hacking on xesam-tools[1] I have struck a few problems in the
> > current spec and I think we should settle them asap.
> > > >
> > > > Please have a look at
> > http://wiki.freedesktop.org/wiki/XesamSearchUpdates
> > and gimme some feedback.
> > >
> > >
> > > It was just pointed out to me that the query schema in the current form
> > does not have extendedSelectionTypes in the selectionTypes - this makes
> > regExp and proximity selectors not allowed by the schema. I've updated
> > http://wiki.freedesktop.org/wiki/XesamSearchUpdates
> >  with a proposal 5.
> >
> >
> > More proposed updates. I've added a point 6 to the list at
> > http://wiki.freedesktop.org/wiki/XesamSearchUpdates
> > . It has to do with an inherent race condition in the
> > search  interface. When you fire a search via NewSearch the server might
> > start firing HitsAdded signals before you have a chance to connect to them -
> > in fact it might start firing these signals before the call to NewSearch
> > returns! I think this is an optimization we need.
> >
> > The proposed solution is to add a StartSearch() method to the interface.
> How about using the query as the search key? The query is rarely very
> large and i do not think it will cause much overhead. In fact, i
> measured it.
> Here are the results of running 10000 dbus queries with a string as
> argument. The server is c++, the caller is python.
>
> query length  time
> 10            2.980s
> 30            2.971s
> 100           2.995s
> 300           3.032s
> 1000          3.127s
> 3000          3.364s
> 10000         4.508s
> 30000         7.732s
>
> As you can see, the time stays about constant until the query becomes
> longer than 1000 characters. At 3000 characters we see 10% loss in
> speed. 3000 characters of query is huge. Still only at about 20.000
> characters does the dbus performance halve. Using StartSearch() always
> halves the dbus performance!
>
> Using the query as key is a bit slower for huge queries. It takes a
> bit more memory on the server, but in general it will be faster and
> most importantly will be simpler for the user.
>
> It's unintuitive for us hackers to do this in such a simple way,
> because it feels like wasting resources. But in fact this is the most
> efficient solution.
>

Really cool that you stepped up and did this. Thanks a bunch :-)

A few questions regarding your test:
 - Do you pass the string as argument and return it again, or do you
just pass it as argument and return void?
 - Just to be 100% clear. Are we talking UTF-8 strings of N characters
or N bytes?
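
To illustrate why the characters-versus-bytes question matters: D-Bus
strings are UTF-8, so a query containing non-ASCII text is longer on the
wire than its character count suggests. A minimal sketch in Python (the
query text here is made up purely for illustration, it is not taken from
the benchmark):

```python
# Hypothetical query text, chosen only because it contains non-ASCII characters.
query = "smørrebrød " * 3

n_chars = len(query)                  # number of Unicode code points
n_bytes = len(query.encode("utf-8"))  # number of bytes in the UTF-8 encoding

print(n_chars, n_bytes)
```

For pure ASCII queries the two numbers are the same, so the distinction
only affects the table if the test strings were non-ASCII.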

Regarding the search spec - I'm not outright against using the query's
string representation for the key, but it does have a few consequences
we should bear in mind.

It is for example likely that apps use some kind of toolkit to build
queries in an object oriented manner - never seeing the string
representation.

If this query construction API is integrated with the actual client
side xesam bindings so that you do, for example:

    search = client.newSearch (new UserQuery("hello world"))

there is no problem. But the point is that it forces me to couple the
query-construction toolkit with the client side xesam search toolkit.
While this is probably the easiest way anyway, it is still a price we
would have to accept.
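
A rough sketch of what that coupling could look like, in Python. All
names here (UserQuery, to_string, Client, new_search) are hypothetical
and only for illustration; they are not from the spec or from any
existing binding, and no real D-Bus call is made:

```python
class UserQuery:
    """Hypothetical query-builder object, not part of any real xesam binding."""

    def __init__(self, text):
        self.text = text

    def to_string(self):
        # Assumed serialisation; the real wire format is whatever the spec defines.
        return self.text


class Client:
    """Hypothetical client binding that uses the query string as the search key."""

    def __init__(self):
        # Maps query string -> list of hits received so far.
        self.searches = {}

    def new_search(self, query):
        # The client has to know how to serialise the query object,
        # which is exactly the coupling between the two toolkits.
        key = query.to_string()
        self.searches.setdefault(key, [])
        return key


client = Client()
handle = client.new_search(UserQuery("hello world"))
print(handle)
```

The app never touches the string itself, but the client binding has to
call into the query toolkit to obtain it.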

Historical Note:
Using the query string as search handle was in fact one of the first
proposals for the xesam search spec. I think we had better dig out why it
was rejected then...

Cheers,
Mikkel