[Wasabi] FOSDEM conclusions - finalizing the search spec

Tue Mar 13 21:05:05 EET 2007

2007/3/13, jamie <jamiemcc at blueyonder.co.uk>:
>
> On Tue, 2007-03-13 at 21:56 +0800, Fabrice Colin wrote:
> > On 3/13/07, Mikkel Kamstrup Erlandsen <mikkel.kamstrup at gmail.com> wrote:
> > > Please give http://freedesktop.org/wiki/WasabiSearchLive a
> > > good look before we set this in stone. It is the last call if you have any
> > > objections - I really mean it this time. Anything from criticizing the
> > > fundamental structure down to nitpicking on the session property names is
> > > welcome.
> > >
> > There are a couple of things I am not clear about:
> >
> > - "search.blocking : Whether or not calls will block until the
> > requested items are available."
> > Do you really mean this? Should NewSearch block ad vitam aeternam if
> > there are no results for the given query? ;-)
> >
> > - "CountHits (in s search, out i count) Returns the current number of
> > found hits. If search.blocking==true this call blocks until the index
> > has been fully searched."
> > Shouldn't this read "if search.live==false this call blocks..."?
> >
> > - "These signals are only used if the session property search.blocking is true."
> > Again, shouldn't it be "if search.live is true"?
> >
> > - GetState
> > if the first string is "FULL_INDEX", shouldn't the second string
> > always be "100"?
> >
> > - signal HitsAdded
> > is count the number of new hits, or the new number of hits? I assume the
> > latter, since the example at the bottom shows a call to "GetHits(session,
> > count)" after receiving "HitsAdded(count)".
> >
> > - signal StateChanged
> > An example would be welcome here. For indexers that monitor sources, e.g.
> > monitor the filesystem with inotify, the state will switch between UPDATING
> > and IDLE and/or FULL_INDEX very often. Is the indexer supposed to send
> > a signal every time?
> >
> > - properties and field names
> > You may want to clarify what differences, if any, there are between
> > properties and
> > field names.
> >
>
> On top of all that if this API were to be usable in our tracker GUI we
> would need the following:
>
> 1) in tracker the service type being searched is mandatory - I would
> prefer it to be a session property or, even better, a param in the
> NewSearch method. If it remains part of the XML, then that bit should be
> mandatory in the XML schema/DTD

Having it in a session property seems really odd, since it seems a natural
part of the query (i.e. the query also contains "what to query"). Putting it
in a parameter to NewSearch is not my preferred option either, since the
current approach, where you need only a session and a query to start a
search, is very clean. Currently a query is "self-contained": it doesn't
require anything else to be runnable. If it required additional info to be
useful, that would be a drawback (in my head at least).

Making "type" a mandatory attribute on the query element could be fine by
me. I just fail to see the problem with defaulting to all. Granted, it would
be not only slow but also undefined which objects you search. But why not
allow it for convenience? It wouldn't require much documentation to explain this.
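To make the "default to all" idea concrete, here is a minimal sketch, assuming the type stays an optional attribute inside the query (so the query remains self-contained) and using the service types mentioned further down in this thread as a stand-in list. `Query`, `KNOWN_TYPES` and `types_to_search` are hypothetical names for illustration, not part of the Wasabi spec.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical list of service types, taken from the examples in this thread;
# the real list has not been agreed on yet.
KNOWN_TYPES = ("emails", "files", "conversations")


@dataclass
class Query:
    text: str
    type: Optional[str] = None  # None means the query did not specify a type


def types_to_search(query: Query) -> Tuple[str, ...]:
    """Return the service types a backend would have to search for this query."""
    if query.type is None:
        # Convenience default: search everything. Slower, and which objects
        # end up searched depends on what the backend happens to index.
        return KNOWN_TYPES
    if query.type not in KNOWN_TYPES:
        raise ValueError(f"unknown service type: {query.type!r}")
    return (query.type,)
```

With this shape, `types_to_search(Query("fosdem"))` covers all three types, while `Query("fosdem", "emails")` narrows the search to one. Nothing outside the query is needed to run it.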

> 2) GetHits/GetHitData
>
> There are two use cases as far as tracker goes:
>
> a) if I need metadata for all hits then it will always be quicker to
> have them in GetHits
>
> b) for things like our tile we need to fetch extra metadata for a single
> hit so GetHitData would only ever be used for a single hit not multiple
> ones - would be easier for us if that was changed to:
>
> GetHitData (in s ID, in as fields, out av values)
>
> (I can't think of a single case where we would want to get metadata
> *separately* for more than one hit at a time)

Well, the trick is that GetHitData is also used when you receive a
HitsModified signal. Then you re-fetch metadata for all the hit IDs.
Consider the case where I move a directory with 50 files inside it, all
giving me matches (this will fire a HitsModified, since moving files just
amounts to changing the uri field of the hits).
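A rough way to see the cost difference is to count calls against a mock service. This is a sketch only: `MockSearchService`, `get_hit_data` (batch form) and `get_single_hit_data` (the single-hit form proposed above) are hypothetical Python stand-ins, and a real client would make these calls over D-Bus rather than in-process.

```python
from typing import Dict, List, Sequence


class MockSearchService:
    """Hypothetical in-process stand-in for a Wasabi search service."""

    def __init__(self, hits: Dict[str, Dict[str, str]]):
        self.hits = hits        # hit id -> {field name: value}
        self.round_trips = 0    # number of calls the client has made

    def get_hit_data(self, hit_ids: Sequence[str], fields: Sequence[str]) -> List[List[str]]:
        """Batch form: one call returns the requested fields for many hits."""
        self.round_trips += 1
        return [[self.hits[h][f] for f in fields] for h in hit_ids]

    def get_single_hit_data(self, hit_id: str, fields: Sequence[str]) -> List[str]:
        """Single-hit form: one call per hit."""
        self.round_trips += 1
        return [self.hits[hit_id][f] for f in fields]


# Moving a directory: 50 hits change their uri field at once,
# so one HitsModified signal would carry 50 hit ids.
hits = {f"hit{i}": {"uri": f"file:///new/dir/file{i}.txt"} for i in range(50)}
modified_ids = list(hits)

batched = MockSearchService(hits)
batched_values = batched.get_hit_data(modified_ids, ["uri"])
print(batched.round_trips)  # prints 1

single = MockSearchService(hits)
single_values = [single.get_single_hit_data(h, ["uri"]) for h in modified_ids]
print(single.round_trips)  # prints 50
```

Both forms return the same data; the batch form simply does it in one round trip, which is what matters when a single signal touches many hits.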

> 3) for separate snippets we would like to include a max length of the
> returned snippet so I'm not sure if a dedicated call for this would be
> better? Might not matter for a general purpose API like Wasabi?

Well, generally Wasabi is designed around "sane defaults" (in many places
at least). Wouldn't it suffice to return a "sanely sized" snippet and let the
UI trim it to an appropriate size?
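Trimming on the UI side can be as simple as the following sketch, assuming the service returns plain-text snippets. `trim_snippet` is a hypothetical helper, not part of any Wasabi binding; it cuts at a word boundary where possible and appends an ellipsis.

```python
def trim_snippet(snippet: str, max_len: int, ellipsis: str = "...") -> str:
    """Trim a server-provided snippet to at most max_len characters."""
    if len(snippet) <= max_len:
        return snippet
    budget = max_len - len(ellipsis)
    if budget <= 0:
        # Not even room for the ellipsis itself; return as much of it as fits.
        return ellipsis[:max_len]
    cut = snippet[:budget]
    # Prefer to break at the last space so words are not chopped in half.
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut.rstrip() + ellipsis


print(trim_snippet("the quick brown fox jumps", 15))  # prints "the quick..."
```

Each UI (a tile, a result list) can then pick its own limit without the service needing an extra call or parameter.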

> I don't think we can freeze the API until we have a working
> implementation (which may uncover the need for more changes) - I plan on
> implementing it in tracker next month.

I agree, and that's also why I have not pressed harder on this. I'm
working on some Python GObject bindings and tools to help test Wasabi services.
They will also include a dummy server implementation. Having a real service
to search against would be really nice, of course :-)

> things still blocking implementation:
>
> 1) list of applicable metadata names - I would suggest a mandatory set
> (i.e. metadata supported by all) and an optional set (this would always
> return NULL if not supported)
>
> 2) list of applicable service types (emails, files, conversations etc)

I deliberately haven't pushed this debate much lately because I wanted to hear
what the Nepomuk guys had to say about it. Now that I know they are
open to having the Wasabi metadata fields map to their fully semantic types, I
think it should be safe to move on. But yes, we are really starting to
need this.
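For reference, Jamie's mandatory/optional split could behave roughly like this sketch. The field names are placeholders only (the real list is exactly what is still undecided), `get_fields` is a hypothetical helper, and Python's `None` stands in for the "NULL" returned for unsupported optional fields.

```python
from typing import Dict, List, Optional, Sequence

# Placeholder field names for illustration; not an agreed Wasabi list.
MANDATORY_FIELDS = {"uri", "mimetype"}
OPTIONAL_FIELDS = {"title", "author"}


def get_fields(record: Dict[str, str], fields: Sequence[str]) -> List[Optional[str]]:
    """Return one value per requested field.

    Mandatory fields must be present in every record. Optional fields that
    the backend does not support come back as None."""
    values: List[Optional[str]] = []
    for name in fields:
        if name in MANDATORY_FIELDS:
            values.append(record[name])  # a KeyError here means a non-conforming backend
        elif name in OPTIONAL_FIELDS:
            values.append(record.get(name))
        else:
            raise ValueError(f"unknown field: {name!r}")
    return values


record = {"uri": "file:///tmp/a.txt", "mimetype": "text/plain"}
print(get_fields(record, ["uri", "title"]))  # prints ['file:///tmp/a.txt', None]
```

The point of the split is that clients can always rely on the mandatory fields and only need to handle the "missing" case for optional ones.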

Cheers,
Mikkel