[Wasabi] FOSDEM conclusions - finalizing the search spec

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Wed Mar 14 10:36:44 EET 2007


2007/3/13, jamie <jamiemcc at blueyonder.co.uk>:
>
> On Tue, 2007-03-13 at 20:05 +0100, Mikkel Kamstrup Erlandsen wrote:
> > 2007/3/13, jamie <jamiemcc at blueyonder.co.uk>:
> >         On Tue, 2007-03-13 at 21:56 +0800, Fabrice Colin wrote:
> >         > On 3/13/07, Mikkel Kamstrup Erlandsen
> >         <mikkel.kamstrup at gmail.com> wrote:
> >         > > Please give http://freedesktop.org/wiki/WasabiSearchLive a
> >         > > good look before we set this in stone. It is the last call
> >         if you have any
> >         > > objections - I really mean it this time. Anything from
> >         criticizing the
> >         > > fundamental structure down to nitpicking on the session
> >         property names is
> >         > > welcome.
> >         > >
> >         > There's a couple of things I am not clear about :
> >         >
> >         > - "search.blocking : Whether or not calls will block until
> >         the
> >         > requested items are available."
> >         > Do you really mean this ? Should NewSearch block ad vitam
> >         eternam if
> >         > there are no
> >         > results for the given query ? ;-)
> >         >
> >         > - "CountHits (in s search, out i count) Returns the current
> >         number of
> >         > found hits. If
> >         > search.blocking==true this call blocks until the index has
> >         been fully searched."
> >         > Shouldn't this read "if search.live==false this call
> >         blocks..." ?
> >         >
> >         > - "These signals are only used if the session property
> >         search.blocking is true."
> >         > Again, shouldn't it be "if search.live is true" ?
> >         >
> >         > - GetState
> >         > if the first string is "FULL_INDEX", shouldn't the second
> >         string
> >         > always be "100" ?
> >         >
> >         > - signal HitsAdded
> >         > is count the number of new hits, or the new number of hits ?
> >         I assume the latter
> >         > since the example at the bottom shows a call to
> >         "GetHits(session, count)" after
> >         > receiving "HitsAdded(count)".
> >         >
> >         > - signal StateChanged
> >         > An example would be welcome here. For indexers that monitor
> >         sources, eg monitor
> >         > the filesystem with inotify, the state will switch between
> >         UPDATING
> >         > and IDLE and/or FULL_INDEX very often. Is the indexer
> >         supposed to send
> >         > a signal every time ?
> >         >
> >         > - properties and field names
> >         > You may want to clarify what differences, if any, there are
> >         between
> >         > properties and
> >         > field names.
> >         >
> >
> >         On top of all that if this API were to be usable in our
> >         tracker GUI we
> >         would need the following:
> >
> >         1) in tracker the service type being searched is mandatory - I
> >         would
> >         prefer it to be a session property or even better a param in
> >         the
> >         NewSearch method. If it remains part of the xml then that bit
> >         should be
> >         mandatory in the xml schema/dtd
> >
> > Having it in a session property seems really odd, since it seems a
> > natural part of the query (ie. the query also contains "what to
> > query"). Putting it in a param to NewSearch is also not my preference,
> > since the current approach where you only need a session and a query
> > to start a search is very clean. Currently a query is "self-contained":
> > it doesn't require anything else to be runnable. If it required
> > additional info to be useful, that would be a drawback (in my head
> > at least).
> >
> > Making "type" a mandatory attribute on the query element could be fine
> > by me. I just fail to see the problem in defaulting to all. It would
> > not only be slow, but also undefined in which objects you search. But
> > why not allow it for convenience? It wouldn't require much
> > documentation to explain this.
> >
> >
> >         2) GetHits/GetHitData
> >
> >         There are two use cases as far as tracker goes:
> >
> >         a ) if i need metadata for all hits then it will always be
> >         quicker to
> >         have them in GetHits
> >
> >         b) for things like our tile we need to fetch extra metadata
> >         for a single
> >         hit so GetHitData would only ever be used for a single hit not
> >         multiple
> >         ones - would be easier for us if that was changed to:
> >
> >         GetHitData (in s ID, in as fields, out av values)
> >
> >         (I cant think of a single case where we would want to get
> >         metadata
> >         *separately* for more than one hit at a time)
> >
> > Well, the trick is that GetHitData is also used when you receive a
> > HitsModified signal. Then you re-fetch metadata for all the hit-ids.
> > Consider the case where I move a directory and I have 50 files inside
> > it all giving me matches (this will fire a HitsModified since moving
> > files just amounts to changing the uri field of the hit).
> >
> >
> >         3) for separate snippets we would like to include a max length
> >         of the
> >         returned snippet so I'm not sure if a dedicated call for this
> >         would be
> >         better? Might not matter for a general purpose API like
> >         Wasabi?
> >
> >
> > Well, generally Wasabi is designed around "sane defaults" (in many
> > places at least). Wouldn't it suffice to return a "sanely sized"
> > snippet and let the UI trim it to an appropriate size?
>
> would not be easy for an app though (think of the case when you have
> multiple search terms highlighted in the snippet)


Good point.

> I am only suggesting these because they are important in tracker -
> not sure if they matter in Wasabi but could do?


We could put the preferred snippet length in a session property. Would that
suffice? You would not be able to set it per-search, but I am not sure that
is necessary anyway.
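
To make the idea concrete, here is a rough sketch of how an indexer might
honour such a property. The property name "snippet.length" and the
make_snippet helper are made up for illustration and are not part of the
spec. The point is only that the indexer knows where the terms matched, so
it can cut the text around them, which a UI trimming a finished snippet
cannot easily do.

```python
# Hypothetical session properties; "snippet.length" is not in the spec.
session = {"snippet.length": 20}


def make_snippet(text, terms, max_length):
    """Return at most max_length characters of text, cut near the first match."""
    lowered = text.lower()
    # Positions of every term that actually occurs in the text.
    positions = [lowered.find(term.lower()) for term in terms]
    positions = [pos for pos in positions if pos >= 0]
    if not positions:
        # No term matched: fall back to the beginning of the text.
        return text[:max_length]
    # Keep a little leading context before the earliest match.
    start = max(0, min(positions) - max_length // 4)
    return text[start:start + max_length]


snippet = make_snippet(
    "The quick brown fox jumps over the lazy dog",
    ["fox", "jumps"],
    session["snippet.length"],
)
```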

> Another thing we do in T-S-T, is get hit count grouped by service (would
> be slower to get a hit count for each type individually)


I assume you use the Tracker method[1] GetHitCount(in s service, in s
search_text, out i count) for this.

If you want the same functionality in Wasabi you would probably have to use
a main session and a parallel "counter" session with hit.fields=[]. Then
each time a new hit type is found in the main session you fire off a query on
that type only in the counter session and use that to get the type-specific
hit count.
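
In rough Python, the two-session approach could look like the sketch below.
Everything here is a toy stand-in: INDEX, search() and count_by_type() are
invented for illustration and do not correspond to real Wasabi or Tracker
calls. The search() function plays the role of NewSearch plus GetHits, and
passing fields=() mimics a session with hit.fields=[].

```python
# Toy in-memory index: (hit id, service type, text). Purely illustrative.
INDEX = [
    ("1", "Files", "wasabi spec draft"),
    ("2", "Emails", "Re: wasabi search spec"),
    ("3", "Files", "notes on wasabi"),
    ("4", "Emails", "lunch"),
]


def search(query, service=None, fields=("id", "type")):
    """Return one tuple per hit, holding only the requested fields."""
    hits = []
    for hit_id, hit_type, text in INDEX:
        if query in text and (service is None or hit_type == service):
            record = {"id": hit_id, "type": hit_type}
            hits.append(tuple(record[name] for name in fields))
    return hits


def count_by_type(query):
    """Count hits per service type using a main and a counter search."""
    counts = {}
    # Main session: ordinary hits, including their type.
    for _hit_id, hit_type in search(query):
        if hit_type not in counts:
            # Counter session: same query restricted to the new type,
            # requesting no fields since only the count matters.
            counts[hit_type] = len(search(query, service=hit_type, fields=()))
    return counts
```

Calling count_by_type("wasabi") on the toy index above gives 2 for "Files"
and 1 for "Emails".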

Note that this sort of counting is really just a simple version of more
general information clustering. And if you want to do a more complete
clustering you will probably not be able to get around firing off parallel
searches anyway.

> I leave it up to you to decide whether these are important enough to
> warrant wasabi support :)
>

Eeek, I'm not sure I got the balls for that :-) I would like to hear what
others think before I make any decisions.

Cheers,
Mikkel

[1]:
http://svn.gnome.org/viewcvs/tracker/trunk/data/tracker-introspect.xml?revision=530