[Wasabi Proposal] Live search API

Wed Jan 24 04:59:25 PST 2007

On Sat, 20 Jan 2007 21:27:38 +0100
"Mikkel Kamstrup Erlandsen" <mikkel.kamstrup at gmail.com> wrote:

> 2007/1/19, Magnus Bergman <magnus.bergman at observer.net>:
> >
> > First some comments on the current draft[1]
> > """""""""""""""""""""""""""""""""""""""""""
> >
> >   As with the WasabiSearchSimple API[2] the session *is* the D-BUS
> >   connection. So there really doesn't need to be an explicit session
> >   object. It might be adequate to have one for the language
> > bindings, but then the same thing goes for the simple API.
> 
> I actually think the session should be explicit. Both language
> bindings and actual server implementations would have an easier life
> if it was explicit.

I don't object to that. But in that case I think the same goes for the
simple API. I assume sessions will map 1:1 to the dbus connection
(bindings might want to hide the dbus connection in the session object).

> >   If the method GetMetadata should exist I think it would make more
> >   sense to make it belong to a document object, rename it
> >   GetProperty and include it in the metadata storage API instead.
> 
> 
> Yes, it looks out of place in the search interface. There does
> however need to be a way to obtain the "expensive" hit metadata as
> discussed in the thread about the simple api.
> 
> >   And as I said before, I think it makes sense to treat queries and
> >   searches as different objects, which means renaming Query.Start to
> >   something like NewSearch. It also means that a query doesn't need
> >   to belong to anything (like the session), it could exist
> >   independently (unlike a search). I have left out possible functions
> >   dealing with queries (like constructing an XML query from a simple
> >   query string) since functions like that rather belong in a library.
> 
> 
> I follow you on the search/query separation. Having NewSearch()
> actually start the search gives some problems with the
> SearchSetProperty() since it doesn't make much sense to change
> properties on a running search. Spotlight has some similar methods
> and they restart the search if you invoke them. The reason I included
> a Query.Start - in current context Search.Start, was exactly that it
> should be possible to set properties on a Search/Query before it was
> actually run.

If it doesn't make sense to change properties on a running search, then
the function could be removed. But I think there might be cases when it
does. Every property set before the search starts is just included in
the XML query, right? So any function that sets properties for the
query can never do anything other than modify the query on the client
side. And I think such functions belong in a library.
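
To illustrate what such a library function could look like, here is a
minimal Python sketch. The XML layout (a <property name=... value=.../>
child of the query root) and the name set_query_property are
assumptions made up for this example; the draft does not define the
query schema.

```python
import xml.etree.ElementTree as ET


def set_query_property(query_xml: str, name: str, value: str) -> str:
    """Return a copy of query_xml with one property set.

    Purely client-side: nothing is sent to the search service.
    The <property> element is a hypothetical schema, not part of the draft.
    """
    root = ET.fromstring(query_xml)
    for prop in root.findall("property"):
        if prop.get("name") == name:
            # Property already present: overwrite its value.
            prop.set("value", value)
            break
    else:
        # Property not present yet: append a new element.
        ET.SubElement(root, "property", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


query = "<query><text>wasabi</text></query>"
query = set_query_property(query, "properties", "uri ; dc:title")
```

The resulting string would then be handed to NewSearchFromXML as-is, so
the service never needs to know that a property setter was involved.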

> >   Apart from ShowConfiguration(), all functions of the simple API
> >   seem to be in the live API as well.
> 
> 
> I moved simple/live.ShowConfiguration to a dbus interface
> org.freedesktop.search.ui.ShowConfiguration, together with a new
> method ShowSearchTool. Please see
> http://wiki.freedesktop.org/wiki/WasabiUI for the api spec proposal.
> Sorry I did not find time to notify the list before now - spare my
> life :-)
> 
> ... So, would it be
> >   possible and desirable to define the simple API as a subset of the
> >   live API?
> 
> 
> I have ambivalent feelings on this issue. Let me outline pros and
> cons as I see them. I shall spare you my confusing thoughts and cut
> to the cheese:
> 
> Loose Idea for an Interface Merge:
> Have a boolean session property called "block". If it is true,
> GetHits() and CountHits() block until the desired info is available,
> removing the need for signals. If there are fewer hits than requested
> by GetHits when the entire index has been searched, just return
> the found items.

Yes. In addition to the block property it might make sense to have a
"live" property as well (meaning the search will never finish). Just
because you don't want the live feature doesn't necessarily mean you
want it to block.

> The simplest use case, retrieving uri and dc:title, would then look
> something like this (in pseudocode):
> 
> session = NewSession()
> SetProperty (session, "block", "true")
> SetProperty (session, "properties", "uri ; dc:title")
> 
> search = NewSearch (query_xml, session)  <-- search obj inherits requested props from the session
> hits = GetHits (search, 1000)
> <show hits>
> 
> count = HitCount (search)
> <print: showing 1000 of *count* hits>
> Close(search)
> Close(session)

Yes, that's pretty close to what I imagined too. In addition I think
"block" should be true by default (to make simple searching even
simpler). But what does "search obj inherits" mean?
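
For reference, here is a rough Python sketch of that flow with "block"
defaulting to true. It is an in-memory stand-in, not a D-BUS client:
the class names, the "live" default and the list used as an index are
all assumptions for illustration. Copying the session properties when
the search is created is only one possible reading of "inherits".

```python
class Session:
    def __init__(self):
        # Assumed defaults: blocking on, live off.
        self.properties = {"block": "true", "live": "false"}

    def set_property(self, name, value):
        self.properties[name] = value


class Search:
    def __init__(self, query_xml, session, index):
        self.query_xml = query_xml
        # Assumption: the search takes a snapshot of the session properties.
        self.properties = dict(session.properties)
        # Stand-in for the real lookup: every document in `index` is a hit.
        self._hits = list(index)

    def get_hits(self, limit):
        # Return plain arrays, one per hit, in the order of "properties".
        requested = self.properties.get("properties", "uri")
        wanted = [p.strip() for p in requested.split(";")]
        return [[hit.get(p, "") for p in wanted] for hit in self._hits[:limit]]

    def count_hits(self):
        return len(self._hits)


index = [
    {"uri": "file:///a.txt", "dc:title": "A"},
    {"uri": "file:///b.txt", "dc:title": "B"},
]
session = Session()
session.set_property("properties", "uri ; dc:title")
search = Search("<query/>", session, index)
hits = search.get_hits(1000)
count = search.count_hits()
```

With the default in place, the simple case needs no SetProperty call
for "block" at all.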

> 
> The actual proposal
> > """""""""""""""""""
> >
> > SetProperty ( in s property , in s value )
> >
> >     Set a global (session) property. This method can be used for
> >     several things.
> >       o Setting default properties for Query objects.
> >       o Authentication/encryption
> >       o Generally be flexible for future needs
> >     * property: Name of the property.
> >     * value: New value for the property.
> >
> > GetProperty ( in s property , out s value)
> >
> >     Get the value of a global (session) property.
> >     * property: Name of the property.
> >     * value: Current value of the property.
> 
> 
> As noted above I still think we need a session handle. By using
> handles we could even let Get/SetProperty take either a session or a
> search handle, like SetProperty(handle, prop, val).

A common SetProperty function requires some magic, which might make it
troublesome for some languages. It might be neat to have in some
languages (using overloading) but I object to having it at this level.

> 
> NewSearchFromXML ( in s query_xml , out s search )
> >
> >     Start a new search from an XML query.
> >     * query_xml: The query to execute.
> >     * search: A handle that is used to uniquely identify this
> > search.
> 
> 
> If the searches/queries can have properties I think we need an
> intermediate StartSearch() method. If we decide to only have session
> properties, I can accept starting the search right away.

I don't really understand the need. This *is* the "StartSearch" method.
Every property set before the search starts is included in the query
(XML string). Or am I missing something?

> 
> SearchClose ( in s search)
> 
> 
> Check.
> 
> 
> 
> SearchSetProperty ( in s search , in s property , in s value)
> >
> > SearchGetProperty ( in s search , in s property , out s value)
> 
> 
> I have a few remarks related to this above.
> 
> 
> SearchCountHits ( in s search , out i count )
> 
> 
>  Check
> 
> SearchGetHitProperties ( in s search, in i offset, in i limit,
> >                          in as properties, out a{sa{sas}} response )
> 
> 
> I think it should be called GetHits. Why list requested props here
> if you also do it in the Set*Property()? Why do we need an offset? In
> a live search I can't see any reason to re-request a given range of
> hits. Didn't we agree that the return value should be without maps
> and just arrays?

My idea of listing the requested props in Set*Property() was more about
limiting the set of properties that could be retrieved with this
function (defaulting to every possible prop), including the
expensive one(s). The typical case would be to call this function once
to get the basic props, and then perhaps again to get other (expensive)
ones. In order to be able to request expensive properties later, there
has to be a function like this in one way or another, even if it has
another name than this function.

Instead of using an offset there could be a function for "seeking" in
the search result, since you might want to go back and read some
other properties. I don't have any strong feelings about this, but I
think it's slightly easier (for the API user) to have an offset like
this.

I think it should be possible to re-request hits, since you actually
get that for free. The server has to remember them anyway, otherwise it
will be unable to tell when a document no longer matches the query,
right?
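
A small Python sketch of that argument, under the assumption that the
server keeps an ordered list of hits per search. The class LiveSearch
and its methods are invented for this example and say nothing about
how a real backend stores its results.

```python
class LiveSearch:
    def __init__(self, matcher):
        self._matcher = matcher  # callable: does this document match?
        self._hits = []          # remembered hits, in the order found

    def add_document(self, doc):
        # A new hit here would trigger something like SearchHitsAdded.
        if self._matcher(doc):
            self._hits.append(doc)

    def get_hits(self, offset, limit):
        # Re-requesting a range is just another slice of the same list.
        return self._hits[offset:offset + limit]

    def recheck(self):
        """Drop hits that no longer match and return their old offsets.

        The returned offsets are what a SearchHitsRemoved-style signal
        would carry; computing them requires the remembered list.
        """
        removed = [i for i, d in enumerate(self._hits) if not self._matcher(d)]
        self._hits = [d for d in self._hits if self._matcher(d)]
        return removed


search = LiveSearch(lambda doc: "wasabi" in doc["text"])
docs = [
    {"uri": "file:///a", "text": "wasabi peas"},
    {"uri": "file:///b", "text": "plain rice"},
    {"uri": "file:///c", "text": "more wasabi"},
]
for doc in docs:
    search.add_document(doc)

first = search.get_hits(0, 10)
again = search.get_hits(0, 10)   # same range, no extra work on the server
docs[0]["text"] = "no longer relevant"
removed = search.recheck()
```

The list that makes the removal offsets possible is the same list that
serves a repeated GetHitProperties call for an earlier range.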

The real reason why I left the maps instead of writing it as arrays is
that I don't know the syntax; I'm perfectly happy with arrays.

About the name, I don't think it matters with these requirements. But
in one of the (commercial) search engine APIs I've used, the hits were
also objects (so you had to first get the hit from the search and then
the property from the hit). The benefit of this approach is that the
hit object can have a direct pointer to the query that caused it
(because a search could be constructed from more than one query). It
also enables some quite complicated things related to highlighting.
Imagine you extract and index the text from a Word document, then you
want to view it as a highlighted PDF document. For this to work each
hit needs some extra data (I won't go into detail). But these features
will never be a part of this API, so the naming doesn't matter as much,
I guess. But that was my reason for choosing the name.

> signal SearchHitsAdded ( s search , i count)
> >
> 
> 
> > signal SearchHitsRemoved ( s search , ai offsets )
> 
> 
> 
> signal SearchHitsModified ( s search , ai offsets )
> 
> 
> Is this why you want to be able to refetch pages in GetHitProperties?
> If I recall correctly this signal is why I included the GetMetadata
> method in the first place.

Well, sort of. I think we need the functionality of what you called
GetMetadata. The question is whether it should all be done by
GetHitProperties, or if it's better to keep GetHitProperties simple and
have an additional function as well.

> How do you cater for snippets? If you again want to use the
> GetHitProperties method I can see the solution, but I must say that
> it appears inelegant to use GetHitProperties like this - for results,
> updates, and snippets.

Using GetHitProperties was what I intended, yes. To me it appears
elegant, but that might very well just be me. I'm willing to consider
other ideas.