[Wasabi Proposal] Live search API

Thu Jan 25 00:40:54 PST 2007

I updated the live search proposal at
http://wiki.freedesktop.org/wiki/WasabiSearchLive. It is now a unified
proposal covering both the simple and the live API.

2007/1/24, Magnus Bergman <magnus.bergman at observer.net>:
>
> On Sat, 20 Jan 2007 21:27:38 +0100
> "Mikkel Kamstrup Erlandsen" <mikkel.kamstrup at gmail.com> wrote:
>
> > 2007/1/19, Magnus Bergman <magnus.bergman at observer.net>:
> > >
> > > First some comments on the current draft[1]
> > > """""""""""""""""""""""""""""""""""""""""""
> > >
> > >   As with the WasabiSearchSimple API[2] the session *is* the D-BUS
> > >   connection. So there really doesn't need to be an explicit session
> > >   object. It might be adequate to have one for the language
> > > bindings, but then the same thing goes for the simple API.
> >
> > I actually think the session should be explicit. Both language
> > bindings and actual server implementations would have an easier life
> > if it was explicit.
>
> I don't object to that. But in that case I think the same goes for the
> simple API. I assume sessions will map 1:1 to the dbus connection
> (bindings might want to hide the dbus connection in the session object).

Ok, good.

> > >   If the method GetMetadata should exist I think it would make more
> > >   sense to make it belong to a document object, rename it
> > >   GetProperty and include it in the metadata storage API instead.
> >
> >
> > Yes, it looks out of place in the search interface. There does
> > however need to be a way to obtain the "expensive" hit metadata as
> > discussed in the thread about the simple api.
> >
> > >  And as I said before, I think it makes sense to treat queries and
> > >   searches as different objects, which means renaming Query.Start to
> > >   something like NewSearch. It also means that a query doesn't need
> > > to belong to anything (like the session), it could exist
> > > independently (unlike a search). I have left out possible functions
> > > dealing with queries (like constructing an XML query from a simple
> > > query string) since functions like that rather belong in a library.
> >
> >
> > I follow you on the search/query separation. Having NewSearch()
> > actually start the search gives some problems with the
> > SearchSetProperty() since it doesn't make much sense to change
> > properties on a running search. Spotlight has some similar methods
> > and they restart the search if you invoke them. The reason I included
> > a Query.Start - in current context Search.Start, was exactly that it
> > should be possible to set properties on a Search/Query before it was
> > actually run.
>
> If it doesn't make sense to change properties on a running search, then
> the function could be removed. But I think there might be cases when it
> does. Every property set before the search starts is just included in
> the XML query, right? So any function that sets properties for the
> query can never do anything else than modify the query on the client
> side. And I think such functions belong in a library.

I removed the method from the search object. Session properties are not
included in the query xml, but are set on the server separately.

> > >   Apart from ShowConfiguration(), all functions of the simple API
> > >   seem to be in the live API as well.
> >
> >
> > I moved simple/live.ShowConfiguration to a dbus interface
> > org.freedesktop.search.ui.ShowConfiguration, together with a new
> > method ShowSearchTool. Please see
> > http://wiki.freedesktop.org/wiki/WasabiUI for the api spec proposal.
> > Sorry I did not find time to notify the list before now - spare my
> > life :-)
> >
> > >   ... So, would it be
> > >   possible and desirable to define the simple API as a subset of the
> > >   live API?
> >
> >
> > I have ambivalent feelings on this issue. Let me outline pros and
> > cons as I see them. I shall spare you my confusing thoughts and cut
> > to the cheese:
> >
> > Loose Idea for an Interface Merge:
> > Have a boolean session property called "block". If it is true,
> > GetHits() and CountHits() block until the desired info is available,
> > removing the need for signals. If there are fewer hits than requested
> > by GetHits when the entire index has been searched, just return
> > the found items.
>
> Yes. In addition to the block property it might make sense to have a
> "live" property as well (meaning the search will never finish). Just
> because you don't want the live feature doesn't necessarily mean you
> want it to block.

Yes that makes sense. I included it in the updated suggestion.

> > The simplest use case, retrieving uri and dc:title, would then look
> > something like this (in pseudocode):
> >
> > session = NewSession()
> > SetProperty (session, "block", "true")
> > SetProperty (session, "properties", "uri ; dc:title")
> >
> > search = NewSearch (query_xml, session)  <-- search obj inherits
> > requested props from the session
> > hits = GetHits (search, 1000)
> > <show hits>
> >
> > count = HitCount (search)
> > <print: showing 1000 of *count* hits>
> > Close(search)
> > Close(session)
>
> Yes, that's pretty close to what I imagined too. In addition I think
> "block" should be true by default (to make simple searching even
> simpler). But what does "search obj inherits" mean?

Agree on the "block" thing. I meant it as a reference to the (now removed)
Search.Set/GetProperty method. When you create a new search object all
properties from the session are "inherited".
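
As a rough illustration of the inheritance described above, here is a small
Python sketch. It is an in-memory stand-in only, not a D-BUS binding. The
class and method names are hypothetical, and it assumes that a search takes a
snapshot of the session properties when it is created (the thread does not say
whether later session changes should affect an existing search).

```python
class Session:
    """Hypothetical in-memory stand-in for a search session."""

    def __init__(self):
        # "block" defaults to true, as agreed in the thread.
        self.properties = {"block": "true"}

    def set_property(self, name, value):
        self.properties[name] = value

    def get_property(self, name):
        return self.properties[name]

    def new_search(self, query_xml):
        # Assumption: the search copies the session properties at creation
        # time, so later changes to the session do not affect it.
        return Search(query_xml, dict(self.properties))


class Search:
    """Hypothetical search object; it has no setters of its own."""

    def __init__(self, query_xml, inherited_properties):
        self.query_xml = query_xml
        self.properties = inherited_properties


session = Session()
session.set_property("properties", "uri ; dc:title")
search = session.new_search("<query/>")

# Changing the session afterwards leaves the existing search untouched.
session.set_property("properties", "uri")
```

With a snapshot, removing Search.Set/GetProperty costs nothing for the simple
case: the client configures the session once and every new search picks up
those settings.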

>
> > > The actual proposal
> > > """""""""""""""""""
> > >
> > > SetProperty ( in s property , in s value )
> > >
> > >     Set a global (session) property. This method can be used for
> > >     several things.
> > >       o Setting default properties for Query objects.
> > >       o Authentication/encryption
> > >       o Generally be flexible for future needs
> > >     * property: Name of the property.
> > >     * value: New value for the property.
> > >
> > > GetProperty ( in s property , out s value)
> > >
> > >     Get the value of a global (session) property.
> > >     * property: Name of the property.
> > >     * value: Current value of the property.
> >
> >
> > As noted above I still think we need a session handle. By using
> > handles we could even allow Get/SetProperty to take either a session or a
> > search handle.  Like SetProperty(handle, prop, val).
>
> A common SetProperty function requires some magic, which might make it
> troublesome for some languages. It might be neat to have in some
> languages (using overloading) but I object to having it at this level.

Agreed. Let's just have properties on the session only, unless someone comes
up with a really good example where something makes sense on the search only.

>
> > > NewSearchFromXML ( in s query_xml , out s search )
> > >
> > >     Start a new search from an XML query.
> > >     * query_xml: The query to execute.
> > >     * search: A handle that is used to uniquely identify this
> > > search.
> >
> >
> > If the searches/queries can have properties I think we need an
> > intermediate StartSearch() method. If we decide to only have session
> > properties, I can accept starting the search right away.
>
> I don't really understand the need. This *is* the "StartSearch" method.
> Every property set before the search starts is included in the query
> (XML string). Or am I missing something?

The updated proposal uses Search() to both create and start the search.

>
> > > SearchClose ( in s search)
> >
> >
> > Check.
> >
> >
> >
> > > SearchSetProperty ( in s search , in s property , in s value)
> > >
> > > SearchGetProperty ( in s search , in s property , out s value)
> >
> >
> > I have a few remarks related to this above.
> >
> >
> > > SearchCountHits ( in s search , out i count )
> >
> >
> >  Check
> >
> > > SearchGetHitProperties ( in s search, in i offset, in i limit,
> > >                          in as properties, out a{sa{sas}} response )
> >
> >
> > I think it should be called GetHits. Why  list requested props here
> > if you also do it in the Set*Property()? Why do we need an offset? In
> > a live search I can't see any reason to re-request a given range of
> > hits. Didn't we agree that the return value should be without maps
> > and just arrays?
>
> My idea of listing the requested props in Set*Property() was more of
> limiting the set of properties that could be retrieved with this
> function (but defaults to every possible prop), including the
> expensive one(s). The typical case would be to call this function once
> to get the basic props, and then perhaps again to get other (expensive)
> ones. In order to be able to request expensive properties later, there
> has to be a function like this in one way or another, even if it has
> another name than this function.
>
> Instead of using an offset there could be a function for "seeking" in
> the search result, since you might want to go back and read some
> other properties. I don't have any strong feelings about this, but I
> think it's slightly easier (for the API user) to have an offset like
> this.
>
> I think it should be possible to re-request hits, since you actually
> get it for free. The server has to remember them anyway, otherwise it
> will be unable to tell when a document no longer matches the query,
> right?

You can easily re-request hits with the updated proposal. Just call
GetHitData() with the hit ids and the wanted props.
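
To make the re-request idea concrete, here is a minimal Python sketch of a
GetHitData-style lookup. The function name get_hit_data, its signature, the
HITS store and the sample data are all assumptions made for illustration; the
authoritative signature is whatever the wiki proposal says. It returns plain
arrays rather than maps, in line with the earlier remark about return values.

```python
# Hypothetical server-side store: hit id -> all known properties of that hit.
HITS = {
    1: {"uri": "file:///tmp/a.txt", "dc:title": "Notes", "snippet": "first hit"},
    2: {"uri": "file:///tmp/b.txt", "dc:title": "Draft", "snippet": "second hit"},
}


def get_hit_data(hit_ids, wanted_props):
    """Return one row per requested hit id.

    Each row lists the values of wanted_props in the order they were asked
    for. Unknown hits or properties yield empty strings (an assumption).
    """
    rows = []
    for hit_id in hit_ids:
        props = HITS.get(hit_id, {})
        rows.append([props.get(name, "") for name in wanted_props])
    return rows


# First pass: cheap properties for every hit.
basic = get_hit_data([1, 2], ["uri", "dc:title"])

# Later: re-request a single hit, this time asking for an expensive property.
detail = get_hit_data([2], ["snippet"])
```

Because the caller names the hits by id, no offset or seek function is needed
to go back to an earlier hit.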

> The real reason why I left the maps instead of writing it as arrays is
> that I don't know the syntax, I'm perfectly happy with arrays.
>
> About the name, I don't think it matters with these requirements. But
> in one of the (commercial) search engine APIs I've used the hits were
> also objects (so you had to first get the hit from the search and then
> the property from the hit). The benefit from this approach is that the
> hit object can have a direct pointer to the query that caused it
> (because a search could be constructed from more than one query). And
> some quite complicated things related to highlighting. Imagine you
> extract and index the text from a word document, then you want to view
> it as a highlighted PDF-document. For this to work each hit needs some
> extra data (I won't go into detail). But these features will never be a
> part of this API so the naming doesn't matter as much I guess. But that
> was my reason for choosing the name.

A language binding could easily map the search handle to the "underlying"
query xml. That way a language binding could provide a GetQuery() method on
the Search object.
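
A binding along those lines might look roughly like the Python sketch below.
Everything here is hypothetical: SearchBinding, get_query and the
fake_start_search stand-in (which replaces the real D-BUS call) are invented
for the example and are not part of the proposal.

```python
import itertools


class SearchBinding:
    """Hypothetical client-side wrapper that remembers each search's query."""

    def __init__(self, start_search):
        # start_search: callable taking query XML and returning a search handle.
        self._start_search = start_search
        self._queries = {}

    def search(self, query_xml):
        handle = self._start_search(query_xml)
        self._queries[handle] = query_xml  # remember which query made this search
        return handle

    def get_query(self, handle):
        return self._queries[handle]

    def close(self, handle):
        # Forget the query once the search is closed.
        self._queries.pop(handle, None)


# Stand-in for the server: hands out handles "search-1", "search-2", ...
_counter = itertools.count(1)


def fake_start_search(query_xml):
    return "search-%d" % next(_counter)


binding = SearchBinding(fake_start_search)
handle = binding.search("<query>wasabi</query>")
```

The mapping lives entirely on the client, so the D-BUS API itself needs no
extra method for it.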

> > > signal SearchHitsAdded ( s search , i count)
> > >
> >
> >
> > > signal SearchHitsRemoved ( s search , ai offsets )
> >
> >
> >
> > signal HitsHitsModified ( s search , ai offsets )
> >
> >
> > Is this why you want to be able to refetch pages in GetHitProperties?
> > If I recall correctly this signal is why I included the GetMetadata
> > method in the first place.
>
> Well, sort of. I think we need the functionality of what you called
> GetMetadata. The question is whether it all should be done by GetHitProperties,
> or if it's better to keep GetHitProperties simple and have an
> additional function as well.
>
>
> > How do you cater for snippets? If you again want to use the
> > GetHitProperties method I can see the solution, but I must say that
> > it appears inelegant to use GetHitProperties like this - for results,
> > updates, and snippets.
>
> Using GetHitProperties was what I intended, yes. To me it appears
> elegant, but that might very well just be me. I'm willing to consider other
> ideas.
>

Well, I think the current proposal is more or less in the middle of our
original different ideas...

Cheers,
Mikkel