simple search api (was Re: mimetype standardisation by testsets)

Thu Nov 30 18:34:57 EET 2006

2006/11/30, Magnus Bergman <magnus.bergman at observer.net>:
>
> On Sun, 19 Nov 2006 12:19:45 +0100
> "Jos van den Oever" <jvdoever at gmail.com> wrote:
>
> > Hi Mikkel,
> >
> > Yes, the common dbus api is still something we need. I wanted to start
> > on the metadata standardization first, but we can do the searching api
> > in parallel. You make a good start in listing the available engines.
> > There might even be more. To coordinate we need a process that lists
> > the available search engines over dbus. An application should be able
> > to say: I want to search using a particular interface with the
> > available search engines.
> >
> > The attached archive contains an effort to do two things:
> > - propose a very simple, common api for search engines
> > - implement such a coordinating daemon
> >   The code contains the daemon, a demo search application and a python
> > client to access it by finding the search engine over the
> > searchmanager.
>
> After reading everything in this thread and considering all the
> concerns mentioned, I think it's time I came up with something concrete
> myself instead of just criticizing others. From the requirements and
> suggestions mentioned on this list I tried to come up with a proposal
> with these things in mind:
> * The possibility for application authors to do searches very easily.
> * The possibility to do both synchronous and asynchronous searches
>   without having two different APIs.
> * Not ruling out the possibility to use a dbus interface directly.
> * Not ruling out the possibility to have a library.
> * Not causing the search engine to do unnecessary work (like repeating
>   searches if the hits need to be retrieved again)
>
> One problem that I choose to leave out (for now) is the need to be able
> to stream documents which have a URL the applications cannot understand
> (documents which are not files). This includes files inside other files
> and virtual documents that are constructed on demand. But at least I
> have had this in mind, so it's not impossible to add it later.
>
> Disclaimer: The names of the functions are not part of the proposal,
> they are just chosen to illustrate what the functions do. And this
> proposal does not favor a library API over a dbus interface; the
> exact same idea applies to both cases. (It is also agnostic to whatever
> query language is used.)
>
> First a set of three basic functions that alone do most things and
> are probably sufficient for everybody who wants a simple API:
>
>   session_handle = session_new()
>
>     Creates a new session and returns a new session handle. Creating a
>     new session might involve finding an appropriate search engine and
>     getting it ready (exactly what happens here is not important). This
>     might just be to open a dbus connection. I think it's OK if this
>     call is blocking(?). Applications would probably want to call it
>     during startup and it should not take that long to do whatever
>     needs to be done here (which of course depends on the search
>     engine).
>
>   search_handle = search_new(session_handle,query_string)
>
>     Starts a new search and returns a new search handle. By default
>     this function blocks until the search has been performed and the
>     number of hits is known (see below).
>
>   hits = search_get_hits(search_handle,max_number_of_hits)
>
>     Fetches a number of hits from the search. Each hit is a set of
>     attributes for the hit (by default it might be URL, score and
>     perhaps something else important). It can be called several times
>     to retrieve more hits (much like read(2)). The hits are sorted,
>     by default by their score.
>
> For slightly less simple use there are some more functions:
>
>   session_free(session_handle)
>
>     Frees all resources related to the session. This includes all
>     searches created from the session.
>
>   session_set_search_finished_signal(session_handle,signal_handler)
>
>     Sets a signal handler which is invoked when a search has
>     finished. The signal handler gets the search handle back so
>     different searches can be told apart. If this is set the function
>     search_new() will not block.
>
>   session_set_search_progress_signal(session_handle,signal_handler)
>
>     Sets a signal handler which is invoked when there are new hits
>     available (hits which haven't been retrieved with
>     search_get_hits()). The signal handler gets the search handle,
>     maybe some approximation of the percentage of progress and maybe
>     the number of new hits for convenience.
>
>   session_set_property(session_handle,property_name,value)
>
>     Sets a property of the session. This might include default sort
>     order, maximum number of hits (mostly as a hint to the search
>     engine), minimum score, default set of attributes for hits in new
>     searches, whether searches should live on (never considered finished but
>     continue to generate new hits if new matching documents show up) and
>     probably some other stuff.
>
>   value = session_get_property(session_handle,property_name)
>
>     Does the expected.
>
>   search_free(search_handle)
>
>     Frees all resources related to the search. The search handle
>     becomes invalid afterwards.
>
>   search_is_finished(search_handle)
>
>     Checks if the search is finished yet.
>
>   search_get_number_of_total_hits_so_far(search_handle)
>
>     Gets the total number of hits this search resulted in (minus the
>     ones discarded because of too low score of course). If the search
>     finished signal handler has been set the search might not yet be
>     finished and the number of hits so far is returned.
>
>   search_get_number_of_new_hits_so_far(search_handle)
>
>     Identical to the one above, but minus the number of hits already
>     retrieved using search_get_hits() (or skipped using
>     search_seek(), see below).
>
>   search_tell()
>
>     Tells how many hits have been retrieved so far.
>
>   search_seek()
>
>     Moves the cursor in the search to either skip hits or go back
>     to read them again (much like lseek(2)). Yes, the name is bad, I
>     know (see disclaimer above). (Perhaps search_tell() and
>     search_seek() can be replaced by a property.)
>
>   search_set_property(search_handle,property_name,value)
>
>     Sets a property of the search. This might include sort order (for
>     remaining hits if some have already been retrieved), the set of
>     attributes for hits and probably some other stuff.
>
>   value = search_get_property(search_handle,property_name)
>
>     Does the expected.
>
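
To make the read(2)/lseek(2) analogy in the quoted proposal a bit more
concrete, here is a minimal in-memory sketch in Python of the three basic
calls plus the cursor-style ones. Everything in it is an assumption for
illustration only: the Session, Search and Hit classes, the dictionary
standing in for an index, and the substring-count scoring are made up, and
no dbus connection or real search engine is involved. The signal handlers
and properties are left out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One hit; the proposal suggests URL and score as default attributes."""
    url: str
    score: float


class Search:
    """Plays the role of a search handle with a read(2)-like cursor."""

    def __init__(self, hits):
        # Hits are sorted by score, highest first, as in the proposal.
        self._hits = sorted(hits, key=lambda h: h.score, reverse=True)
        self._pos = 0

    def get_hits(self, max_number_of_hits):
        # cf. search_get_hits(): return the next chunk and advance the cursor.
        chunk = self._hits[self._pos:self._pos + max_number_of_hits]
        self._pos += len(chunk)
        return chunk

    def tell(self):
        # cf. search_tell(): number of hits retrieved (or skipped) so far.
        return self._pos

    def seek(self, position):
        # cf. search_seek(): move the cursor, clamped to the valid range.
        self._pos = max(0, min(position, len(self._hits)))

    def number_of_total_hits(self):
        return len(self._hits)

    def number_of_new_hits(self):
        # Total hits minus those already retrieved or skipped.
        return len(self._hits) - self._pos


class Session:
    """Plays the role of a session handle; 'documents' is a toy index."""

    def __init__(self, documents):
        self._documents = dict(documents)  # hypothetical: {url: text}

    def search_new(self, query_string):
        # Blocking search; the score is a placeholder (substring count).
        needle = query_string.lower()
        hits = []
        for url, text in self._documents.items():
            count = text.lower().count(needle)
            if count:
                hits.append(Hit(url, float(count)))
        return Search(hits)


if __name__ == "__main__":
    session = Session({
        "file:///a.txt": "dbus search api, search daemon",
        "file:///b.txt": "metadata only",
        "file:///c.txt": "one search engine",
    })
    search = session.search_new("search")
    for hit in search.get_hits(1):      # first chunk
        print(hit.url, hit.score)
    print(search.number_of_new_hits())  # hits not yet retrieved
    search.seek(0)                      # rewind without searching again
    print([h.url for h in search.get_hits(10)])
```

The point of keeping the cursor inside the search object is that going back
to re-read hits is just a seek on results that are already held, so the
engine never has to repeat the search.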

Wow! :-) You beat me by a split second there - I just put my proposal on the
wiki at http://wiki.freedesktop.org/wiki/WasabiSearchLive :-)

I'll review your work as soon as I can find the time. It's great to have
some concrete stuff to compare and discuss.

Cheers,
Mikkel