simple search api (was Re: mimetype standardisation by testsets)
Mikkel Kamstrup Erlandsen
mikkel.kamstrup at gmail.com
Thu Nov 30 18:34:57 EET 2006
2006/11/30, Magnus Bergman <magnus.bergman at observer.net>:
> On Sun, 19 Nov 2006 12:19:45 +0100
> "Jos van den Oever" <jvdoever at gmail.com> wrote:
> > Hi Mikkel,
> > Yes, the common dbus api is still something we need. I wanted to start
> > on the metadata standardization first, but we can do the searching api
> > in parallel. You make a good start in listing the available engines.
> > There might even be more. To coordinate we need a process that lists
> > the available search engines over dbus. An application should be able
> > to say: I want to search using a particular interface with the
> > available search engines.
> > The attached archive contains an effort to do two things:
> > - propose a very simple, common api for search engines
> > - implement such a coordinating daemon
> > The code contains the daemon, a demo search application and a python
> > client to access it by finding the search engine over the
> > searchmanager.
> After reading everything in this thread and considering all the concerns
> mentioned, I think it's time I came up with something concrete myself
> instead of just criticizing others. From the requirements and suggestions
> mentioned on this list I tried to come up with a proposal with these
> things in mind:
> * The possibility for application authors to do searches very easily.
> * The possibility to do both synchronous and asynchronous searches
> without having two different APIs.
> * Not ruling out the possibility to use a dbus interface directly.
> * Not ruling out the possibility to have a library.
> * Not causing the search engine to do unnecessary work (like repeating
> searches if the hits need to be retrieved again)
> One problem that I chose to leave out (for now) is the need to be able
> to stream documents whose URLs the applications cannot understand
> (documents which are not files). This includes files inside other files
> and virtual documents that are constructed on demand. But at least I
> have had this in mind so it's not impossible to add it later.
> Disclaimer: The names of the functions are not part of the proposal,
> they are just chosen to illustrate what the functions do. And this
> proposal does not favor a library API over a dbus interface; the
> exact same idea applies to both cases. (It is also agnostic to whatever
> query language is used.)
> First a set of three basic functions that alone do most things and
> are probably sufficient for everybody who wants a simple API:
> session_handle = session_new()
> Creates a new session and returns a new session handle. Creating a
> new session might involve finding an appropriate search engine and
> getting it ready (exactly what happens here is not important). This
> might just be to open a dbus connection. I think it's OK if this
> call is blocking(?). Applications would probably want to call it
> during startup and it should not take that long to do whatever
> needs to be done here (which of course depends on the search
> engine).
> search_handle = search_new(session_handle,query_string)
> Starts a new search and returns a new search handle. By default
> this function blocks until the search has been performed and the
> number of hits is known (see below).
> hits = search_get_hits(search_handle,max_number_of_hits)
> Fetches a number of hits from the search. Each hit is a set of
> attributes for the hit (by default it might be URL, score and
> perhaps something else important). It can be called several times
> to retrieve more hits (much like read(2)). The hits are sorted,
> by default by their score.
> For slightly less simple use there are some more functions:
> Frees all resources related to the session. This includes all
> searches created from the session.
> Sets a signal handler which is invoked when a search has been
> finished. The signal handler gets the search handle back so
> different searches can be told apart. If this is set the function
> search_new() will not block.
> Sets a signal handler which is invoked when there are new hits
> available (hits which haven't been retrieved with
> search_get_hits()). The signal handler gets the search handle,
> maybe some approximation of the progress as a percentage and maybe
> the number of new hits for convenience.
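
The blocking versus non-blocking behaviour described for the two handlers could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the proposal does not name the handler setters, so the attributes on_search_finished and on_new_hits are hypothetical, the substring match stands in for a real engine, and a plain thread stands in for whatever dbus signals would do.

```python
import threading


class Session:
    """Holds the optional handlers; the attribute names are hypothetical."""

    def __init__(self):
        self.on_search_finished = None  # called with (search,)
        self.on_new_hits = None         # called with (search, number_of_new_hits)


class Search:
    def __init__(self):
        self.hits = []
        self.finished = threading.Event()
        self.worker = None              # set only for asynchronous searches
        self._lock = threading.Lock()


def search_new(session, query_string, documents):
    """Start a search; block only if no search-finished handler is set."""
    search = Search()

    def run():
        for url, text in documents:
            if query_string in text:    # toy matching, not a real query language
                with search._lock:
                    search.hits.append(url)
                if session.on_new_hits is not None:
                    session.on_new_hits(search, 1)
        search.finished.set()
        if session.on_search_finished is not None:
            session.on_search_finished(search)

    if session.on_search_finished is None:
        run()                           # synchronous: all hits known on return
    else:
        search.worker = threading.Thread(target=run)
        search.worker.start()           # asynchronous: returns immediately
    return search
```

The point of the design is that the caller uses the same search_new() in both modes; whether it blocks depends only on whether a finished handler was registered beforehand.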
> Sets a property of the session. This might include default sort
> order, maximum number of hits (mostly as a hint to the search
> engine), minimum score, default set of attributes for hits in new
> searches, whether searches should live on (never considered finished
> but continue to generate new hits if new matching documents show up)
> and probably some other stuff.
> value = session_get_property(session_handle,property_name)
> Does the expected.
> Frees all resources related to the search. The search handle
> becomes invalid afterwards.
> Checks if the search is finished yet.
> Gets the total number of hits this search resulted in (minus the
> ones discarded because of too low score of course). If the search
> finished signal handler has been set the search might not yet be
> finished and the number of hits so far is returned.
> Identical to the one above, but minus the number of hits already
> retrieved using search_get_hits() (or skipped using
> search_seek(), see below).
> Tells how many hits have been retrieved so far.
> Moves the cursor in the search to either skip hits or go back
> to read them again (much like lseek(2)). Yes, the name is bad, I
> know (see disclaimer above). (Perhaps search_tell() and
> search_seek() can be replaced by a property.)
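
The cursor semantics (read, tell, seek, remaining hits) can be sketched in Python as follows. The names search_tell() and search_seek() come from the text above; their argument lists, the whence parameter borrowed from lseek(2), the clamping behaviour and the name search_count_remaining() are assumptions for the sake of the example.

```python
import os


class Search:
    """A finished search: a fixed, already sorted hit list plus a cursor."""

    def __init__(self, hits):
        self._hits = list(hits)
        self._cursor = 0


def search_get_hits(search, max_number_of_hits):
    """Return the next hits and advance the cursor, like read(2)."""
    start = search._cursor
    chunk = search._hits[start:start + max_number_of_hits]
    search._cursor = start + len(chunk)
    return chunk


def search_tell(search):
    """Return how many hits have been retrieved or skipped so far."""
    return search._cursor


def search_seek(search, offset, whence=os.SEEK_SET):
    """Move the cursor, like lseek(2); the result is clamped to the hit list."""
    if whence == os.SEEK_SET:
        position = offset
    elif whence == os.SEEK_CUR:
        position = search._cursor + offset
    elif whence == os.SEEK_END:
        position = len(search._hits) + offset
    else:
        raise ValueError("unknown whence value")
    search._cursor = max(0, min(position, len(search._hits)))
    return search._cursor


def search_count_remaining(search):
    """Hypothetical name: total hits minus those already retrieved or skipped."""
    return len(search._hits) - search._cursor
```

Seeking back to 0 and reading again returns the stored hits without repeating the search, which matches the "no unnecessary work" goal listed at the top.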
> Sets a property of the search. This might include sort order (for
> remaining hits if some have already been retrieved), set of
> attributes for hits and probably some other stuff.
> value = search_get_property(search_handle,property_name)
> Does the expected.
Wow! :-) You beat me by a split second there - I just put my proposal on the
wiki at http://wiki.freedesktop.org/wiki/WasabiSearchLive :-)
I'll review your work as soon as I can find the time. It's great to have
some concrete stuff to compare and discuss.