2006/11/30, Magnus Bergman <<a href="mailto:magnus.bergman@observer.net">magnus.bergman@observer.net</a>>:<div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Sun, 19 Nov 2006 12:19:45 +0100
"Jos van den Oever" <<a href="mailto:jvdoever@gmail.com">jvdoever@gmail.com</a>> wrote:

> Hi Mikkel,
>
> Yes, the common dbus api is still something we need. I wanted to start
> on the metadata standardization first, but we can do the searching api
> in parallel. You make a good start in listing the available engines.
> There might even be more. To coordinate we need a process that lists
> the available search engines over dbus. An application should be able
> to say: I want to search using a particular interface with the
> available search engines.
>
> The attached archive contains an effort to do two things:
> - propose a very simple, common api for search engines
> - implement such a coordinating daemon
>
> The code contains the daemon, a demo search application and a python
> client to access it by finding the search engine over the
> searchmanager.

After reading everything in this thread and considering all the concerns
mentioned, I think it's time I came up with something concrete myself
instead of just criticizing others. From the requirements and suggestions
mentioned on this list, I tried to come up with a proposal with these
things in mind:

* The possibility for application authors to do searches very easily.
* The possibility to do both synchronous and asynchronous searches
  without having two different APIs.
* Not ruling out the possibility to use a dbus interface directly.
* Not ruling out the possibility to have a library.
* Not causing the search engine to do unnecessary work (like repeating
  searches if the hits need to be retrieved again).

One problem that I chose to leave out (for now) is the need to be able to
stream documents which have a URL the application cannot understand
(documents which are not files). This includes files inside other files
and virtual documents that are constructed on demand. But at least I have
had this in mind, so it's not impossible to add it later.

Disclaimer: The names of the functions are not part of the proposal; they
are just chosen to illustrate what the functions do. And this proposal
does not favor a library API over a dbus interface; the exact same idea
applies to both cases. (It is also agnostic to whatever query language is
used.)

First, a set of three basic functions that alone do most things and are
probably sufficient for everybody who wants a simple API:

  session_handle = session_new()
    Creates a new session and returns a new session handle. Creating a
    new session might involve finding an appropriate search engine and
    getting it ready (exactly what happens here is not important). This
    might just be to open a dbus connection. I think it's OK if this
    call is blocking(?). Applications would probably want to call it
    during startup, and it should not take that long to do whatever
    needs to be done here (which of course depends on the search
    engine).

  search_handle = search_new(session_handle, query_string)
    Starts a new search and returns a new search handle. By default
    this function blocks until the search has been performed and the
    number of hits is known (see below).

  hits = search_get_hits(search_handle, max_number_of_hits)
    Fetches a number of hits from the search. Each hit is a set of
    attributes for the hit (by default it might be URL, score and
    perhaps something else important). It can be called several times
    to retrieve more hits (much like read(2)).
    The hits are sorted, by default by their score.

For slightly less simple use there are some more functions:

  session_free(session_handle)
    Frees all resources related to the session. This includes all
    searches created from the session.

  session_set_search_finished_signal(session_handle, signal_handler)
    Sets a signal handler which is invoked when a search has finished.
    The signal handler gets the search handle back so different
    searches can be told apart. If this is set, the function
    search_new() will not block.

  session_set_search_progress_signal(session_handle, signal_handler)
    Sets a signal handler which is invoked when there are new hits
    available (hits which haven't been retrieved with
    search_get_hits()). The signal handler gets the search handle,
    maybe some approximation of the percentage of progress, and maybe
    the number of new hits for convenience.

  session_set_property(session_handle, property_name, value)
    Sets a property of the session. This might include default sort
    order, maximum number of hits (mostly as a hint to the search
    engine), minimum score, default set of attributes for hits in new
    searches, whether searches should live on (never be considered
    finished but continue to generate new hits if new matching
    documents show up), and probably some other stuff.

  value = session_get_property(session_handle, property_name)
    Does the expected.

  search_free(search_handle)
    Frees all resources related to the search. The search handle
    becomes invalid afterwards.

  search_is_finished(search_handle)
    Checks if the search is finished yet.

  search_get_number_of_total_hits_so_far(search_handle)
    Gets the total number of hits this search resulted in (minus the
    ones discarded because of too low a score, of course). If the
    search finished signal handler has been set, the search might not
    yet be finished, in which case the number of hits so far is
    returned.
  search_get_number_of_new_hits_so_far(search_handle)
    Identical to the one above, but minus the number of hits already
    retrieved using search_get_hits() (or skipped using
    search_seek(), see below).

  search_tell()
    Tells how many hits have been retrieved so far.

  search_seek()
    Moves the cursor in the search to either skip hits or go back
    to read them again (much like lseek(2)). Yes, the name is bad, I
    know (see disclaimer above). (Perhaps search_tell() and
    search_seek() can be replaced by a property.)

  search_set_property(search_handle, property_name, value)
    Sets a property of the search. This might include sort order (for
    the remaining hits if some have already been retrieved), set of
    attributes for hits, and probably some other stuff.

  value = search_get_property(search_handle, property_name)
    Does the expected.
</blockquote></div>
Wow! :-) You beat me by a split second there - I just put my proposal on the wiki at <a href="http://wiki.freedesktop.org/wiki/WasabiSearchLive">http://wiki.freedesktop.org/wiki/WasabiSearchLive</a> :-)

I'll review your work as soon as I can find the time. It's great to have some concrete stuff to compare and discuss.

Cheers,
Mikkel
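The handle and cursor semantics in Magnus's proposal can be illustrated with a minimal in-process sketch in Python. It is not the proposed dbus interface or any existing library: the in-memory corpus, the word-count scoring, the Session/Search classes and the absolute-offset search_seek(search, position) signature are all hypothetical, chosen only to show how one search_new() call can be blocking or non-blocking, and how search_get_hits()/search_tell()/search_seek() behave like read(2) and lseek(2) over a sorted hit list.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical corpus standing in for a real engine's index: (url, text).
CORPUS = [
    ("file:///home/user/notes.txt", "dbus search api notes about search"),
    ("file:///home/user/todo.txt", "write search proposal"),
    ("file:///home/user/song.ogg", "nothing relevant here"),
]

Hit = Tuple[str, int]  # (url, score); a real engine could return more attributes

@dataclass(eq=False)  # handles compare by identity
class Session:
    searches: list = field(default_factory=list)
    finished_handler: Optional[Callable[["Search"], None]] = None

@dataclass(eq=False)
class Search:
    session: Session
    query: str
    hits: List[Hit] = field(default_factory=list)
    cursor: int = 0  # index of the next hit to return, like a file offset
    done: threading.Event = field(default_factory=threading.Event)

def session_new() -> Session:
    # A real implementation might locate a search engine over dbus here.
    return Session()

def session_set_search_finished_signal(session: Session, handler) -> None:
    session.finished_handler = handler

def _run(search: Search) -> None:
    # Naive stand-in for a query language: score = number of matching words.
    terms = search.query.split()
    scored = []
    for url, text in CORPUS:
        words = text.split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((url, score))
    scored.sort(key=lambda hit: hit[1], reverse=True)  # best score first
    search.hits = scored
    search.done.set()
    if search.session.finished_handler is not None:
        search.session.finished_handler(search)

def search_new(session: Session, query: str) -> Search:
    search = Search(session, query)
    session.searches.append(search)
    if session.finished_handler is None:
        _run(search)  # blocking: return once the number of hits is known
    else:
        # Non-blocking: same call, completion is reported via the handler.
        threading.Thread(target=_run, args=(search,), daemon=True).start()
    return search

def search_get_hits(search: Search, max_hits: int) -> List[Hit]:
    # Like read(2): return up to max_hits hits and advance the cursor.
    chunk = search.hits[search.cursor:search.cursor + max_hits]
    search.cursor += len(chunk)
    return chunk

def search_tell(search: Search) -> int:
    return search.cursor

def search_seek(search: Search, position: int) -> None:
    # Absolute positioning only (no lseek-style whence), clamped to the hits.
    search.cursor = max(0, min(position, len(search.hits)))

def search_is_finished(search: Search) -> bool:
    return search.done.is_set()

def search_get_number_of_total_hits_so_far(search: Search) -> int:
    return len(search.hits)

def search_get_number_of_new_hits_so_far(search: Search) -> int:
    return max(0, len(search.hits) - search.cursor)

def search_free(search: Search) -> None:
    if search in search.session.searches:
        search.session.searches.remove(search)

def session_free(session: Session) -> None:
    for search in list(session.searches):
        search_free(search)
```

With no finished handler, search_new() returns only after the hits are known, so a simple client can just call it and page through the results. Registering a handler with session_set_search_finished_signal() makes the same call return immediately, which is the "one API for both synchronous and asynchronous searches" goal from the proposal. The progress signal and the property functions are left out to keep the sketch short.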