2006/11/30, Magnus Bergman <<a href="mailto:magnus.bergman@observer.net">magnus.bergman@observer.net</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Sun, 19 Nov 2006 12:19:45 +0100<br>"Jos van den Oever" <<a href="mailto:jvdoever@gmail.com">jvdoever@gmail.com</a>> wrote:<br><br>> Hi Mikkel,<br>><br>> Yes, the common dbus api is still something we need. I wanted to start
<br>> on the metadata standardization first, but we can do the searching api<br>> in parallel. You make a good start in listing the available engines.<br>> There might even be more. To coordinate we need a process that lists
<br>> the available search engines over dbus. An application should be able<br>> to say: I want to search using a particular interface with the<br>> available search engines.<br>><br>> The attached archive contains an effort to do two things:
<br>> - propose a very simple, common api for search engines<br>> - implement such a coordinating daemon<br>> The code contains the daemon, a demo search application and a python<br>> client to access it by finding the search engine over the
<br>> searchmanager.<br><br>After reading everything in this thread and considering all the concerns<br>mentioned, I think it's time I came up with something concrete myself,<br>instead of just criticizing others. From the requirements and suggestions
<br>mentioned on this list I tried to come up with a proposal with these<br>things in mind:<br>* The possibility for application authors to do searches very easily.<br>* The possibility to do both synchronous and asynchronous searches
<br> without having two different APIs.<br>* Not ruling out the possibility to use a dbus interface directly.<br>* Not ruling out the possibility to have a library.<br>* Not causing the search engine to do unnecessary work (like repeating
<br> searches if the hits need to be retrieved again)<br><br>One problem that I chose to leave out (for now) is the need to be able<br>to stream documents which have a URL the application cannot understand<br>(documents which are not files). This includes files inside other files
<br>and virtual documents that are constructed on demand. But at least I<br>have had this in mind so it's not impossible to add it later.<br><br>Disclaimer: The names of the functions are not part of the proposal,<br>they are just chosen to illustrate what the functions do. And this
<br>proposal does not favor a library API over a dbus interface; the<br>exact same idea applies to both cases. (It is also agnostic to whatever<br>query language is used.)<br><br>First a set of three basic functions that alone do most things and
<br>are probably sufficient for everybody who wants a simple API:<br><br> session_handle = session_new()<br><br> Creates a new session and returns a new session handle. Creating a<br> new session might involve finding an appropriate search engine and
<br> getting it ready (exactly what happens here is not important). This<br> might just be to open a dbus connection. I think it's OK if this<br> call is blocking(?). Applications would probably want to call it<br>
during startup and it should not take *too* long to do whatever<br> needs to be done here (which of course depends on the search<br> engine).<br><br> search_handle = search_new(session_handle,query_string)<br>
<br> Starts a new search and returns a new search handle. By default<br> this function blocks until the search has been performed and the<br> number of hits is known (see below).<br><br> hits = search_get_hits(search_handle,max_number_of_hits)
<br><br> Fetches a number of hits from the search. Each hit is a set of<br> attributes for the hit (by default it might be URL, score and<br> perhaps something else important). It can be called several times<br> to retrieve more hits (much like read(2)). The hits are sorted,
<br> by default by their score.<br><br>For slightly less simple use there are some more functions:<br><br> session_free(session_handle)<br><br> Frees all resources related to the session. This includes all<br> searches created from the session.
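<br><br>The blocking calls described so far could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the in-memory document table, the word-count scoring and the Session and Search classes are hypothetical and not part of the proposal; only the function names come from the text above.<br>
<pre>
```python
# Toy, in-memory sketch of the blocking calls from the proposal.
# The document store and the scoring rule are hypothetical.

_DOCUMENTS = {
    "file:///notes/dbus.txt": "dbus search api notes",
    "file:///notes/todo.txt": "todo list",
    "file:///notes/search.txt": "search search engines",
}


class Session:
    def __init__(self):
        self.searches = []
        self.closed = False


class Search:
    def __init__(self, hits):
        self.hits = hits      # computed once, never recomputed
        self.cursor = 0       # index of the next hit to hand out


def session_new():
    """Create a session; a real engine might open a dbus connection here."""
    return Session()


def search_new(session, query_string):
    """Run the query and block until every hit is known."""
    hits = []
    for url, text in _DOCUMENTS.items():
        score = text.split().count(query_string)
        if score != 0:
            hits.append({"url": url, "score": score})
    # Hits are sorted by score, best first.
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    search = Search(hits)
    session.searches.append(search)
    return search


def search_get_hits(search, max_number_of_hits):
    """Return the next batch of hits, much like read(2); empty at the end."""
    batch = search.hits[search.cursor:search.cursor + max_number_of_hits]
    search.cursor += len(batch)
    return batch


def session_free(session):
    """Drop the session together with every search created from it."""
    session.searches.clear()
    session.closed = True
```
</pre>
Because the hits are kept with the search and handed out through a cursor, fetching them in several batches does not make the engine repeat the query.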
<br><br> session_set_search_finished_signal(session_handle,signal_handler)<br><br> Sets a signal handler which is invoked when a search has been<br> finished. The signal handler gets the search handle back so<br> different searches can be told apart. If this is set the function
<br> search_new() will not block.<br><br> session_set_search_progress_signal(session_handle,signal_handler)<br><br> Sets a signal handler which is invoked when there are new hits<br> available (hits which haven't been retrieved with
<br> search_get_hits()). The signal handler gets the search handle,<br> maybe some approximation of the progress as a percentage and maybe<br> the number of new hits for convenience.<br><br> session_set_property(session_handle,property_name,value)
<br><br> Sets a property of the session. This might include default sort<br> order, maximum number of hits (mostly as a hint to the search<br> engine), minimum score, default set of attributes for hits in new<br>
searches, whether searches should live on (never considered finished but<br> continue to generate new hits if new matching documents show up) and<br> probably some other stuff.<br><br> value = session_get_property(session_handle,property_name)
<br><br> Does the expected.<br><br> search_free(search_handle)<br><br> Frees all resources related to the search. The search handle<br> becomes invalid afterwards.<br><br> search_is_finished(search_handle)<br>
<br> Checks if the search is finished yet.<br><br> search_get_number_of_total_hits_so_far(search_handle)<br><br> Gets the total number of hits this search resulted in (minus the<br> ones discarded because of too low score of course). If the search
<br> finished signal handler has been set the search might not yet be<br> finished and the number of hits so far is returned.<br><br> search_get_number_of_new_hits_so_far(search_handle)<br><br> Identical to the one above, but minus the number of hits already
<br> retrieved using search_get_hits() (or skipped using<br> search_seek(), see below).<br><br> search_tell()<br><br> Tells how many hits have been retrieved so far.<br><br> search_seek()<br><br> Moves the cursor in the search to either skip hits or go back
<br> to read them again (much like lseek(2)). Yes, the name is bad, I<br> know (see disclaimer above). (Perhaps search_tell() and<br> search_seek() can be replaced by a property.)<br><br> search_set_property(search_handle,property_name,value)
<br><br> Sets a property of the search. This might include sort order (for<br> remaining hits if some have already been retrieved), set of<br> attributes for hits and probably some other stuff.<br><br> value = search_get_property(search_handle,property_name)
<br><br> Does the expected.<br></blockquote></div><br>Wow! :-) You beat me by a split second there - I just put my proposal on the wiki at <a href="http://wiki.freedesktop.org/wiki/WasabiSearchLive">http://wiki.freedesktop.org/wiki/WasabiSearchLive
</a> :-)<br><br>I'll review your work as soon as I can find the time. It's great to have some concrete stuff to compare and discuss.<br><br>Cheers,<br>Mikkel<br>
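<br>As a starting point for that comparison, the non-blocking and cursor calls from the quoted proposal could be sketched in Python roughly like this. It is a sketch under assumptions, not the proposed implementation: the fixed hit list, the worker thread, the Session and Search classes, and the arguments given to search_tell() and search_seek() (the proposal lists them without arguments) are all hypothetical.<br>
<pre>
```python
import threading

# Toy sketch of the non-blocking and cursor calls. The fixed hit list and
# the worker thread are hypothetical stand-ins for a real search engine.
_FAKE_HITS = ["file:///a", "file:///b", "file:///c"]


class Session:
    def __init__(self):
        self.finished_handler = None


class Search:
    def __init__(self):
        self.hits = []
        self.cursor = 0
        self.finished = False
        self.lock = threading.Lock()
        self.worker = None


def session_set_search_finished_signal(session, signal_handler):
    session.finished_handler = signal_handler


def _run(session, search):
    # Produce hits one by one, then mark the search as finished.
    for url in _FAKE_HITS:
        with search.lock:
            search.hits.append(url)
    with search.lock:
        search.finished = True
    if session.finished_handler is not None:
        session.finished_handler(search)


def search_new(session, query_string):
    """Block unless a finished handler is set (query_string is ignored here)."""
    search = Search()
    if session.finished_handler is None:
        _run(session, search)
    else:
        search.worker = threading.Thread(target=_run, args=(session, search))
        search.worker.start()
    return search


def search_is_finished(search):
    with search.lock:
        return search.finished


def search_get_number_of_total_hits_so_far(search):
    with search.lock:
        return len(search.hits)


def search_get_number_of_new_hits_so_far(search):
    with search.lock:
        return len(search.hits) - search.cursor


def search_get_hits(search, max_number_of_hits):
    with search.lock:
        batch = search.hits[search.cursor:search.cursor + max_number_of_hits]
        search.cursor += len(batch)
        return batch


def search_tell(search):
    """Number of hits retrieved or skipped so far."""
    with search.lock:
        return search.cursor


def search_seek(search, position):
    """Move the cursor to an absolute position, clamped to the known hits."""
    with search.lock:
        search.cursor = max(0, min(position, len(search.hits)))
```
</pre>
In this sketch the finished handler runs on the worker thread; a dbus-based engine would more likely deliver it as a signal on the client's main loop.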