Sorry for the late reply; I've been totally swamped by illness and work! Here goes :-)<br><br>Because this mail is long, I summarize the important questions at the bottom...<br><br>2006/11/30, Magnus Bergman <
<a href="mailto:magnus.bergman@observer.net">magnus.bergman@observer.net</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>After reading everything in this thread and considering all concerns<br>mentioned, I think it's time I come up with something concrete myself,<br>instead of just criticizing others. From the requirements and suggestions<br>mentioned on this list I tried to come up with a proposal with these
<br>things in mind:<br>* The possibility for application authors to do searches very easily.<br>* The possibility to do both synchronous and asynchronous searches<br> without having two different APIs.<br>* Not ruling out the possibility to use a dbus interface directly.
<br>* Not ruling out the possibility to have a library.<br>* Not causing the search engine to do unnecessary work (like repeating<br> searches if the hits need to be retrieved again)<br><br>One problem that I chose to leave out (for now) is the need to be able
<br>to stream documents which have a URL the applications cannot understand<br>(documents which are not files). This includes files inside other files<br>and virtual documents that are constructed on demand. But at least I<br>
have had this in mind so it's not impossible to add it later.</blockquote><div><br><br>Well, I don't think this belongs in a search API as such. This functionality sounds more like metadata storage to me, which is planned for standardization later. So, yeah, let's keep the options open, but punt the issue for now.
<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Disclaimer: The names of the functions are not part of the proposal,<br>they are just chosen to illustrate what the functions do. And this
<br>proposal does not suggest a library API over a dbus interface, the<br>exact same idea applies to both cases. (It is also agnostic to whatever<br>query language is used.)<br><br>First a set of three basic functions that alone do most things and
<br>are probably sufficient for everybody who wants a simple API:<br><br> session_handle = session_new()<br><br> Creates a new session and returns a new session handle. Creating a<br> new session might involve finding an appropriate search engine and
<br> getting it ready (exactly what happens here is not important). This<br> might just be to open a dbus connection. I think it's OK if this<br> call is blocking(?). Applications would probably want to call it<br>
during startup and it should not take that long to do whatever<br> needs to be done here (which of course depends on the search<br> engine).<br><br> search_handle = search_new(session_handle,query_string)<br>
<br> Starts a new search and returns a new search handle. By default<br> this function blocks until the search has been performed and the<br> number of hits is known (see below).<br><br> hits = search_get_hits(search_handle,max_number_of_hits)
<br><br> Fetches a number of hits from the search. Each hit is a set of<br> attributes for the hit (by default it might be URL, score and<br> perhaps something else important). It can be called several times<br> to retrieve more hits (much like read(2)). The hits are sorted,
<br> by default by their score.</blockquote><div><br><br>Ok, more minimalistic than the current simple interface, but I guess it could work.<br><br>When I compared this interface to the current live one proposed at <a href="http://wiki.freedesktop.org/wiki/WasabiSearchLive">
http://wiki.freedesktop.org/wiki/WasabiSearchLive</a>, my first thought was that your session object was equivalent to the dbus connection made by the application. In my proposal the app then uses the connection/session to obtain a Query object with the NewQuery() method.
<br><br>From an application developer's point of view this might be correct. I just forgot to look at this through my search engine developer's glasses :-) From the search engine's perspective it might actually be nice to have a parent session for each query. This is something we have to ask the search engine developers about. See the bottom of this mail.
<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">For slightly less simple use there are some more functions:<br><br> session_free(session_handle)
<br></blockquote><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Frees all resources related to the session. This includes all<br> searches created from the session.
</blockquote><div><br><div><br>
I think this should be available in the simple API if we use Session objects. I do realise that you only want one API though :-)
<br></div> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> session_set_search_finished_signal(session_handle,signal_handler)<br>
<br> Sets a signal handler which is invoked when a search has been<br> finished. The signal handler gets the search handle back so<br> different searches can be told apart. If this is set the function<br> search_new() will not block.
</blockquote><div><br><br>It seems simpler to me that the application simply asks "are you done?" each time it receives a new batch of hits. A bit more dbus traffic, but not much.<br>That means having a search_is_finished(search_handle) instead (which you actually define below).
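<br><br>A rough sketch of that polling pattern in Python, assuming a hypothetical FakeSearch object that hands out its hits in batches (the class and the collect_all helper are made up for illustration; only the "get hits, then ask if finished" loop is the point):

```python
class FakeSearch:
    """Hypothetical search whose hits arrive in several batches."""

    def __init__(self, batches):
        # Each inner list plays the role of one batch found by the engine.
        self._pending = [list(batch) for batch in batches if batch]

    def get_hits(self, max_number_of_hits):
        """Return up to max_number_of_hits from the oldest pending batch."""
        if not self._pending:
            return []
        batch = self._pending[0][:max_number_of_hits]
        rest = self._pending[0][max_number_of_hits:]
        if rest:
            self._pending[0] = rest
        else:
            self._pending.pop(0)
        return batch

    def is_finished(self):
        """True once every hit has been handed out."""
        return not self._pending


def collect_all(search, batch_size=10):
    """Fetch a batch, then ask "are you done?", until the answer is yes."""
    hits = []
    while True:
        hits.extend(search.get_hits(batch_size))
        if search.is_finished():
            return hits
```

In a real client the loop body would run each time new hits are announced rather than spinning, so the extra traffic is one is_finished call per batch.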
<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> session_set_search_progress_signal(session_handle,signal_handler)<br><br> Sets a signal handler which is invoked when there are new hits
<br> available (hits which haven't been retrieved with<br> search_get_hits()). The signal handler gets the search handle,<br> maybe some approximation of the percentage of the progress and maybe<br> the number of new hits for convenience.
</blockquote><div><br><br>Why not return the hits with the signal? I do see something cool in not returning the results until the application specifically requests them though. It is somewhat reminiscent of the way Spotlight does it, and it is also closer to what libbeagle does. This is an important point, so I added it to the bottom of this mail.
<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> session_set_property(session_handle,property_name,value)<br><br> Sets a property of the session. This might include default sort
<br> order, maximum number of hits (mostly as a hint to the search<br> engine), minimum score, default set of attributes for hits in new<br> searches, whether searches should live on (never considered finished but<br>
continue to generate new hits if new matching documents show up) and<br> probably some other stuff.</blockquote><div><br><br>The properties you mention sound more like properties of the query, if you ask me...<br>
</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> value = session_get_property(session_handle,property_name)<br><br> Does the expected.
<br><br> search_free(search_handle)<br><br> Frees all resources related to the search. The search handle<br> becomes invalid afterwards.<br><br> search_is_finished(search_handle)<br><br> Checks if the search is finished yet.
</blockquote><div><br><br>Check, check, and check on those methods.<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> search_get_number_of_total_hits_so_far(search_handle)
<br><br> Gets the total number of hits this search resulted in (minus the<br> ones discarded because of too low score of course). If the search<br> finished signal handler has been set the search might not yet be
<br> finished and the number of hits so far is returned.<br><br> search_get_number_of_new_hits_so_far(search_handle)<br><br> Identical to the one above, but minus the number of hits already<br> retrieved using search_get_hits() (or skipped using
<br> search_seek(), see below).<br><br> search_tell()<br><br> Tells how many hits have been retrieved so far.</blockquote><div><br><br>The above three methods don't feel right... This looks like bookkeeping that could be done on the client side just as well.
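<br><br>As a sketch of what I mean by client-side bookkeeping (in Python, and assuming the progress signal tells the client how many new hits were found, which the proposal only lists as a "maybe"), the three counters could be kept by the application itself. The HitCounter class and its method names are hypothetical:

```python
class HitCounter:
    """Client-side counters; nothing here talks to the search engine."""

    def __init__(self):
        self.total_so_far = 0   # hits the engine has announced so far
        self.retrieved = 0      # hits already fetched (what search_tell reports)

    def on_hits_announced(self, number_of_new_hits):
        """Call this from the progress signal handler."""
        self.total_so_far += number_of_new_hits

    def on_hits_retrieved(self, hits):
        """Call this with every batch returned by search_get_hits()."""
        self.retrieved += len(hits)

    @property
    def new_so_far(self):
        """Hits announced but not yet fetched."""
        return self.total_so_far - self.retrieved
```

With something like this the engine only has to announce hits and hand them out; the counting stays in the application.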
<br></div><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> search_seek()<br><br> Moves the cursor in the search to either skip hits or go back
<br> to read them again (much like lseek(2)). Yes, the name is bad, I<br> know (see disclaimer above). (Perhaps search_tell() and<br> search_seek() can be replaced by a property.)</blockquote><div><br><br>Is this method actually useful? I think it needs really good justification since it will introduce quite some work on the search engine side to support (correct me if I'm wrong).
<br><br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> search_set_property(search_handle,property_name,value)<br><br> Sets a property of the search. This might include sort order (for
<br> remaining hits if some have already been retrieved), set of<br> attributes for hits and probably some other stuff.<br><br> value = search_get_property(search_handle,property_name)<br><br> Does the expected.
<br></blockquote></div><br><br>Question 1: Will it benefit the search engine to have a Session object for each connection? Query objects would then be spawned by a call like the one Magnus suggests: Query = NewQuery(Session, query_string). Or is it correct that applications don't need to care about sessions, as in "just gimme the goddam query!"? :-)
<br><br>Question 2: Should the results be returned with the HitsAdded signal? If not, the Query object would have a Query.GetResults method to retrieve the results. That is closer to libbeagle and Spotlight, and the application only spends time retrieving hits when it really wants to. It does introduce some extra method calls though...
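<br><br>To illustrate the "signal without results" variant of Question 2, here is a small Python sketch. HitsAdded and GetResults are the names from the question (written as on_hits_added and get_results here); the engine_found method and the buffering are assumptions of mine, not part of any proposal:

```python
class Query:
    """Sketch of a query whose signal carries only a hit count."""

    def __init__(self, on_hits_added):
        # Callback standing in for the HitsAdded signal.
        self._on_hits_added = on_hits_added
        self._buffer = []

    def engine_found(self, hits):
        """Hypothetical hook called by the engine side when hits turn up."""
        self._buffer.extend(hits)
        # Only the number of new hits is sent, not the hits themselves.
        self._on_hits_added(len(hits))

    def get_results(self, max_hits):
        """Stand-in for Query.GetResults: hand out buffered hits on request."""
        batch = self._buffer[:max_hits]
        self._buffer = self._buffer[max_hits:]
        return batch
```

The cost is the extra get_results call per batch; the gain is that hits are only transferred when the application actually asks for them.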
<br><br>In the <a href="http://wiki.freedesktop.org/wiki/WasabiSearchLive">http://wiki.freedesktop.org/wiki/WasabiSearchLive</a> proposal the session and the query object are somewhat merged (since you can change a running query, which restarts it). I personally think it is rather elegant, but perhaps it is really just a mess.
<br><br>Cheers, let's get this ball rolling again. For the end users!<br>Mikkel<br>