2006/11/23, Fabrice Colin <<a href="mailto:fabrice.colin@gmail.com">fabrice.colin@gmail.com</a>>:<div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Hello all, On 11/23/06, Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>> wrote: > 2006/11/22, Magnus Bergman <<a href="mailto:magnus.bergman@observer.net"> magnus.bergman@observer.net</a>>: > > If several search engines are available, the search manager lets the > > client know of each search engine according to your proposal (right?). > > I think it would be a better idea to present a list of indexes (of which > > each search engine might provide several) to search in, but by default > > search in all of them (if appropriate). I > > Well, the search engines are not obliged to use a particular index format. > The indexes themselves can be of any format. > What Magnus suggests may be useful for document 'sources' or 'groups' (for lack of a better name), e.g. "Documents", "Applications", "Contacts", "Conversations", etc., as offered by some existing personal search systems, which may or may not map to individual indexes (that mapping being irrelevant).</blockquote><div> That was exactly what I meant to cover with the "group" switch. For example, the query "fabrice group:contacts" would return you. Searching without a specified group would return matches from all groups. Perhaps the wiki is a bit unclear here... </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> > In addition to this session object, I have found it useful to also > > have a search object (created from a query) because applications might > > construct very complicated queries. This object can then be passed > > to countHits and used for getting the hits. 
And also for getting > > attributes of the hit (matching document, score, language and such). > > (Note that a hit is not equivalent to a document.) > > The problem with creating query objects like this is that we are creating a > dbus api. Essentially you only have simple data types at hand. No > objects, especially not objects with methods on them :-) It would be possible > to create a helper lib in &lt;insert favorite language + toolkit&gt; to construct > queries conforming to the Wasabi spec, but this would require separate libs > for GObject and Qt. While this is by no means ruled out, I think we had better > focus on the "bare" dbus api for now. ></blockquote><div>  </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Well, AFAIK, dbus allows complex structures like arrays or dictionaries. </blockquote><div> Yeah, but in my book those really only count as collections of simple data types. What I meant was just that you can't have a Query object, as Lucene does, and pass that over the wire. Not in a desktop-neutral way at least; please correct me if I'm wrong! :-) </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> The situation at hand is that we have a handful of desktop search engines, > all implemented as daemons, both handling searches and indexing. Having an > extra daemon on top of that handling the query one extra time before passing > it to the search subsystem seems overkill... Ideally I see the daemon/lib > (or even executable) only being used as a means of obtaining a dbus object > path given a dbus interface name ("org.freedesktop.search.simple"). > Agreed. The daemon's role would probably also include filtering out search services based on user preferences, wouldn't it?</blockquote><div>  </div>Yeah, that was my idea at least. 
Perform a selection based on some sane criteria (read: user configuration). My idea was that the api consumer only <div>needed to call getInterfaceProvider("org.freedesktop.search.simple") and then get one object path back to use for the dbus connection. </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> > > One thing that English users seldom consider is the use of several > > languages. Which language is being used is important to know in order > > to decide what stemming rules to use, and which stop-words to use (in > > English "the" is a stop-word, while in Swedish it means tea and is > > something worth searching for). People using other languages > > are very often multilingual (using English as well). Therefore it is > > interesting to know which language the query is in (search engines > > might also be able to translate queries to search in documents written > > in different languages). > > > > This is a good point. However, I suggest leaving this up to the actual > implementations. After all, which stemmer to use when indexing a document > is an indexing-time question... > The language is also useful at query time, so that the query can be parsed & tokenized in a way that's consistent with how document text was processed at indexing time. For instance, if the query is in English, as Magnus points out, you may want to remove English stopwords, run an English stemmer on terms, or even limit the search to documents that were detected as being in English at indexing time.</blockquote><div> Right you are. I was a bit wasted last night when I answered Magnus (sorry); I just thought he deserved an answer sooner rather than later. The question, then, is whether this info should be stored in the manager daemon or the search engine. 
As I consider it more or less a design goal that the daemon (or lib or whatever we end up with) should be expendable, I don't think such info should lie with the managing object. Also, if this info resided with the managing object, each query would have to go through the managing interface, and I'm not totally hooked on that idea. To avoid code duplication we could develop a small lib or other dbus service to *optionally* handle these issues. I'm reluctant to impose any dependency on the implementing engines. </div> Cheers, Mikkel </div>
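The "group" switch from the thread (the query "fabrice group:contacts", with no group meaning "search all groups") could be handled by a small parser along these lines. This is a minimal sketch, assuming a switch is always written as group:NAME with no spaces; ParsedQuery, parse_query and matches_group are hypothetical names, not part of the Wasabi spec.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Free-text terms plus the groups named by "group:" switches."""
    terms: list[str] = field(default_factory=list)
    groups: list[str] = field(default_factory=list)  # empty means "all groups"


def parse_query(query: str) -> ParsedQuery:
    """Split e.g. "fabrice group:contacts" into terms and groups.

    Assumption (hypothetical syntax): a switch is written as group:NAME
    with no spaces; the actual Wasabi query grammar may differ.
    """
    parsed = ParsedQuery()
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key.lower() == "group" and value:
            parsed.groups.append(value.lower())
        else:
            parsed.terms.append(token)
    return parsed


def matches_group(parsed: ParsedQuery, hit_group: str) -> bool:
    """Keep a hit if no group was requested or its group was requested."""
    return not parsed.groups or hit_group.lower() in parsed.groups
```

Here parse_query("fabrice group:contacts") yields the terms ["fabrice"] and the groups ["contacts"]; a query without a switch leaves groups empty, so matches_group accepts hits from every group.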
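The exchange about D-Bus carrying arrays and dictionaries but not "objects with methods" can be made concrete by computing a D-Bus type signature for a nested value. Anything built from strings, numbers, arrays, string-keyed dictionaries and structs has a signature; an arbitrary object does not. This is a sketch covering only a subset of the D-Bus type system, and as a simplification it requires homogeneous arrays and dictionary values instead of falling back to variants (v); dbus_signature is a hypothetical helper, not part of any D-Bus binding.

```python
def dbus_signature(value) -> str:
    """Return the D-Bus type signature of a value built from simple types.

    Covers a subset of the D-Bus type system: strings (s), booleans (b),
    64-bit integers (x), doubles (d), arrays (a...), string-keyed
    dictionaries (a{s...}) and structs, written here as tuples.
    Simplification: arrays and dict values must be non-empty and
    homogeneous; real D-Bus APIs often use variants (v) instead.
    """
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "b"
    if isinstance(value, int):
        return "x"
    if isinstance(value, float):
        return "d"
    if isinstance(value, str):
        return "s"
    if isinstance(value, tuple):
        if not value:
            raise TypeError("D-Bus structs cannot be empty")
        return "(" + "".join(dbus_signature(v) for v in value) + ")"
    if isinstance(value, list):
        inner = {dbus_signature(v) for v in value}
        if len(inner) != 1:
            raise TypeError("array must be non-empty and homogeneous")
        return "a" + inner.pop()
    if isinstance(value, dict):
        inner = {dbus_signature(v) for v in value.values()}
        if len(inner) != 1 or not all(isinstance(k, str) for k in value):
            raise TypeError("dict needs str keys and one non-empty value type")
        return "a{s" + inner.pop() + "}"
    # Anything else (e.g. a Query object with methods) has no wire form.
    raise TypeError(f"no D-Bus type for {type(value).__name__}")
```

For example, a query flattened into (text, groups, options) such as ("fabrice", ["contacts"], {"lang": "en"}) has the signature "(sasa{ss})", while passing an arbitrary object raises TypeError.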
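The manager role discussed in the thread, mapping an interface name such as "org.freedesktop.search.simple" to a single object path chosen and filtered by user configuration while never seeing the queries themselves, can be sketched as plain selection logic. Provider, SearchManager and get_interface_provider are hypothetical (the thread only names getInterfaceProvider), the object paths are made up, and the D-Bus plumbing that would export this is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """A search engine registered with the (hypothetical) manager."""
    name: str
    interfaces: frozenset
    object_path: str  # D-Bus object path the client talks to directly


class SearchManager:
    """Hypothetical manager: maps an interface name to one object path.

    Queries never pass through it; clients use the returned path directly.
    """

    def __init__(self, providers, preferred_order=(), disabled=()):
        self._providers = list(providers)
        # User configuration: engine names, most preferred first...
        self._preferred = list(preferred_order)
        # ...and engines the user has switched off entirely.
        self._disabled = set(disabled)

    def _rank(self, provider: Provider) -> int:
        # Engines the user did not rank sort after all ranked ones.
        if provider.name in self._preferred:
            return self._preferred.index(provider.name)
        return len(self._preferred)

    def get_interface_provider(self, interface: str) -> str:
        candidates = [
            p for p in self._providers
            if interface in p.interfaces and p.name not in self._disabled
        ]
        if not candidates:
            raise LookupError(f"no provider implements {interface}")
        # min() keeps registration order among equally ranked engines.
        return min(candidates, key=self._rank).object_path
```

Keeping the manager down to this lookup is what makes it expendable in the sense Mikkel describes: once a client has the object path, the manager is out of the loop.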
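The point that the query language matters at query time, not just at indexing time, can be illustrated with stop-word removal: "the" is dropped from a query tagged as English but kept when the query is tagged as Swedish. This is a minimal sketch with tiny, illustrative stop-word lists and no stemmer; STOPWORDS and tokenize_query are hypothetical names, and a real engine would reuse the same analysis chain it applied to documents in that language.

```python
from typing import Optional

# Tiny illustrative stop-word lists; real ones are far longer.
STOPWORDS = {
    "en": {"the", "a", "an", "of", "and"},
    "sv": {"och", "att", "det", "som", "en"},
}


def tokenize_query(query: str, lang: Optional[str]) -> list[str]:
    """Lower-case and split a query, dropping stop-words for its language.

    With an unknown or missing language nothing is dropped, since a
    stop-word in one language ("the" in English) may be a meaningful
    search term in another. A real engine would also run the stemmer
    it used for that language at indexing time (omitted here).
    """
    stop = STOPWORDS.get(lang, set())
    return [t for t in query.lower().split() if t not in stop]
```

With these lists, tokenize_query("The green tea", "en") keeps only "green" and "tea", while the same query tagged "sv" or untagged keeps all three words.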