I updated the live search proposal on <a href="http://wiki.freedesktop.org/wiki/WasabiSearchLive">http://wiki.freedesktop.org/wiki/WasabiSearchLive</a> with a unified one that merges the simple and live APIs.<br><br>2007/1/24, Magnus Bergman <
<a href="mailto:magnus.bergman@observer.net">magnus.bergman@observer.net</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Sat, 20 Jan 2007 21:27:38 +0100<br>"Mikkel Kamstrup Erlandsen" <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>> wrote:<br><br>> 2007/1/19, Magnus Bergman <<a href="mailto:magnus.bergman@observer.net">
magnus.bergman@observer.net</a>>:<br>> ><br>> > First some comments on the current draft[1]<br>> > """""""""""""""""""""""""""""""""""""""""""
<br>> ><br>> > As with the WasabiSearchSimple API[2] the session *is* the D-BUS<br>> > connection. So there really doesn't need to be an explicit session<br>> > object. It might be adequate to have one for the language
<br>> > bindings, but then the same thing goes for the simple API.<br>><br>> I actually think the session should be explicit. Both language<br>> bindings and actual server implementations would have an easier life
<br>> if it was explicit.<br><br>I don't object to that. But in that case I think the same goes for the<br>simple API. I assume sessions will map 1:1 to the dbus connection<br>(bindings might want to hide the dbus connection in the session object).
</blockquote><div><br>Ok, good. <br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> If the method GetMetadata should exist I think it would make more
<br>> > sense to make it belong to a document object, rename it<br>> > GetProperty and include it in the metadata storage API instead.<br>><br>><br>> Yes, it looks out of place in the search interface. There does
<br>> however need to be a way to obtain the "expensive" hit metadata as<br>> discussed in the thread about the simple api.<br>><br>> > And as I said before, I think it makes sense to treat queries and
<br>> > searches as different objects, which means renaming Query.Start to<br>> > something like NewSearch. It also means that a query doesn't need<br>> > to belong to anything (like the session), it could exist
<br>> > independently (unlike a search). I have left out possible functions<br>> > dealing with queries (like constructing an XML query from a simple<br>> > query string) since functions like that rather belong in a library.
<br>><br>><br>> I follow you on the search/query separation. Having NewSearch()<br>> actually start the search gives some problems with the<br>> SearchSetProperty() since it doesn't make much sense to change
<br>> properties on a running search. Spotlight has some similar methods<br>> and they restart the search if you invoke them. The reason I included<br>> a Query.Start - in current context Search.Start, was exactly that it
<br>> should be possible to set properties on a Search/Query before it was<br>> actually run.<br><br>If it doesn't make sense to change properties on a running search, then<br>the function could be removed. But I think there might be cases when it
<br>does. Every property set before the search starts is just included in<br>the XML query, right? So any function that sets properties for the<br>query can never do anything else than modify the query on the client<br>side. And I think such functions belong in a library.
</blockquote><div><br>I removed the method from the search object. Session properties are not included in the query xml, but are set on the server separately.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> Apart from ShowConfiguration(), all functions of the simple API<br>> seem<br>> > to be in the live API as well.<br>><br>><br>> I moved simple/live.ShowConfiguration to a dbus interface<br>>
org.freedesktop.search.ui.ShowConfiguration, together with a new<br>> method ShowSearchTool. Please see<br>> <a href="http://wiki.freedesktop.org/wiki/WasabiUI">http://wiki.freedesktop.org/wiki/WasabiUI</a> for the api spec proposal.
<br>> Sorry I did not find time to notify the list before now - spare my<br>> life :-)<br>><br>> ... So, would it be<br>> > possible and desirable to define the simple API as a subset of the<br>> > live API?
<br>><br>><br>> I have ambivalent feelings on this issue. Let me outline pros and<br>> cons as I see them. I shall spare you my confusing thoughts and cut<br>> to the cheese:<br>><br>> Loose Idea for an Interface Merge:
<br>> Have a boolean session property called "block". If it is true,<br>> GetHits() and CountHits() block until the desired info is available,<br>> removing the need for signals. If there are fewer hits than requested
<br>> by GetHits when the entire index has been searched, just return<br>> the found items.<br><br>Yes. In addition to the block property it might make sense to have a<br>"live" property as well (meaning the search will never finish). Just
<br>because you don't want the live feature doesn't necessarily mean you<br>want it to block.</blockquote><div><br><br>Yes that makes sense. I included it in the updated suggestion. <br></div><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> The simplest use case, retrieving uri and dc:title, would then look<br>> something like this (in pseudocode):<br>><br>> session = NewSession()<br>> SetProperty (session, "block", "true")
<br>> SetProperty (session, "properties", "uri ; dc:title")<br>><br>> search = NewSearch (query_xml, session) <-- search obj inherits<br>> requested props from the session<br>> hits = GetHits (search, 1000)
<br>> <show hits><br>><br>> count = HitCount (search)<br>> <print: showing 1000 of *count* hits><br>> Close(search)<br>> Close(session)<br><br>Yes, that's pretty close to what I imagined too. In addition I think
<br>"block" should be true by default (to make simple searching even<br>simpler). But what does "search obj inherits" mean?</blockquote><div><br><br>Agree on the "block" thing. I meant it as a reference to the (now removed)
Search.Set/GetProperty method. When you create a new search object all properties from the session are "inherited".<br></div><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
><br>> The actual proposal<br>> > """""""""""""""""""<br>> ><br>> > SetProperty ( in s property , in s value )
<br>> ><br>> > Set a global (session) property. This method can be used for<br>> > several things.<br>> > o Setting default properties for Query objects.<br>> > o Authentication/encryption
<br>> > o Generally be flexible for future needs<br>> > * property: Name of the property.<br>> > * value: New value for the property.<br>> ><br>> > GetProperty ( in s property , out s value)
<br>> ><br>> > Get the value of a global (session) property.<br>> > * property: Name of the property.<br>> > * value: Current value of the property.<br>><br>><br>> As noted above I still think we need a session handle. By using
<br>> handles we could even allow Get/SetProperty to take either a session or a<br>> search handle. Like SetProperty(handle, prop, val).<br><br>A common SetProperty function requires some magic, which might make it<br>troublesome for some languages. It might be neat to have in some
<br>languages (using overloading) but I object to having it at this level.</blockquote><div><br><br>Agreed. Let's have properties on the session only, unless someone comes up with a really good example where a property makes sense on the search only.
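<br><br>A minimal sketch of what session-only properties could look like from a Python binding. Everything in it is an assumption for illustration: the class and method names, the in-memory dictionary standing in for the server, the "false" default for "live", and the idea that a search takes a snapshot of the session properties at creation time. Only the "block" default of true comes from this thread.<br>

```python
class Session:
    """Hypothetical client-side stand-in for a search session."""

    def __init__(self):
        # "block" defaults to true as discussed in this thread;
        # the "live" default is an assumption of this sketch.
        self._props = {"block": "true", "live": "false"}

    def set_property(self, name, value):
        self._props[name] = value

    def get_property(self, name):
        return self._props[name]

    def search(self, query_xml):
        # The search receives a copy of the session properties, so later
        # changes to the session do not affect a running search.
        return Search(query_xml, dict(self._props))


class Search:
    """Hypothetical search object with no property setters of its own."""

    def __init__(self, query_xml, props):
        self.query_xml = query_xml
        self.props = props


session = Session()
session.set_property("properties", "uri ; dc:title")
search = session.search("QUERY-XML-PLACEHOLDER")

# Changing the session afterwards leaves the existing search untouched.
session.set_property("block", "false")
```

With this shape there is no need for a Search.SetProperty at all: whatever should apply to a search is set on the session before the search is created.<br>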
<br></div><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">><br>> NewSearchFromXML ( in s query_xml , out s search )<br>> >
<br>> > Start a new search from an XML query.<br>> > * query_xml: The query to execute.<br>> > * search: A handle that is used to uniquely identify this<br>> > search.<br>><br>><br>
> If the searches/queries can have properties I think we need an<br>> intermediate StartSearch() method. If we decide to only have<br>> session properties, I can accept starting the search right away.<br>
<br>I don't really understand the need. This *is* the "StartSearch" method.<br>Every property set before the search starts is included in the query<br>(XML string). Or am I missing something?</blockquote><div>
<br><br>The updated proposal uses Search() to both create and start the search.<br></div><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
><br>> SearchClose ( in s search)<br>><br>><br>> Check.<br>><br>><br>><br>> SearchSetProperty ( in s search , in s property , in s value)<br>> ><br>> > SearchGetProperty ( in s search , in s property , out s value)
<br>><br>><br>> I have a few remarks related to this above.<br>><br>><br>> SearchCountHits ( in s search , out i count )<br>><br>><br>> Check<br>><br>> SearchGetHitProperties ( in s search, in i offset, in i limit,
<br>> > in as properties, out a{sa{sas}} response )<br>><br>><br>> I think it should be called GetHits. Why list requested props here<br>> if you also do it in the Set*Property()? Why do we need an offset? In
<br>> a live search I can't see any reason to re-request a given range of<br>> hits. Didn't we agree that the return value should be without maps<br>> and just arrays?<br><br>My idea of listing the requested props in Set*Property() was more of
<br>limiting the set of properties that could be retrieved with this<br>function (but defaults to every possible prop), including the<br>expensive one(s). The typical case would be to call this function once<br>to get the basic props, and then perhaps again to get other (expensive)
<br>ones. In order to be able to request expensive properties later, there<br>has to be a function like this in one way or another, even if it has<br>another name than this function.<br><br>Instead of using an offset there could be a function for "seeking" in
<br>the search result, since you might want to go back and read some<br>other properties. I don't have any strong feelings about this, but I<br>think it's slightly easier (for the API user) to have an offset like<br>
this.<br><br>I think it should be possible to re-request hits, since you actually<br>get them for free. The server has to remember them anyway, otherwise it<br>will be unable to tell when a document no longer matches the query,
<br>right?</blockquote><div><br>You can easily re-request hits with the updated proposal. Just call GetHitData() with the hit ids and the wanted props.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
The real reason why I left the maps instead of writing it as arrays is<br>that I don't know the syntax, I'm perfectly happy with arrays.<br><br>About the name, I don't think it matters with these requirements. But
<br>in one of the (commercial) search engine APIs I've used the hits were<br>also objects (so you had to first get the hit from the search and then<br>the property from the hit). The benefit from this approach is that the
<br>hit object can have a direct pointer to the query that caused it<br>(because a search could be constructed from more than one query). And<br>some quite complicated things related to highlighting. Imagine you<br>extract and index the text from a word document, then you want to view
<br>it as a highlighted PDF-document. For this to work each hit needs some<br>extra data (I won't go into detail). But these features will never be a<br>part of this API so the naming doesn't matter as much I guess. But that
<br>was my reason for choosing the name.</blockquote><div><br>A language binding could easily map the search handle to the underlying query XML. That way a language binding could provide a GetQuery() method on the Search object.
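<br><br>Roughly, and only as a sketch: a binding can keep a dictionary from search handle to the query XML it was created from. The names SearchBinding, get_query and fake_new_search are hypothetical, and fake_new_search merely stands in for whatever D-Bus call actually returns a search handle.<br>

```python
import itertools


class SearchBinding:
    """Hypothetical binding layer that remembers the query behind each handle."""

    def __init__(self, new_search):
        # new_search is any callable taking query XML and returning a handle.
        self._new_search = new_search
        self._queries = {}

    def search(self, query_xml):
        handle = self._new_search(query_xml)
        self._queries[handle] = query_xml
        return handle

    def get_query(self, handle):
        # Answered entirely on the client side, no server round trip.
        return self._queries[handle]

    def close(self, handle):
        # Forget the query when the search is closed.
        self._queries.pop(handle, None)


_counter = itertools.count(1)


def fake_new_search(query_xml):
    # Stand-in for the real server call; hands out made-up handles.
    return "search-%d" % next(_counter)


binding = SearchBinding(fake_new_search)
handle = binding.search("QUERY-XML-PLACEHOLDER")
query = binding.get_query(handle)
binding.close(handle)
```

Since the lookup never leaves the client, this needs nothing extra in the D-Bus API itself.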
<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> signal SearchHitsAdded ( s search , i count)<br>> ><br>><br>>
<br>> > signal SearchHitsRemoved ( s search , ai offsets )<br>><br>><br>><br>> signal HitsHitsModified ( s search , ai offsets )<br>><br>><br>> Is this why you want to be able to refetch pages in GetHitProperties?
<br>> If I recall correctly this signal is why I included the GetMetadata<br>> method in the first place.<br><br>Well, sort of. I think we need the functionality of what you called<br>GetMetadata. The question is whether it all should be done by GetHitProperties,
<br>or if it's better to keep GetHitProperties simple and have an<br>additional function as well.<br><br><br>> How do you cater for snippets? If you again want to use the<br>> GetHitProperties method I can see the solution, but I must say that
<br>> it appears inelegant to use GetHitProperties like this - for results,<br>> updates, and snippets.<br><br>Using GetHitProperties was what I intended, yes. To me it appears<br>elegant, but that might very well just be me. I'm willing to consider other
<br>ideas.<br></blockquote></div><br>Well, I think the current proposal is more or less in the middle of our original different ideas...<br><br>Cheers,<br>Mikkel<br>