2007/1/12, Jean-Francois Dockes <<a href="mailto:jean-francois.dockes@wanadoo.fr">jean-francois.dockes@wanadoo.fr</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>Just a few opinions/comments/votes on recent issues:<br><br>- Need for a query-closing call and backend resource management issues: It<br> is up to the backend to manage its resources, and decide how processing<br> should be split between Query() and GetHitProperties().
</blockquote><div> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> To make things easier, I am in favour of a CloseQuery() call which
<br> well-behaved applications will use, and also of specifying that<br> query_handles can become stale, and that applications should then restart<br> the query (which opens the question of error reports which is still a
<br> blank area).</blockquote><div><br><br>
Check. It seems people agree with you on this. I'll update the wiki. <br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">- CountHits() / GetHitproperties() racy-ness: It is up to the backend to
<br> maintain consistency inside a single opened query, the current interface<br> allows it (unlike the previous one using the query string as a bad<br> query_handle).<br><br> Ideally the Query() call would open some kind of database snapshot which
<br> would be preserved as long as the query_handle is valid. This may be<br> feasible or not with the current backends, which are expected to just "do<br> their best", which the current draft does not prevent. Aren't things such
<br> as CountHits() usually considered to only return estimates anyway ?</blockquote><div><br>Well. It could be noted in the wiki that CountHits is not guaranteed to return the correct number (especially on large result sets).
<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">- GetHitProperties result list as map or sequence: as Fabrice wrote, the<br> object identifiers are not useful. The results are requested as slices
<br> from an ordered list (offset/limit), and should be returned as a<br> simple sequence or array of (propertyName=>propertyValue) maps.<br><br> Magnus initially proposed the response to be:<br> "A map mapping each hit (sequence number) to a map of property-list of
<br> values pairs."<br><br> I think that the sequence number can be kept implicit:<br><br> Query (in s query_string, out i query_handle)<br> GetHitProperties ( in s query_handle, in i offset, in i limit,<br>
 in as properties, out (sequence of maps) response )</blockquote><div><br><br>The return value could drop the maps entirely and list the values in the same order as the properties input value. For example, the call:
<br><br> GetHitProperties (query_handle,0, 2, ["uri", "dc:title", "mime"])<br><br>could return:<br><br>[<br> ["<a href="file:///home/mikkel/delta_comp.pdf">file:///home/mikkel/delta_comp.pdf
</a>", "Delta Complexes", "application/pdf"]<br> ["<a href="file:///home/mikkel/summa.svg">file:///home/mikkel/summa.svg</a>", "Summa Logo", "image/svg+xml"]<br>]<br>
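<br>As a rough sketch of how a consumer could work with such positional rows, the requested property names can be zipped back onto each row whenever a map is more convenient. The helper name rows_to_maps is hypothetical and not part of any proposed interface; the rows are the example data above.

```python
def rows_to_maps(properties, rows):
    """Pair each positional row with the requested property names."""
    return [dict(zip(properties, row)) for row in rows]


# The same property list that was passed to GetHitProperties.
properties = ["uri", "dc:title", "mime"]

# Rows as returned in the example: values in the order of `properties`.
rows = [
    ["file:///home/mikkel/delta_comp.pdf", "Delta Complexes", "application/pdf"],
    ["file:///home/mikkel/summa.svg", "Summa Logo", "image/svg+xml"],
]

hits = rows_to_maps(properties, rows)
```

The property names then travel once in the request instead of once per hit in the response.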
<br>From an optimization point of view this is probably the best we can get. This is also what Tracker currently does, and it is relatively easy to work with.<br><br>The reason why I'm hesitating to go for this solution is the live API. It would be really nice to be able to use the same data structures there. The live API, however, needs to be able to tell the consumer that *this particular hit* has become invalid.
<br><br>A way around this could be to always have the first element in the response list be a unique hit identifier. Or the last element for that matter - this way the returned properties would have the same indices as the requested properties.
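<br><br>A minimal sketch of the trailing-identifier variant, assuming integer hit ids and a hypothetical on_hit_invalidated callback (nothing like it has been specified for the live API yet):

```python
def split_row(row):
    """Separate the requested property values from the trailing hit id."""
    return row[:-1], row[-1]


# Hypothetical response: the last element of each row is the hit id.
# The ids 7 and 8 are made up for illustration.
response = [
    ["file:///home/mikkel/delta_comp.pdf", "Delta Complexes", "application/pdf", 7],
    ["file:///home/mikkel/summa.svg", "Summa Logo", "image/svg+xml", 8],
]

# Index the property values by hit id so a later invalidation can find them.
hits_by_id = {}
for row in response:
    values, hit_id = split_row(row)
    hits_by_id[hit_id] = values


def on_hit_invalidated(hit_id):
    """Hypothetical handler: the live API reports that one hit became invalid."""
    hits_by_id.pop(hit_id, None)


on_hit_invalidated(7)
```

Because the id comes last, values[i] still corresponds to the i-th requested property.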
<br><br>We could ease up on the global-identifier thing, and just let the identifier be relative to the given query handle.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
- Using URI as key: as previously stated I think that this is a bad idea.</blockquote><div><br>+1 <br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
- Accessing Snippets individually: no need for GetSnippets(), use:<br> GetHitProperties(query_handle, offset, 1, ["Snippet"])</blockquote><div><br>As far as I can tell, this is the general consensus...<br><br>Cheers,
<br>Mikkel<br><br>PS: Be sure to check out the query language proposal at <a href="http://wiki.freedesktop.org/wiki/WasabiQueryLanguage">http://wiki.freedesktop.org/wiki/WasabiQueryLanguage</a><br></div><br></div>