Simple search API proposal, take 2

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Mon Jan 15 05:36:54 PST 2007


2007/1/12, Jean-Francois Dockes <jean-francois.dockes at wanadoo.fr>:
>
>
> Just a few opinions/comments/votes on recent issues:
>
> - Need for a query-closing call and backend resource management issues: It
>   is up to the backend to manage its resources, and decide how processing
>   should be split between Query() and GetHitProperties().



>   To make things easier, I am in favour of a CloseQuery() call which
>   well-behaved applications will use, and also of specifying that
>   query_handles can become stale, and that applications should then
>   restart the query (which opens the question of error reports which is
>   still a blank area).



Check.  It seems people agree with you on this. I'll update the wiki.
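A minimal sketch of the CloseQuery()/stale-handle contract, assuming a toy in-memory backend. Query(), GetHitProperties() and CloseQuery() follow the draft; the eviction policy, the `StaleQueryHandle` exception and the `get_hits` helper are hypothetical, since error reporting is still an open question in this thread.

```python
import uuid


class StaleQueryHandle(Exception):
    """Hypothetical error for a query_handle the backend no longer knows about."""


class SearchBackend:
    """Toy in-memory backend; the eviction policy is an illustrative assumption."""

    def __init__(self, documents, max_open_queries=2):
        self._documents = documents        # list of {property: value} dicts
        self._queries = {}                 # query_handle -> matching hits
        self._max_open = max_open_queries

    def Query(self, query_string):
        # The backend manages its own resources: when too many queries are
        # open it drops the oldest one, whose handle thereby becomes stale.
        if len(self._queries) >= self._max_open:
            del self._queries[next(iter(self._queries))]
        handle = uuid.uuid4().hex
        self._queries[handle] = [d for d in self._documents
                                 if query_string in d.get("dc:title", "")]
        return handle

    def GetHitProperties(self, query_handle, offset, limit, properties):
        if query_handle not in self._queries:
            raise StaleQueryHandle(query_handle)
        hits = self._queries[query_handle][offset:offset + limit]
        return [{p: hit.get(p, "") for p in properties} for hit in hits]

    def CloseQuery(self, query_handle):
        # Well-behaved applications call this so resources are freed early.
        self._queries.pop(query_handle, None)


def get_hits(backend, handle, query_string, offset, limit, properties):
    """Client side (hypothetical helper): on a stale handle, restart the
    query once and retry. Returns (handle, hits) so the caller keeps
    using the fresh handle."""
    try:
        return handle, backend.GetHitProperties(handle, offset, limit, properties)
    except StaleQueryHandle:
        handle = backend.Query(query_string)
        return handle, backend.GetHitProperties(handle, offset, limit, properties)
```

A client that always calls CloseQuery() when done keeps the number of open queries low, so evictions like the one above become rare rather than routine.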


> - CountHits() / GetHitProperties() raciness: It is up to the backend to
>   maintain consistency inside a single opened query; the current interface
>   allows it (unlike the previous one, which used the query string as a bad
>   query_handle).
>
>   Ideally the Query() call would open some kind of database snapshot which
>   would be preserved as long as the query_handle is valid. This may or may
>   not be feasible with the current backends, which are expected to just
>   "do their best", and the current draft does not prevent that. Aren't
>   things such as CountHits() usually considered to only return estimates
>   anyway?


Well, it could be noted in the wiki that CountHits() is not guaranteed to
return the exact number of hits (especially for large result sets).
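The per-query snapshot idea can be sketched as follows, assuming a toy backend that copies its matches at Query() time. Substring matching on URIs and the integer handles are illustrative assumptions. Copying everything is only affordable for small result sets, which is exactly why a real backend may fall back to an estimated CountHits().

```python
import itertools
from typing import Dict, List, Sequence


class SnapshotBackend:
    """Toy backend in which each Query() freezes its own view of the index."""

    def __init__(self, index: List[str]):
        self.index = index                      # live list of URIs; may change at any time
        self._snapshots: Dict[str, tuple] = {}  # query_handle -> frozen matches
        self._handles = itertools.count(1)

    def Query(self, query_string: str) -> str:
        handle = str(next(self._handles))
        # Copy the matches now, so later index updates cannot make
        # CountHits() and GetHitProperties() disagree for this handle.
        self._snapshots[handle] = tuple(u for u in self.index if query_string in u)
        return handle

    def CountHits(self, query_handle: str) -> int:
        # Exact here; a real backend might only return an estimate.
        return len(self._snapshots[query_handle])

    def GetHitProperties(self, query_handle: str, offset: int, limit: int,
                         properties: Sequence[str]) -> List[Dict[str, str]]:
        uris = self._snapshots[query_handle][offset:offset + limit]
        # Only "uri" is known to this toy; other properties come back empty.
        return [{p: (u if p == "uri" else "") for p in properties} for u in uris]
```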


> - GetHitProperties result list as map or sequence: as Fabrice wrote, the
>   object identifiers are not useful. The results are requested as slices
>   of an ordered list (offset/limit), and should be returned as a
>   simple sequence or array of (propertyName=>propertyValue) maps.
>
>   Magnus initially proposed the response to be:
>    "A map mapping each hit (sequence number) to a map of property-list of
>      values pairs."
>
>   I think that the sequence number can be kept implicit:
>
>     Query (in s query_string, out s query_handle)
>     GetHitProperties ( in s query_handle, in i offset, in i limit,
>                        in as properties, out (sequence of maps) response )



The return value could drop the maps entirely and return each hit's values
in the same order as the requested properties. For example, the call:

  GetHitProperties (query_handle, 0, 2, ["uri", "dc:title", "mime"])

could return:

[
  ["file:///home/mikkel/delta_comp.pdf", "Delta Complexes", "application/pdf"],
  ["file:///home/mikkel/summa.svg", "Summa Logo", "image/svg+xml"]
]

From an optimization point of view, this is probably the best we can get.
This is also how Tracker currently does it, and it is relatively easy to
work with.
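To compare the two response shapes side by side, here is a small sketch, assuming the backend holds hits as property dicts. The helper names are hypothetical, and returning "" for a missing property is an assumption, not something the draft specifies. The map-free form loses nothing, since the client can rebuild the maps from the property list it sent.

```python
from typing import Dict, List, Sequence

# The two hits from the example call, as a hypothetical backend might hold them.
HITS: List[Dict[str, str]] = [
    {"uri": "file:///home/mikkel/delta_comp.pdf",
     "dc:title": "Delta Complexes", "mime": "application/pdf"},
    {"uri": "file:///home/mikkel/summa.svg",
     "dc:title": "Summa Logo", "mime": "image/svg+xml"},
]


def hits_as_maps(hits, offset: int, limit: int,
                 properties: Sequence[str]) -> List[Dict[str, str]]:
    """Shape from the quoted proposal: a sequence of {property: value} maps."""
    return [{p: hit.get(p, "") for p in properties}
            for hit in hits[offset:offset + limit]]


def hits_as_rows(hits, offset: int, limit: int,
                 properties: Sequence[str]) -> List[List[str]]:
    """Map-free shape: values in the same order as the requested properties."""
    return [[hit.get(p, "") for p in properties]
            for hit in hits[offset:offset + limit]]


def rows_to_maps(rows: List[List[str]],
                 properties: Sequence[str]) -> List[Dict[str, str]]:
    """The client can rebuild the maps itself, since it knows what it asked for."""
    return [dict(zip(properties, row)) for row in rows]
```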

The reason I'm hesitant to go for this solution is the live API. It would
be really nice to be able to use the same data structures there. The live
API, however, needs to be able to tell the consumer that *this particular
hit* has become invalid.

One way around this could be to always make the first element of each
hit's list a unique hit identifier. Or the last element, for that matter:
that way the returned properties would keep the same indices as the
requested properties.

We could also relax the global-identifier requirement and just make the
identifier relative to the given query handle.
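A rough sketch of the "identifier as last element" variant, assuming a hypothetical per-query `LiveQuery` object whose hit ids are simply positions in that query's original result list. The class, `invalidate()` and the client-side `drop_hit()` are illustrative names only; no live API has been specified yet.

```python
from typing import Dict, List, Sequence


class LiveQuery:
    """One open query in a hypothetical live API; hit ids are only unique per query."""

    def __init__(self, hits: Sequence[Dict[str, str]]):
        # Identifier = position in the original result list; only meaningful
        # together with this query's handle, not globally.
        self._hits = {hit_id: hit for hit_id, hit in enumerate(hits)}

    def get_hit_properties(self, offset: int, limit: int,
                           properties: Sequence[str]) -> List[list]:
        ids = sorted(self._hits)[offset:offset + limit]
        # The id goes last, so row[i] is still the value of properties[i].
        return [[self._hits[i].get(p, "") for p in properties] + [i] for i in ids]

    def invalidate(self, hit_id: int) -> None:
        """Called when a hit disappears; a live client would be sent hit_id."""
        self._hits.pop(hit_id, None)


def drop_hit(rows: List[list], hit_id: int) -> List[list]:
    """Client side: remove the invalidated hit from cached rows via the trailing id."""
    return [row for row in rows if row[-1] != hit_id]
```

Putting the id last rather than first means code written against the plain map-free rows only needs to ignore one trailing element.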

> - Using URI as key: as previously stated, I think that this is a bad idea.


+1

> - Accessing Snippets individually: no need for GetSnippets(), use:
>   GetHitProperties(query_handle, offset, 1, ["Snippet"])


As far as I can tell, this is the general consensus...
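Fetching one snippet through GetHitProperties() with limit 1 can be sketched like this, assuming a toy backend that only builds a snippet when "Snippet" is actually requested. The snippet window logic, `snippets_built` counter and the `get_snippet` helper are hypothetical.

```python
from typing import Dict, List, Sequence


class SnippetBackend:
    """Toy backend that only builds a snippet when 'Snippet' is requested."""

    def __init__(self, texts: Sequence[str], term: str, width: int = 20):
        self.texts, self.term, self.width = list(texts), term, width
        self.snippets_built = 0   # shows that snippets are computed lazily

    def GetHitProperties(self, query_handle: str, offset: int, limit: int,
                         properties: Sequence[str]) -> List[Dict[str, str]]:
        # query_handle is ignored: this toy serves a single fixed result list.
        rows = []
        for text in self.texts[offset:offset + limit]:
            row = {}
            if "Snippet" in properties:
                self.snippets_built += 1
                pos = max(text.find(self.term), 0)
                start = max(pos - self.width, 0)
                row["Snippet"] = text[start:pos + len(self.term) + self.width]
            rows.append(row)
        return rows


def get_snippet(backend, query_handle: str, offset: int) -> str:
    """Stands in for a dedicated GetSnippets(): fetch one hit's snippet."""
    rows = backend.GetHitProperties(query_handle, offset, 1, ["Snippet"])
    return rows[0]["Snippet"] if rows else ""
```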

Cheers,
Mikkel

PS: Be sure to check out the query language proposal at
http://wiki.freedesktop.org/wiki/WasabiQueryLanguage


More information about the xdg mailing list