simple search api (was Re: mimetype standardisation by testsets)

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Wed Dec 20 22:05:54 EET 2006


2006/12/20, Jean-Francois Dockes <jean-francois.dockes at wanadoo.fr>:
>
> Mikkel Kamstrup Erlandsen writes:
> ...
> > I think you are quite right. Except that maybe the output parameter of
> > simple.Query should be a{sa{sas}} - a map mapping uris to maps of
> > property-valuelist pairs. The trick is that metadata fields can have
> > several values (like the simple.GetProperties method). If I request the
> > Email.CC and Email.To fields for example I'd get something like
> >
> > {
> >   "email://mail_identifier1" : {
> >     "Mail.CC" : ["foo at bar.xyz", "emfle at birnan.xyz"],
> >     "Mail.To" : ["linus.torvalds at microsoft.com"]
> >   },
> >   "email://mail_identifier2" : {
> >     "Mail.CC" : ["foo at bar.xyz"],
> >     "Mail.To" : ["bill at osdl.org"]
> >   }
> > }
>
> This is where we disagree. You are requesting a sequence of 'limit'
> results, starting at offset 'i'. There is no reason to have special
> treatment for the URI. It's just another property. The result list should
> be like:
>
> {
>   "URI"     : "email://mail_identifier1",
>   "Mail.CC" : ["foo at bar.xyz", "emfle at birnan.xyz"],
>   "Mail.To" : ["linus.torvalds at microsoft.com"]
> }
> {
>   "URI"     : "email://mail_identifier2",
>   "Mail.CC" : ["foo at bar.xyz"],
>   "Mail.To" : ["bill at osdl.org"]
> }
>
> Just an ordered sequence of maps; the implicit key to the sequence is the
> record number from 'offset' to 'offset+limit'.
>
> I think that it is wrong to make the URI such a central element; it is not
> so special for any backend I have had the opportunity to look at.



I don't care about the backends :-) I just want a convenient API for
applications... (of course I care greatly about backends - I was just making
a point).

Having a unique handle for each document is a really handy thing. A unique
handle could be many things other than a URI, e.g. a unique integer or
anything else. The handle as you describe it is just a number specifying an
entry in an array, and so is not unique (except in the context of a query).

The URI is just a more convenient handle than e.g. an integer - for
applications at least (the toolkit and platform libs often handle URIs
directly).
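
To make the two shapes under discussion concrete, here is a rough sketch in
Python. It is not part of either proposal. The helper names (to_sequence,
to_map) are hypothetical, and storing "URI" as a one-element list is an
assumption made here so that every value keeps the same list-of-strings type.

```python
from typing import Dict, List

# A property map: property name -> list of values (D-Bus a{sas}).
PropertyMap = Dict[str, List[str]]

# Shape A: results keyed by URI, D-Bus signature a{sa{sas}}.
by_uri: Dict[str, PropertyMap] = {
    "email://mail_identifier1": {
        "Mail.CC": ["foo at bar.xyz", "emfle at birnan.xyz"],
        "Mail.To": ["linus.torvalds at microsoft.com"],
    },
    "email://mail_identifier2": {
        "Mail.CC": ["foo at bar.xyz"],
        "Mail.To": ["bill at osdl.org"],
    },
}


def to_sequence(results: Dict[str, PropertyMap]) -> List[PropertyMap]:
    """Shape B: an ordered list of maps where URI is just another property.

    Assumption: "URI" is stored as a one-element list so the whole
    result could be sent as aa{sas}.
    """
    return [{"URI": [uri], **props} for uri, props in results.items()]


def to_map(hits: List[PropertyMap]) -> Dict[str, PropertyMap]:
    """Convert shape B back to shape A by pulling the URI out as the key."""
    out: Dict[str, PropertyMap] = {}
    for hit in hits:
        props = dict(hit)           # copy so the caller's hit is untouched
        uri = props.pop("URI")[0]   # the handle becomes the map key
        out[uri] = props
    return out


hits = to_sequence(by_uri)
```

The conversion is lossless in both directions only as long as every hit
carries its URI, which is the point of contention here.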

I respect your disagreement, and would really like to hear what the other
guys think...


> > The GetSnippet method must have a query string to match up against -
> > GetProperties only needs a URI and a list of requested props.
> > Arguably they could both be merged into Query, but I feel it might be
> > overkill issuing a separate query to retrieve given metadata fields on a
> > given URI - that is more like a lookup in my mind (and also is for some
> > engines).
>
> The GetSnippet method, if you need to have one, can use the same ordinal
> key that Query() is implicitly using. Using the URI for this forces awkward
> processing on the backend side with no benefit to the application (which
> has to know the index of a result anyway).
>
> > You can't merge GetSnippet into your main query; it is a relatively slow
> > operation on most engines, so you have to do that after you got the
> > actual hit.
>
> Ok, so you don't request "Snippet" as a property in the initial query, and
> re-call Query() with the appropriate record number, requesting the
> "Snippet" property for the record you want the Snippet for. If getting a
> snippet is slow and costly, using a dbus transaction for it should not be
> an issue.
>
> Or if you really want to, you could define a call requesting snippets for
> a list of result numbers. All I'm saying is that 'URI' is not a good
> result identifier.
>
> > These were the reasons why I split the methods like I did and I still
> > think they hold...
>
> My central point is that 'URI' is not a good result identifier. Results
> are not organized by URI either on the application or backend side. The
> result list is an ordered sequence, the natural accessor is the number in
> the sequence.



Ok, I see your point about ordering. The response would have to include a
score property for my proposal to work out.
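
As a sketch of what that would mean for a client: a map keyed by URI carries
no rank, so the application has to rebuild the order from the score. The
property name "Score", its encoding as a one-element list holding a decimal
string, and the function rank_by_score are all assumptions made for this
illustration, not something either proposal defines.

```python
from typing import Dict, List

PropertyMap = Dict[str, List[str]]


def rank_by_score(results: Dict[str, PropertyMap],
                  score_key: str = "Score") -> List[str]:
    """Return the URIs ordered best hit first.

    Assumes every hit has a score_key property whose first value is a
    decimal string (hypothetical encoding to fit a{sa{sas}}).
    """
    return sorted(
        results,
        key=lambda uri: float(results[uri][score_key][0]),
        reverse=True,
    )


# The map arrives in arbitrary order; the score restores the ranking.
results = {
    "email://mail_identifier2": {"Score": ["0.42"]},
    "email://mail_identifier1": {"Score": ["0.87"]},
}
ranked = rank_by_score(results)
```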


Pasting in Jean-Francois' follow-up mail:
 > As an afterthought to my previous message (sorry), the result list could
 > change if the query has to be re-run. This is a good reason for keeping
 > the uris as document identifiers for getSnippets().


It would feel awkward if you had to request a specific property (the URI) to
be able to obtain a snippet IMHO.
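
The re-run concern from the follow-up mail can be sketched like this. The
two result lists and the third identifier are made up for the example, and
uri_at and position_of are hypothetical helpers, not proposed API calls.

```python
from typing import Dict, List, Optional

Hit = Dict[str, str]


def uri_at(hits: List[Hit], index: int) -> str:
    """Look up a hit by its ordinal position in one particular result list."""
    return hits[index]["URI"]


def position_of(hits: List[Hit], uri: str) -> Optional[int]:
    """Find where a given URI sits in a result list, or None if it is gone."""
    for i, hit in enumerate(hits):
        if hit["URI"] == uri:
            return i
    return None


# Same query run twice; a new mail was indexed in between (made-up data).
first_run = [
    {"URI": "email://mail_identifier1"},
    {"URI": "email://mail_identifier2"},
]
second_run = [
    {"URI": "email://mail_identifier3"},
    {"URI": "email://mail_identifier1"},
    {"URI": "email://mail_identifier2"},
]

wanted = uri_at(first_run, 0)       # the hit the user clicked on
stale = uri_at(second_run, 0)       # index 0 now names a different mail
still_there = position_of(second_run, wanted)
```

An index remembered from the first run silently points at another document
after the re-run, while the URI still identifies the same one.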



Ok, my central point is: We need a unique handle for each document/object in
the store - this should be used to identify the returned hits from Query().
Whether it is an opaque handle of some undefined sort or is defined to be
the URI is another matter.

To the sorting problem I see two solutions. 1) Always return a score
property as part of the response properties as defined in my proposal. 2)
Always include the UniqueHandle property as part of the response as defined
by your proposal.



Cheers,
Mikkel