[XESAM] Search API Update Proposals

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Mon Aug 6 23:29:51 PDT 2007


2007/8/6, Evgeny Egorochkin <phreedom.stdin at gmail.com>:
>
> On Thursday 02 August 2007 21:31:35 Mikkel Kamstrup Erlandsen wrote:
> > I've been collecting comments on the Xesam spec lately, and I think we have
> > enough now to do an update round of the spec proposal.
> >
> > Proposed changes can be found here:
> > http://wiki.freedesktop.org/wiki/XesamSearchUpdates
>
> > 2. Type Attribute in Query Language
>
> I agree with Jamie that usually people will just enumerate categories in
> which to search, without any complex criteria.
>
> My concern with <query category="xesam:Audio"> is that it may not play
> nicely with services, which may use a different class structure/trees. It
> would be nice to also have a more generic way to specify categories, by
> treating them as regular fields.
>
> One possible way here is to treat query attrs as a shortcut, while letting
> implementations also support (at their discretion) the more generic
> approach.


OK, here's my take. I think it should address your concerns too. Let the
category and source attributes be the official and blessed way of selecting
which categories and sources to query. Each attribute takes a comma-separated
list of categories or sources, e.g.:

  <query source="xesam:ArchiveItem, xesam:File" category="xesam:Audio">
    <contains>
        <field name="xesam:title"/>
        <string>purple rain</string>
    </contains>
  </query>
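
To illustrate the list semantics, here is a minimal sketch in Python (my
choice for illustration only, nothing in the proposal mandates it) of how a
client or engine might split the two attributes. The helper name
split_list_attr is hypothetical, and stripping whitespace around the commas is
an assumption, since the proposal does not say how whitespace is handled.

```python
import xml.etree.ElementTree as ET

QUERY = """
<query source="xesam:ArchiveItem, xesam:File" category="xesam:Audio">
  <contains>
    <field name="xesam:title"/>
    <string>purple rain</string>
  </contains>
</query>
"""


def split_list_attr(element, attr_name):
    """Return a comma-separated attribute as a list of trimmed names.

    A missing or empty attribute yields an empty list, which the engine
    would then interpret as "use the defaults" (assumption).
    """
    raw = element.get(attr_name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


root = ET.fromstring(QUERY)
sources = split_list_attr(root, "source")
categories = split_list_attr(root, "category")
print(sources, categories)
```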

Then add, as query language extensions, a category- and a source selector.
You would have to check for them through the session property
vendor.extensions, like you have to do with proximity- and regexp selectors
currently.

The source and category extensions could be selectors for maximum
flexibility, so that they are used like:

<query>
  <and>
    <category name="xesam:Audio"/>
    <source name="xesam:File"/>
    <source name="xesam:ArchiveItem"/>
    <contains>
        <field name="xesam:title"/>
        <string>purple rain</string>
    </contains>
  </and>
</query>

But let me stress that category and source selectors would be *optional*
extensions. This means that they should not cause parse errors, just be
ignored if you do not support them (and people should never use them unless
they have first checked for them via the vendor.extensions property).

It appears to me that it might be a good idea to prefix all extensions with
"x-". That way people grabbing code snippets off the web should be able to
clearly see if there are query extensions involved. I.e. this would mean that
the category and source elements above should be called <x-category> and
<x-source>.
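
A rough sketch of the "ignore what you do not support" rule, assuming the
"x-" prefix is adopted. The function name strip_unsupported_extensions is
hypothetical, and the set passed as `supported` merely stands in for whatever
the vendor.extensions session property reports (its exact format is an
assumption here).

```python
import xml.etree.ElementTree as ET


def strip_unsupported_extensions(element, supported):
    """Drop "x-" prefixed child elements that are not in `supported`.

    Unsupported extension selectors are silently removed instead of
    raising a parse error; everything else is kept and walked recursively.
    """
    for child in list(element):
        if child.tag.startswith("x-") and child.tag not in supported:
            element.remove(child)
        else:
            strip_unsupported_extensions(child, supported)
    return element


query = ET.fromstring(
    "<query><and>"
    '<x-category name="xesam:Audio"/>'
    '<x-source name="xesam:File"/>'
    '<contains><field name="xesam:title"/><string>purple rain</string></contains>'
    "</and></query>"
)

# Pretend this engine only advertises the category extension.
strip_unsupported_extensions(query, supported={"x-category"})
remaining = [child.tag for child in query.find("and")]
print(remaining)
```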

> Also, if no category is specified, there must be some defaults, however I
> feel they are hard to establish:
>
> One user might by default want to search files only. Another user might
> prefer to search both files and archives. Typical case is extensive musical
> library in archives, stored this way for easy P2P sharing, while user uses
> plug-ins to listen to music directly in archives.
>
> Defaults for categories should be probably user-configurable and/or
> implementation-specific. Client apps are best off stating categories
> explicitly.


Yes, I agree. I think the defaults should be up to the search engine, but
engines should be advised to default to "everything that makes sense". The
defaults are not meant for users so much as for devs wanting to hack together
a quick search-enabled UI.

> 4. Problems With Hit Data
>
> We have the following potential solutions:
>
>         aas] works except for ambiguity in strings
>
>         aav] have to use a workaround, like treating one of unused
>         datatypes as a null value
>
>         aa(bv)] b is a null flag. v is the value if applicable
>
>
> I'd prefer unambiguous ones for the sake of completeness.
>
> Cases where this may be important:
> 1) password: none or unknown
> 2) software flags: none or unknown
> Also empty value is often used to specify default, which is not unknown.


I think we should stick with "aav" and say that the value is null if it
contains a zero byte. I.e. NULL == V(b=0).
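
As a sketch of what that convention could look like on the client side,
assuming the null marker is a variant holding a byte (D-Bus type code "y")
with value 0. The shorthand "b=0" above could also be read as a boolean, so
treat the type code as an assumption. Variant, NULL and decode_hits are
hypothetical names and no real D-Bus library is used.

```python
from typing import Any, List, NamedTuple, Optional


class Variant(NamedTuple):
    """Stand-in for a D-Bus variant: a type code plus a value."""
    signature: str  # e.g. "s" for string, "y" for byte
    value: Any


# Assumed null marker: a variant carrying a single zero byte.
NULL = Variant("y", 0)


def decode(variant: Variant) -> Optional[Any]:
    """Map the null marker to None, pass every other value through."""
    return None if variant == NULL else variant.value


def decode_hits(hits: List[List[Variant]]) -> List[List[Optional[Any]]]:
    """Decode an "aav" style hit list: one row per hit, one variant per field."""
    return [[decode(v) for v in row] for row in hits]


hits = [[Variant("s", "Purple Rain"), NULL, Variant("s", "")]]
decoded = decode_hits(hits)
print(decoded)
```

Note that the empty string stays an empty string, so "no value" and "empty
value" remain distinguishable, which was the concern with "aas".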


> Additionally, we might want to consider blob streaming interface as per
> Jamie's suggestion.
>
> Also, I'd like to eventually revisit thumbnail stuff. I don't think current
> thumbnail spec and vfs/kio can e.g. provide thumbnails for stuff embedded
> into documents. At least our solution is going to be more generic and also
> easier to use for clients. But this is not top priority.



I agree that these are things we will eventually need. Can you put a note on
http://wiki.freedesktop.org/wiki/XesamIteration2 ?


> 5. Session Properties search.blocking and search.live
>
> It's good if bindings actually make it easy to go async.


Yes. It is easy to handle this in the bindings, as I know for a fact from
my work on libxesam-glib. The way I will do it is to make everything async
by default. If the client tries something that forces a synchronous
operation over the wire I will log a warning (which can be suppressed via a
flag).
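
Roughly, the pattern looks like the following sketch. This is not the
libxesam-glib API; the Search class, its method names and the warn_on_sync
flag are all hypothetical, and the "async" method here just invokes the
callback directly to keep the example self-contained (a real binding would
return immediately and deliver the result from the main loop).

```python
import logging

log = logging.getLogger("xesam.sketch")


class Search:
    """Hypothetical binding object illustrating async-by-default."""

    def __init__(self, backend, warn_on_sync=True):
        self._backend = backend            # callable returning the hit count
        self._warn_on_sync = warn_on_sync  # flag to suppress the warning

    def get_hit_count_async(self, callback):
        # Preferred path: hand the result to a callback.
        # Simplification: the callback is invoked directly here.
        callback(self._backend())

    def get_hit_count(self):
        # Synchronous path: allowed, but flagged unless suppressed.
        if self._warn_on_sync:
            log.warning("synchronous call over the wire; "
                        "prefer get_hit_count_async()")
        return self._backend()


search = Search(lambda: 3)
results = []
search.get_hit_count_async(results.append)
print(results)
```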

> 6. Introduce search.readonce
>
> Not clear just how sizable is the benefit. At least introducing it won't
> hurt.


Right, I will try and ping some devs to get the estimates.

> 7. Rename CountHits to GetHitCount
>
> No problem.
>
> > 8. search.maxhits
>
> Proposed by Jos, so I don't interfere.
>
> > 9. Split out vendor.fieldnames
>
> Could be hard to implement. I'm not sure it's always easy to enumerate all
> supported fields and classes due to plug-in architecture of most analyzers.


I would suspect that it is not that hard to get this info. In most cases it
should amount to reading the field defs in a Lucene index or scanning the
table names in a DB. But some impls might be special, I will investigate.
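
For the DB case, a minimal sketch of what I have in mind, assuming an SQLite
backed index that keeps one table per field. That layout, the table names and
the helper list_field_tables are made up for illustration; real
implementations will differ.

```python
import sqlite3


def list_field_tables(conn):
    """Return all table names in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Build a throwaway in-memory index with one table per field (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "xesam:title" (doc_id INTEGER, value TEXT)')
conn.execute('CREATE TABLE "xesam:author" (doc_id INTEGER, value TEXT)')

fields = list_field_tables(conn)
print(fields)
```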

> 10. Use uint Instead of Int Where Applicable
>
> Ok.
>
> >XML/INI
>
> Agreed to install both via converters.


I think this is a really bad idea if the only reason for this is that we
couldn't agree. If it is because it makes life easier for everybody else too
then I'm more positive, but I do not think this is the case.

I was actually pro-rdf/xml until I tried writing a sax parser for rdf/xml...
Does anybody know of an expat based parser for rdf/xml?

Cheers,
Mikkel