[XESAM] Search API Update Proposals

Tue Aug 7 06:12:01 PDT 2007

On Tuesday 07 August 2007 09:29:51 Mikkel Kamstrup Erlandsen wrote:
> > > 2. Type Attribute in Query Language
> >
> > I agree with Jamie that usually people will just enumerate the categories
> > in which to search, without any complex criteria.
> >
> > My concern with <query category="xesam:Audio"> is that it may not play
> > nicely with services, which may use a different class structure/trees. It
> > would be nice to also have a more generic way to specify categories, by
> > treating them as regular fields.
> >
> > One possible way here is to treat query attrs as a shortcut, while
> > letting implementations also support (at their discretion) the more
> > generic approach.
>
> Ok, here's my take - I think it should match your concerns too. Let cat/src
> attrs be the official and blessed way of selecting cat/srcs to query. Both
> the cat and src attrs are comma-separated lists of cats/srcs, e.g.:

...

> The source and category extensions could be selectors for maximum
> flexibility, so that they are used like:
>
> <query>
>   <and>
>     <category name="xesam:Audio"/>
>     <source name="xesam:File"/>
>     <source name="xesam:ArchiveItem"/>
>     <contains>
>         <field name="xesam:title"/>
>         <string>purple rain</string>
>     </contains>
>   </and>
> </query>
>
> But let me stress that category and source selectors would be *optional*
> extensions. This means that they should not cause parse errors, just be
> ignored if you do not support them (and people should never use them unless
> they query them via the vendor.extensions property).
>
> It appears to me that it might be a good idea to prefix all extensions with
> "x-". That way people grabbing code snippets off the web should be able to
> clearly see if there are query extensions involved. I.e. this would mean that
> the cat and source elements above should be called <x-category> and
> <x-source>.

I'd prefer to allow categories to be specified like regular field criteria.
The reasons are simplicity and ultimate flexibility. Remember that some
services might not use the xesam class trees.
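
A minimal sketch, in Python, of what "categories as regular field criteria"
could look like: it rewrites the <category>/<source> selectors from the quoted
query into ordinary criteria. The field names xesam:category and xesam:source,
and the choice of an <equals> criterion, are assumptions made for illustration
only; nothing in this thread fixes them.

```python
import xml.etree.ElementTree as ET

QUERY = """
<query>
  <and>
    <category name="xesam:Audio"/>
    <source name="xesam:File"/>
    <contains>
      <field name="xesam:title"/>
      <string>purple rain</string>
    </contains>
  </and>
</query>
"""

# Hypothetical mapping from selector element to field name (an assumption).
SELECTOR_FIELDS = {"category": "xesam:category", "source": "xesam:source"}


def selectors_to_fields(root):
    """Replace <category>/<source> selectors with regular field criteria."""
    # Snapshot the elements first so the tree can be edited while walking it.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            field_name = SELECTOR_FIELDS.get(child.tag)
            if field_name is None:
                continue  # not a selector, leave it untouched
            criterion = ET.Element("equals")
            ET.SubElement(criterion, "field", name=field_name)
            ET.SubElement(criterion, "string").text = child.get("name")
            criterion.tail = child.tail  # keep the original indentation
            parent[index] = criterion
    return root


rewritten = selectors_to_fields(ET.fromstring(QUERY))
print(ET.tostring(rewritten, encoding="unicode"))
```

A service that does not use the xesam class trees could then handle such a
criterion like any other field match, or map it onto its own classes.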

> > 4. Problems With Hit Data
> >
> > We have the following potential solutions:
> >
> >         aas: works except for ambiguity in strings
> >
> >         aav: have to use a workaround, like treating one of the unused
> >         datatypes as a null value
> >
> >         aa(bv): b is a null flag, v is the value if applicable
> >
> >
> > I'd prefer unambiguous ones for the sake of completeness.
> >
> > Cases where this may be important:
> > 1) password: none or unknown
> > 2) software flags: none or unknown
> > Also, an empty value is often used to specify the default, which is not
> > the same as unknown.
>
> I think we should stick with "aav" and say that the value is null if it
> contains a zero byte. I.e. NULL == V(b=0).

aa(bv) looks cleaner since we are just compensating for missing functionality,
while aav is a hack, using stuff in ways not intended.

My concern with aav is how easily and reliably bindings can determine file
types, especially since we need to distinguish int from int. Also, with
aa(bv), when we set the null flag we can also make v="" so that clients who
didn't bother to check still receive the closest match to null, as opposed to
0 or 1 (suppose this gets displayed by a GUI).

Both are minor of course.
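
To make the aa(bv) option concrete, here is a rough sketch in plain Python. It
models one hit as a list of (null flag, value) pairs and uses no D-Bus
binding; the helper names encode_hit and decode_hit are made up for this
example. A null value carries v="" as suggested above, and stays
distinguishable from a real empty string because of the flag.

```python
def encode_hit(row):
    """Encode one hit as (is_null, value) pairs, mirroring the (bv) struct."""
    # A null field gets the flag set and an empty string as placeholder value.
    return [(True, "") if value is None else (False, value) for value in row]


def decode_hit(encoded):
    """Turn (is_null, value) pairs back into values, with None for null."""
    return [None if is_null else value for is_null, value in encoded]


hit = ["purple rain", None, ""]  # title, unknown field, genuinely empty field
wire = encode_hit(hit)
print(wire)
print(decode_hit(wire) == hit)
```

A client that ignores the flag simply sees "" for the unknown field, while a
client that checks it can tell "unknown" apart from "empty".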

> > Additionally, we might want to consider a blob streaming interface as per
> > Jamie's suggestion.
> >
> > Also, I'd like to eventually revisit thumbnail stuff. I don't think the
> > current thumbnail spec and vfs/kio can e.g. provide thumbnails for stuff
> > embedded into documents. At least our solution is going to be more generic
> > and also easier to use for clients. But this is not top priority.
>
> I agree that these are things we will eventually need. Can you put a note
> on http://wiki.freedesktop.org/wiki/XesamIteration2 ?

Done.

> > > 9. Split out vendor.fieldnames
> >
> > Could be hard to implement. I'm not sure it's always easy to enumerate
> > all supported fields and classes due to plug-in architecture of most
> > analyzers.
>
> I would suspect that it is not that hard to get this info. In most cases it
> should amount to reading the field defs in a Lucene index or scanning the
> table names in a DB. 

Not sure. If you've just installed a plug-in, its properties will not be
visible. But I'm nitpicking.

> But some impls might be special, I will investigate. 

Maybe we should redefine this as a minimal supported set, i.e. the backend
supports these 100% but there may be others?

> > >XML/INI
> >
> > Agreed to install both via convertors.
>
> I think this is a really bad idea if the only reason for this is that we
> couldn't agree. If it is because it makes life easier for everybody else
> too then I'm more positive, but I do not think this is the case.

Strigi needs to produce RDF anyway, so dropping it is out of the question for
us. The question is how and in which way to support it.

> I was actually pro-rdf/xml until I tried writing a sax parser for
> rdf/xml... Does anybody know of an expat based parser for rdf/xml?

I'm looking into it ATM. We also can get away with a simplified one, e.g. we 
don't need nested stuff.

-- Evgeny