[XESAM] Search API Update Proposals

Tue Aug 7 12:18:12 PDT 2007

2007/8/7, Evgeny Egorochkin <phreedom.stdin at gmail.com>:
>
> On Tuesday 07 August 2007 09:29:51 Mikkel Kamstrup Erlandsen wrote:
> > > > 2. Type Attribute in Query Language
> > >
> > > I agree with Jamie that usually people will just enumerate categories
> in
> > > which
> > > to search, without any complex criteria.
> > >
> > > My concern with <query category="xesam:Audio"> is that it may not play
> > > nicely
> > > with services, which may use a different class structure/trees. It
> would
> > > be
> > > nice to also have a more generic way to specify categories, by
> treating
> > > them
> > > as regular fields.
> > >
> > > One of possible ways here is to treat query attrs as a shortcut, while
> > > letting
> > > implementations also support(at their discretion) the more generic
> > > approach.
> >
> > Ok, here's my take - I think it should match your concerns too. Let
> cat/src
> > attrs be the official and blessed way of selecting cat/srcs to query.
> Both
> > cat and src attr is a list of comma separated cats/srcs, fx:
>
> ...
>
> > The source and category extensions could be selectors for maximum
> > flexibility, so that they are used like:
> >
> > <query>
> >   <and>
> >     <category name="xesam:Audio"/>
> >     <source name="xesam:File"/>
> >     <source name="xesam:ArchiveItem"/>
> >     <contains>
> >         <field name="xesam:title"/>
> >         <string>purple rain</string>
> >     </contains>
> >   </and>
> > </query>
> >
> > But let me stress that category and source selectors would be *optional*
> > extensions. This means that they should not cause parse errors, just be
> > ignored if you do not support them (and people should never use them
> unless
> > they query them via the vendor.extensions property).
> >
> > It appears to me that it might be a good idea to prefix all extensions
> with
> > "x-". That way people grabbing code snippets off the web should be able
> to
> > clearly see if there are query extensions involved. Ie this would mean
> that
> > the cat and source elements above should be called <x-category> and
> > <x-source>.
>
> I'd prefer to allow specifying categories like regular field criteria.
> The
> reason is simplicity and ultimate flexibility. Remember some services
> might
> not use xesam class trees.

I see the point here, but it also has its drawbacks. I imagine that some
backends will want to treat cats/sources differently from fields when building
the query. They would have to look up every <field> element and check what
type it belongs to. That might be expensive (compared to a straightforward
parser just building the olde query).

We could be extensible via a "type" selector instead:

<type name="category" value="xesam:Audio"/>
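To illustrate why a dedicated selector would be cheap for a backend, here is a rough Python sketch. It routes <type> elements on the tag name alone, so no per-<field> type lookup is needed. The function name split_query and the assumption that all criteria sit directly under one top-level <and> are made up for this example and are not part of any spec.

```python
import xml.etree.ElementTree as ET

# Example query using the proposed <type> selector (layout is an assumption).
QUERY = """
<query>
  <and>
    <type name="category" value="xesam:Audio"/>
    <type name="source" value="xesam:File"/>
    <contains>
      <field name="xesam:title"/>
      <string>purple rain</string>
    </contains>
  </and>
</query>
"""

def split_query(xml_text):
    """Return (selectors, criteria) from a single pass over the query.

    selectors maps a type name ("category", "source", ...) to its values;
    criteria is the list of remaining elements inside the top-level <and>.
    """
    root = ET.fromstring(xml_text)
    selectors = {}
    criteria = []
    for node in root.find("and"):
        if node.tag == "type":
            # Routed on the tag alone; no field-type lookup is needed.
            selectors.setdefault(node.get("name"), []).append(node.get("value"))
        else:
            criteria.append(node)
    return selectors, criteria

selectors, criteria = split_query(QUERY)
print(selectors)
print([node.tag for node in criteria])
```

With categories expressed as ordinary <field> criteria, the same loop would instead have to descend into every criterion and resolve each field name before it could tell selectors apart from the rest.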

> > 4. Problems With Hit Data
> > >
> > > We have the following potential solutions:
> > >
> > >         aas] works except for ambiguity in strings
> > >
> > >         aav] have to use a workaround, like treating one of unused
> > > datatypes as a
> > > null value
> > >
> > >         aa(bv)] b is a null flag. v is the value if applicable
> > >
> > >
> > > I'd prefer unambiguous ones for the sake of completeness.
> > >
> > > Cases where this may be important:
> > > 1) password: none or unknown
> > > 2) software flags: none or unknown
> > > Also empty value is often used to specify default, which is not
> unknown.
> >
> > I think we should stick with "aav" and say that the value is null if it
> > contains a zero byte. Ie NULL == V(b=0).
>
> aa(bv) looks cleaner since we are just compensating for missing
> functionality,
> while aav is a hack,  using stuff in ways not intended.

If you ask me, aa(bv) is as big a hack as aav+(V(b=0)==NULL).

> My concern with aav is how easily and reliably bindings can determine file
> types
> especially since we need to distinguish int from int. Also, with aa(bv),
> when
> we set null flag, we can also make v="" so that clients who didn't bother
> to
> check will still receive the closest match to null as opposed to 0 or 1
> (suppose this gets displayed by gui).

As far as I can tell GLib and Python bindings should not have problems. I
don't know about Qt. People using raw libdbus should definitely be home
free.  Then there are Java, Perl, and C# (are there any Haskell bindings?)
of which I know nothing.

My problem with aa(bv) is that it is more complex to work with. I am quite
worried about introducing the overhead of a struct just for this purpose, both
complexity-wise and performance-wise.
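To make the comparison concrete, here is a small Python sketch of how a client might decode the two shapes. It does not use any real D-Bus binding. A variant is modelled as a plain (type_code, value) pair, and the sentinel is taken literally from the V(b=0) notation above. The helper names decode_aav and decode_aabv are hypothetical.

```python
# A variant is modelled as a (type_code, value) pair; this is a stand-in
# for a real D-Bus variant, not a binding API.
NULL_VARIANT = ("b", 0)  # the V(b=0) sentinel proposed for "aav"

def decode_aav(rows):
    """Decode aav-style rows: the sentinel variant means null."""
    return [[None if cell == NULL_VARIANT else cell[1] for cell in row]
            for row in rows]

def decode_aabv(rows):
    """Decode aa(bv)-style rows: each cell is (is_null, variant)."""
    return [[None if is_null else variant[1] for is_null, variant in row]
            for row in rows]

# The same hit (a title, a null field, an empty string) in both encodings.
aav_hits = [[("s", "purple rain"), NULL_VARIANT, ("s", "")]]
aabv_hits = [[(False, ("s", "purple rain")),
              (True, ("s", "")),
              (False, ("s", ""))]]

print(decode_aav(aav_hits))
print(decode_aabv(aabv_hits))
```

Both decoders distinguish null from the empty string. The aa(bv) form pays for that with an extra struct around every cell. In this model the aav form only works if the sentinel never occurs as a real value.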

> > > > 9. Split out vendor.fieldnames
> > >
> > > Could be hard to implement. I'm not sure it's always easy to enumerate
> > > all supported fields and classes due to plug-in architecture of most
> > > analyzers.
> >
> > I would suspect that it is not that hard to get this info. In most cases
> it
> > should amount to reading the field defs in a Lucene index or scanning
> the
> > table names in a DB.
>
> Not sure. If you've just installed a plug-in, its properties will not be
> visible. But I'm nitpicking.
>
> > But some impls might be special, I will investigate.
>
> Maybe we should redefine this as a minimal supported set, i.e. the backend
> supports these 100% but there may be others?

Well, any engine could always fall back to this and have reasonable results
anyway. I would like to hear the opinions of some of the engine devs here.
As always I will try to pester them on IRC.

> > >XML/INI
> > >
> > > Agreed to install both via convertors.
> >
> > I think this is a really bad idea if the only reason for this is that we
> > couldn't agree. If it is because it makes life easier for everybody else
> > too then I'm more positive, but I do not think this is the case.
>
> Strigi needs to produce RDF anyway, so dropping it is out of question for
> us.
> The question is how and in which way to support it.
>
> > I was actually pro-rdf/xml until I tried writing a sax parser for
> > rdf/xml... Does anybody know of an expat based parser for rdf/xml?
>
> I'm looking into it ATM. We also can get away with a simplified one, e.g.
> we
> don't need nested stuff.

I don't know how expat handles custom entity defs and such either. Also,
whether we want to allow cross refs between files concerns me...

Cheers,
Mikkel