[XESAM] Search API Update Proposals
Mikkel Kamstrup Erlandsen
mikkel.kamstrup at gmail.com
Tue Aug 7 12:18:12 PDT 2007
2007/8/7, Evgeny Egorochkin <phreedom.stdin at gmail.com>:
> On Tuesday 07 August 2007 09:29:51 Mikkel Kamstrup Erlandsen wrote:
> > > > 2. Type Attribute in Query Language
> > >
> > > I agree with Jamie that usually people will just enumerate which categories
> > > to search, without any complex criteria.
> > >
> > > My concern with <query category="xesam:Audio"> is that it may not play
> > > nicely with services, which may use a different class structure/trees. It
> > > would be nice to also have a more generic way to specify categories, by
> > > treating them as regular fields.
> > >
> > > One possible way here is to treat query attrs as a shortcut, while
> > > letting implementations also support (at their discretion) the more
> > > generic approach.
> > Ok, here's my take - I think it should match your concerns too. Let
> > attrs be the official and blessed way of selecting cat/srcs to query.
> > cat and src attr is a list of comma separated cats/srcs, fx:
> > The source and category extensions could be selectors for maximum
> > flexibility, so that they are used like:
> > <query>
> >   <and>
> >     <category name="xesam:Audio"/>
> >     <source name="xesam:File"/>
> >     <source name="xesam:ArchiveItem"/>
> >     <contains>
> >       <field name="xesam:title"/>
> >       <string>purple rain</string>
> >     </contains>
> >   </and>
> > </query>
> > But let me stress that category and source selectors would be *optional*
> > extensions. This means that they should not cause parse errors, just be
> > ignored if you do not support them (and people should never use them unless
> > they query them via the vendor.extensions property).
> > It appears to me that it might be a good idea to prefix all extensions with
> > "x-". That way people grabbing code snippets off the web should be able to
> > clearly see if there are query extensions involved. I.e. this would mean that
> > the cat and source elements above should be called <x-category> and
> > <x-source>.
> I'd prefer to allow specifying categories like regular field criteria. The
> reason is simplicity and ultimate flexibility. Remember some services may
> not use xesam class trees.
I see the point here, but it also has its drawbacks. I imagine that some
backends will want to treat cats/sources differently from fields when building
the query. They will have to look up all <field> elements and check what
type they belong to. That might be expensive (compared to a straightforward
parser just building the olde query).
We could be extensible via a "type" selector instead:
<type name="category" value="xesam:Audio"/>
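To make the trade-off concrete, here is a minimal Python sketch of how a backend might pull <type> selectors out of a query in a single pass, without looking up what each <field> refers to. Only the "category" type name appears in this thread; the "source" and "mood" names, the split_selectors helper, and the rule that unsupported type names are silently skipped are assumptions made for illustration, not part of any agreed spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical query. Only name="category" comes from the thread;
# "source" and "mood" are invented for this example.
QUERY = """
<query>
  <and>
    <type name="category" value="xesam:Audio"/>
    <type name="source" value="xesam:File"/>
    <type name="source" value="xesam:ArchiveItem"/>
    <type name="mood" value="vendor:Happy"/>
    <contains>
      <field name="xesam:title"/>
      <string>purple rain</string>
    </contains>
  </and>
</query>
"""

# Assumption: this backend understands these two selector types only.
SUPPORTED_TYPES = {"category", "source"}


def split_selectors(xml_text, supported=SUPPORTED_TYPES):
    """Separate <type> selectors from the ordinary field criteria.

    Returns (selectors, ignored, criteria_tree). Unsupported selector
    names are collected in `ignored` instead of raising a parse error.
    """
    root = ET.fromstring(xml_text)
    selectors = {}
    ignored = []

    def walk(parent):
        for child in list(parent):  # copy: we remove children while iterating
            if child.tag == "type":
                parent.remove(child)
                name = child.get("name")
                if name in supported:
                    selectors.setdefault(name, []).append(child.get("value"))
                else:
                    ignored.append(name)
            else:
                walk(child)

    walk(root)
    return selectors, ignored, root


selectors, ignored, criteria = split_selectors(QUERY)
print(selectors)
print(ignored)
```

The tree left in `criteria` contains only the regular field criteria, so the rest of the query builder would not need to know about selectors at all.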
> > 4. Problems With Hit Data
> > >
> > > We have the following potential solutions:
> > >
> > > aas] works except for ambiguity in strings
> > >
> > > aav] have to use a workaround, like treating one of the unused
> > > datatypes as a null value
> > >
> > > aa(bv)] b is a null flag. v is the value if applicable
> > >
> > >
> > > I'd prefer unambiguous ones for the sake of completeness.
> > >
> > > Cases where this may be important:
> > > 1) password: none or unknown
> > > 2) software flags: none or unknown
> > > Also empty value is often used to specify default, which is not
> > I think we should stick with "aav" and say that the value is null if it
> > contains a zero byte. Ie NULL == V(b=0).
> aa(bv) looks cleaner since we are just compensating for missing
> while aav is a hack, using stuff in ways not intended.
If you ask me, aa(bv) is as big a hack as aav+(V(b=0)==NULL).
> My concern with aav is how easily and reliably bindings can determine file
> especially since we need to distinguish int from int. Also, with aa(bv), if
> we set null flag, we can also make v="" so that clients who didn't bother to
> check will still receive the closest match to null as opposed to 0 or 1
> (suppose this gets displayed by gui).
As far as I can tell GLib and Python bindings should not have problems. I
don't know about Qt. People using raw libdbus should definitely be home
free. Then there are Java, Perl, and C# (are there any Haskell bindings?)
of which I know nothing.
My problem with aa(bv) is that it is more complex to work with. I am quite
worried about introducing the overhead of a struct just for this purpose, both
complexity-wise and performance-wise.
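For comparison, here is a rough Python sketch of how a client might decode hit data under the two encodings discussed above. It does not use any real D-Bus binding: a variant is modelled as a plain (signature, value) tuple, and the names NULL_VARIANT, decode_aav and decode_aabv are made up for this example. It assumes the "NULL == V(b=0)" convention for aav and the "null flag plus v=''" idea for aa(bv).

```python
from typing import Any, List, Optional, Tuple

# Stand-in for a D-Bus variant: (type signature, value). Hypothetical model,
# not the representation used by any particular binding.
Variant = Tuple[str, Any]

# Assumed convention for "aav": a boolean variant holding False means null.
NULL_VARIANT: Variant = ("b", False)


def decode_aav(hits: List[List[Variant]]) -> List[List[Optional[Any]]]:
    """Decode aav hit data, mapping V(b=0) to None."""
    return [[None if v == NULL_VARIANT else v[1] for v in row] for row in hits]


def decode_aabv(hits: List[List[Tuple[bool, Variant]]]) -> List[List[Optional[Any]]]:
    """Decode aa(bv) hit data: the leading boolean is the null flag."""
    return [[None if is_null else v[1] for (is_null, v) in row] for row in hits]


aav_hits = [[("s", "Purple Rain"), ("b", False), ("i", 0)]]
aabv_hits = [[(False, ("s", "Purple Rain")), (True, ("s", "")), (False, ("b", False))]]

print(decode_aav(aav_hits))
print(decode_aabv(aabv_hits))
```

One consequence visible in this model: under the aav convention a field whose real value is boolean False cannot be told apart from null, whereas aa(bv) keeps the two distinct at the cost of one extra struct per value.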
> > > > 9. Split out vendor.fieldnames
> > >
> > > Could be hard to implement. I'm not sure it's always easy to enumerate
> > > all supported fields and classes due to plug-in architecture of most
> > > analyzers.
> > I would suspect that it is not that hard to get this info. In most cases it
> > should amount to reading the field defs in a Lucene index or scanning
> > table names in a DB.
> Not sure. If you've just installed a plug-in, its properties will not be
> visible. But I'm nitpicking.
> > But some impls might be special, I will investigate.
> Maybe we should redefine this as a minimal supported set, i.e. the backend
> supports these 100% but there may be others?
Well, any engine could always fall back to this and have reasonable results
anyway. I would like to hear the opinions of some of the engine devs here.
As always I will try to pester them on IRC.
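As a rough illustration of the "minimal set plus whatever can be discovered" idea, the Python sketch below scans table names in an SQLite database and merges them with a fixed minimal set. The one-table-per-field layout, the MINIMAL_FIELDS contents, the "xesam:author" table and both function names are assumptions for the example. Real engines will store their field definitions differently.

```python
import sqlite3

# Hypothetical minimal set that the backend promises to support fully.
MINIMAL_FIELDS = {"xesam:title"}


def discover_fieldnames(conn: sqlite3.Connection) -> set:
    """Return the table names in the database, assuming one table per field."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return {name for (name,) in rows}


def vendor_fieldnames(conn: sqlite3.Connection) -> list:
    """Minimal supported set plus anything found in the store, sorted."""
    return sorted(MINIMAL_FIELDS | discover_fieldnames(conn))


# Toy in-memory store with two field tables (the names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "xesam:title" (uri TEXT, value TEXT)')
conn.execute('CREATE TABLE "xesam:author" (uri TEXT, value TEXT)')

print(vendor_fieldnames(conn))
```

A field contributed by a freshly installed plug-in would only show up here once that plug-in has written something to the store, which is the caveat raised in the quoted text.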
> > >XML/INI
> > >
> > > Agreed to install both via convertors.
> > I think this is a really bad idea if the only reason for this is that we
> > couldn't agree. If it is because it makes life easier for everybody else
> > too then I'm more positive, but I do not think this is the case.
> Strigi needs to produce RDF anyway, so dropping it is out of question for
> The question is how and in which way to support it.
> > I was actually pro-rdf/xml until I tried writing a sax parser for
> > rdf/xml... Does anybody know of an expat based parser for rdf/xml?
> I'm looking into it ATM. We also can get away with a simplified one, e.g. we
> don't need nested stuff.
I don't know how expat handles custom entity defs and such either. Also,
whether we want to allow cross refs between files concerns me...