2007/8/7, Evgeny Egorochkin <<a href="mailto:phreedom.stdin@gmail.com">phreedom.stdin@gmail.com</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Tuesday 07 August 2007 09:29:51 Mikkel Kamstrup Erlandsen wrote:<br>> > > 2. Type Attribute in Query Language<br>> ><br>> > I agree with Jamie that usually people will just enumerate categories in<br>
> > which<br>> > to search, without any complex criteria.<br>> ><br>> > My concern with <query category="xesam:Audio"> is that it may not play<br>> > nicely<br>> > with services, which may use a different class structure/trees. It would
<br>> > be<br>> > nice to also have a more generic way to specify categories, by treating<br>> > them<br>> > as regular fields.<br>> ><br>> > One of possible ways here is to treat query attrs as a shortcut, while
<br>> > letting<br>> > implementations also support(at their discretion) the more generic<br>> > approach.<br>><br>> Ok, here's my take - I think it should match your concerns too. Let cat/src<br>
> attrs be the official and blessed way of selecting cat/srcs to query. Both<br>> the cat and src attrs are lists of comma-separated cats/srcs, e.g.:<br><br>...<br><br>> The source and category extensions could be selectors for maximum
<br>> flexibility, so that they are used like:<br>><br>> <query><br>> <and><br>> <category name="xesam:Audio"/><br>> <source name="xesam:File"/><br>> <source name="xesam:ArchiveItem"/>
<br>> <contains><br>> <field name="xesam:title"/><br>> <string>purple rain</string><br>> </contains><br>> </and><br>> </query>
<br>><br>> But let me stress that category and source selectors would be *optional*<br>> extensions. This means that they should not cause parse errors, just be<br>> ignored if you do not support them (and people should never use them unless
<br>> they query them via the vendor.extensions property).<br>><br>> It appears to me that it might be a good idea to prefix all extensions with<br>> "x-". That way people grabbing code snippets off the web should be able to
<br>> clearly see if there are query extensions involved. Ie this would mean that<br>> the cat and source elements above should be called <x-category> and<br>> <x-source>.<br><br>I'd prefer to allow specifying categories like regular field criteria. The
<br>reason is simplicity and ultimate flexibility. Remember some services might<br>not use xesam class trees.</blockquote><div><br>I see the point here, but it also has its drawbacks. I imagine that some backends will want to treat cats/sources differently from fields when building the query. They will have to look up all <field> elements and check what type they belong to. That might be expensive (compared to a straightforward parser just building the old query).
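<br><br>To make the cost concern concrete, here is a rough Python sketch (not taken from any real backend). It assumes a hypothetical lookup table FIELD_KINDS, a hypothetical field name xesam:category and an <equals> criterion, none of which are settled in this thread, and contrasts classifying every <field> element with picking up dedicated selector elements by tag name:<br>

```python
import xml.etree.ElementTree as ET

# Hypothetical table a backend would need if categories were plain fields.
FIELD_KINDS = {"xesam:category": "category", "xesam:title": "field"}

# Hypothetical query where the category is expressed as a regular field.
FIELD_STYLE = """<query><and>
  <equals><field name="xesam:category"/><string>xesam:Audio</string></equals>
  <contains><field name="xesam:title"/><string>purple rain</string></contains>
</and></query>"""

# Same query using a dedicated category selector element.
SELECTOR_STYLE = """<query><and>
  <category name="xesam:Audio"/>
  <contains><field name="xesam:title"/><string>purple rain</string></contains>
</and></query>"""


def categories_from_fields(xml_text):
    """Field style: inspect every criterion and classify its field by name."""
    found = []
    for criterion in ET.fromstring(xml_text).iter():
        field = criterion.find("field")  # direct child only
        if field is not None and FIELD_KINDS.get(field.get("name")) == "category":
            found.append(criterion.findtext("string"))
    return found


def categories_from_selectors(xml_text):
    """Selector style: categories are recognisable by tag name alone."""
    return [e.get("name") for e in ET.fromstring(xml_text).iter("category")]
```

<br>Both functions return the same category list for their respective query; the difference is that the field style needs the extra lookup table and a check on every criterion.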
<br><br>We could be extensible via a "type" selector instead:<br><br><type name="category" value="xesam:Audio"/><br><br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> > 4. Problems With Hit Data<br>> ><br>> > We have the following potential solutions:<br>> ><br>> > aas] works except for ambiguity in strings<br>> ><br>> > aav] have to use a workaround, like treating one of unused
<br>> > datatypes as a<br>> > null value<br>> ><br>> > aa(bv)] b is a null flag. v is the value if applicable<br>> ><br>> ><br>> > I'd prefer unambiguous ones for the sake of completeness.
<br>> ><br>> > Cases where this may be important:<br>> > 1) password: none or unknown<br>> > 2) software flags: none or unknown<br>> > Also empty value is often used to specify default, which is not unknown.
<br>><br>> I think we should stick with "aav" and say that the value is null if it<br>> contains a zero byte. Ie NULL == V(b=0).<br><br>aa(bv) looks cleaner since we are just compensating for missing functionality,
<br>while aav is a hack, using stuff in ways not intended.</blockquote><div><br>If you ask me, aa(bv) is as big a hack as aav+(V(b=0)==NULL). <br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
My concern with aav is how easily and reliably bindings can determine file types<br>especially since we need to distinguish int from int. Also, with aa(bv), when<br>we set the null flag, we can also make v="" so that clients who didn't bother to
<br>check will still receive the closest match to null as opposed to 0 or 1<br>(suppose this gets displayed by gui).</blockquote><div><br>As far as I can tell GLib and Python bindings should not have problems. I don't know about Qt. People using raw libdbus should definitely be home free. Then there are Java, Perl, and C# (are there any Haskell bindings?) of which I know nothing.
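<br><br>For reference, a minimal Python sketch of how a client might decode the two layouts. It uses a stand-in Variant class rather than a real D-Bus binding, so every name in it is an illustrative assumption:<br>

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    """Stand-in for a D-Bus variant: a type signature plus a value."""
    signature: str
    value: Any


# The proposed convention for "aav": a variant holding boolean false is null.
NULL_AAV = Variant("b", False)


def decode_aav(cell: Variant) -> Optional[Any]:
    """Decode one cell of an 'aav' hit row; V(b=0) is read as null."""
    if cell.signature == "b" and cell.value is False:
        return None
    return cell.value


def decode_aabv(cell: Tuple[bool, Variant]) -> Optional[Any]:
    """Decode one cell of an 'aa(bv)' hit row; the flag marks null."""
    is_null, variant = cell
    return None if is_null else variant.value


# One hit row in each layout: a title, an unknown value, and an integer 0.
row_aav = [Variant("s", "Purple Rain"), NULL_AAV, Variant("i", 0)]
row_aabv = [
    (False, Variant("s", "Purple Rain")),
    (True, Variant("s", "")),  # v="" as the closest match to null
    (False, Variant("i", 0)),
]

decoded_aav = [decode_aav(c) for c in row_aav]
decoded_aabv = [decode_aabv(c) for c in row_aabv]
```

<br>The aav decoder checks the signature and not only the value, so an integer 0 is not mistaken for null.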
<br></div><br>My problem with aa(bv) is that it is more complex to work with. I am quite worried about introducing the overhead of a struct just for this purpose, both complexity-wise and performance-wise.<br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>> > > 9. Split out vendor.fieldnames<br>> ><br>> > Could be hard to implement. I'm not sure it's always easy to enumerate<br>> > all supported fields and classes due to plug-in architecture of most
<br>> > analyzers.<br>><br>> I would suspect that it is not that hard to get this info. In most cases it<br>> should amount to reading the field defs in a Lucene index or scanning the<br>> table names in a DB.
<br><br>Not sure. If you've just installed a plug-in, its properties will not be<br>visible. But I'm nitpicking.<br><br>> But some impls might be special, I will investigate.<br><br>Maybe we should redefine this as a minimal supported set.
i.e. the backend<br>supports these 100% but there may be others?</blockquote><div><br>Well, any engine could always fall back to this and have reasonable results anyway. I would like to hear the opinions of some of the engine devs here. As always I will try to pester them on IRC.
<br><br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> > >XML/INI<br>> ><br>> > Agreed to install both via convertors.
<br>><br>> I think this is a really bad idea if the only reason for this is that we<br>> couldn't agree. If it is because it makes life easier for everybody else<br>> too then I'm more positive, but I do not think this is the case.
<br><br>Strigi needs to produce RDF anyway, so dropping it is out of the question for us.<br>The question is how and in which way to support it.<br><br>> I was actually pro-rdf/xml until I tried writing a sax parser for<br>
> rdf/xml... Does anybody know of an expat based parser for rdf/xml?<br><br>I'm looking into it ATM. We also can get away with a simplified one, e.g. we<br>don't need nested stuff.</blockquote><div><br>I don't know how expat handles custom entity defs and such either. Whether we want to allow cross refs between files also concerns me...
<br></div><br></div>Cheers,<br>Mikkel<br>