2007/8/7, Evgeny Egorochkin &lt;<a href="mailto:phreedom.stdin@gmail.com">phreedom.stdin@gmail.com</a>&gt;:<div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> On Tuesday 07 August 2007 09:29:51 Mikkel Kamstrup Erlandsen wrote:<br>
> > > 2. Type Attribute in Query Language<br>
> ><br>
> > I agree with Jamie that usually people will just enumerate the categories in which to search, without any complex criteria.<br>
> ><br>
> > My concern with &lt;query category="xesam:Audio"&gt; is that it may not play nicely with services that use a different class structure or tree. It would be nice to also have a more generic way to specify categories, by treating them as regular fields.<br>
> ><br>
> > One possible way here is to treat the query attributes as a shortcut, while letting implementations also support (at their discretion) the more generic approach.<br>
><br>
> OK, here's my take; I think it should match your concerns too. Let the cat/src attributes be the official and blessed way of selecting the categories/sources to query. Both the cat and src attributes take a comma-separated list of categories/sources, e.g.: ...<br>
> The source and category extensions could be selectors for maximum flexibility, so that they are used like:<br>
><br>
> &lt;query&gt;<br>
> &nbsp; &lt;and&gt;<br>
> &nbsp; &nbsp; &lt;category name="xesam:Audio"/&gt;<br>
> &nbsp; &nbsp; &lt;source name="xesam:File"/&gt;<br>
> &nbsp; &nbsp; &lt;source name="xesam:ArchiveItem"/&gt;<br>
> &nbsp; &nbsp; &lt;contains&gt;<br>
> &nbsp; &nbsp; &nbsp; &nbsp; &lt;field name="xesam:title"/&gt;<br>
> &nbsp; &nbsp; &nbsp; &nbsp; &lt;string&gt;purple rain&lt;/string&gt;<br>
> &nbsp; &nbsp; &lt;/contains&gt;<br>
> &nbsp; &lt;/and&gt;<br>
> &lt;/query&gt;<br>
><br>
> But let me stress that category and source selectors would be *optional* extensions. This means that they should not cause parse errors, just be ignored if you do not support them (and people should never use them unless they have queried for them via the vendor.extensions property).<br>
><br>
> It appears to me that it might be a good idea to prefix all extensions with "x-".
That way people grabbing code snippets off the web should be able to clearly see whether query extensions are involved. I.e. this would mean that the cat and source elements above should be called &lt;x-category&gt; and &lt;x-source&gt;.<br>
<br>
I'd prefer to allow specifying categories like regular field criteria. The reasons are simplicity and ultimate flexibility. Remember that some services might not use the xesam class trees.</blockquote><div> I see the point here, but it also has its drawbacks. I imagine that some backends will want to treat categories/sources differently from fields when building the query. They would have to look up all &lt;field&gt; elements and check which type each one belongs to. That might be expensive compared to a straightforward parser just building the plain old query. We could be extensible via a "type" selector instead: &lt;type name="category" value="xesam:Audio"/&gt; </div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> > > 4. Problems With Hit Data<br>
> ><br>
> > We have the following potential solutions:<br>
> ><br>
> > aas] works except for ambiguity in strings<br>
> ><br>
> > aav] have to use a workaround, like treating one of the unused datatypes as a null value<br>
> ><br>
> > aa(bv)] b is a null flag, v is the value if applicable<br>
> ><br>
> ><br>
> > I'd prefer unambiguous ones for the sake of completeness.<br>
> ><br>
> > Cases where this may be important:<br>
> > 1) password: none or unknown<br>
> > 2) software flags: none or unknown<br>
> > Also, an empty value is often used to specify a default, which is not the same as unknown.<br>
><br>
> I think we should stick with "aav" and say that the value is null if it contains a zero byte, i.e. NULL == V(b=0).<br>
<br>
aa(bv) looks cleaner, since we are just compensating for missing functionality, while aav is a hack, using stuff in ways not intended.</blockquote><div> If you ask me, aa(bv) is as big a hack as aav+(V(b=0)==NULL).
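To make the aav versus aa(bv) trade-off concrete, here is a minimal Python model of both encodings of a hit row. It does not use a real D-Bus binding: variants are modelled as hypothetical (signature, value) tuples, and all function names are made up for illustration.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical stand-in for a D-Bus variant: (type signature, value).
# Real bindings represent variants their own way (e.g. dbus-python uses
# typed wrappers such as dbus.Byte); this is only a model.
Variant = Tuple[str, Any]
Cell = Optional[Variant]  # None models a missing (NULL) field value
Row = List[Cell]

# Convention proposed for aav: NULL is sent as a variant holding byte 0.
NULL_AAV: Variant = ("y", 0)


def encode_aav(hits: List[Row]) -> List[List[Variant]]:
    """aav: replace every NULL cell with V(b=0)."""
    return [[NULL_AAV if cell is None else cell for cell in row] for row in hits]


def decode_aav(rows: List[List[Variant]]) -> List[Row]:
    """Map V(b=0) back to None; a genuine byte field equal to 0 is mapped to None too."""
    return [[None if cell == NULL_AAV else cell for cell in row] for row in rows]


def encode_aabv(hits: List[Row]) -> List[List[Tuple[bool, Variant]]]:
    """aa(bv): pair each value with a null flag; NULL carries v="" as a readable fallback."""
    return [[(True, ("s", "")) if cell is None else (False, cell) for cell in row]
            for row in hits]


def decode_aabv(rows: List[List[Tuple[bool, Variant]]]) -> List[Row]:
    """Return None wherever the null flag is set, otherwise the wrapped variant."""
    return [[None if is_null else value for is_null, value in row] for row in rows]


if __name__ == "__main__":
    hits = [[("s", "Purple Rain"), None, ("i", 0)]]
    print(encode_aav(hits))   # [[('s', 'Purple Rain'), ('y', 0), ('i', 0)]]
    print(encode_aabv(hits))  # [[(False, ('s', 'Purple Rain')), (True, ('s', '')), (False, ('i', 0))]]
```

Note that decode_aav cannot tell a NULL from a field that genuinely holds a byte-typed 0; that is the ambiguity the V(b=0) convention accepts, while the aa(bv) model avoids it at the cost of an extra struct per cell.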
</div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> My concern with aav is how easily and reliably bindings can determine file types, especially since we need to distinguish int from int. Also, with aa(bv), when we set the null flag we can also make v="", so that clients that don't bother to check will still receive the closest match to null, as opposed to 0 or 1 (suppose this gets displayed by a GUI).</blockquote><div> As far as I can tell, the GLib and Python bindings should not have problems. I don't know about Qt. People using raw libdbus should definitely be home free. Then there are Java, Perl, and C# (are there any Haskell bindings?), of which I know nothing. </div> My problem with aa(bv) is that it is more complex to work with. I am quite worried about introducing the overhead of a struct just for this purpose, both complexity-wise and performance-wise. <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> > > > 9. Split out vendor.fieldnames<br>
> ><br>
> > Could be hard to implement. I'm not sure it's always easy to enumerate all supported fields and classes due to the plug-in architecture of most analyzers.<br>
><br>
> I would suspect that it is not that hard to get this info. In most cases it should amount to reading the field defs in a Lucene index or scanning the table names in a DB.<br>
<br>
Not sure. If you've just installed a plug-in, its properties will not be visible. But I'm nitpicking.<br>
<br>
> But some impls might be special, I will investigate.<br>
<br>
Maybe we should redefine this as a minimal supported set, i.e. the backend supports these 100% but there may be others?</blockquote><div> Well, any engine could always fall back to this and have reasonable results anyway. I would like to hear the opinions of some of the engine devs here. As always, I will try to pester them on IRC.
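As a rough illustration of the "scanning the table names in a DB" case, here is a Python sketch using the standard sqlite3 module. The schema (one table per class, one column per field) and the function name list_classes_and_fields are hypothetical; real engines will lay out their stores differently.

```python
import sqlite3
from typing import Dict, List


def list_classes_and_fields(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each table (treated as a class) to its column names (treated as fields)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    fields: Dict[str, List[str]] = {}
    for table in tables:
        # Assumes table names contain no double quotes.
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        fields[table] = [col[1] for col in columns]
    return fields


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one table per class, one column per field.
    conn.execute("CREATE TABLE Audio (uri TEXT, title TEXT, artist TEXT)")
    conn.execute("CREATE TABLE Document (uri TEXT, title TEXT, pageCount INTEGER)")
    print(list_classes_and_fields(conn))
    # {'Audio': ['uri', 'title', 'artist'], 'Document': ['uri', 'title', 'pageCount']}
```

A scan like this only reports what is already in the schema, so a field introduced by a freshly installed plug-in would not show up until the store has been extended for it, which is the caveat raised in the reply above.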
</div> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> > > XML/INI<br>
> ><br>
> > Agreed to install both via converters.<br>
><br>
> I think this is a really bad idea if the only reason for this is that we couldn't agree. If it is because it makes life easier for everybody else too, then I'm more positive, but I do not think this is the case.<br>
<br>
Strigi needs to produce RDF anyway, so dropping it is out of the question for us. The question is how and in which way to support it.<br>
<br>
> I was actually pro-RDF/XML until I tried writing a SAX parser for RDF/XML... Does anybody know of an expat-based parser for RDF/XML?<br>
<br>
I'm looking into it ATM. We can also get away with a simplified one, e.g. we don't need nested stuff.</blockquote><div> I don't know how expat handles custom entity definitions and such, either. Also, whether we want to allow cross-references between files concerns me... </div> </div>Cheers, Mikkel
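For the simplified, non-nested RDF/XML reader discussed in the XML/INI thread, here is a sketch of what an expat-based approach could look like, using Python's standard xml.parsers.expat binding to the expat library. It is not a complete RDF/XML parser: it only reads rdf:Description elements with literal property children, the xesam namespace URI is a placeholder, and the sample document is generated with ElementTree purely so the example is self-contained.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from xml.parsers import expat

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XS = "http://example.org/xesam#"  # hypothetical namespace URI, for illustration only


class FlatRdfReader:
    """Read subject -> {property URI: literal} from flat RDF/XML with expat.

    Only handles rdf:RDF > rdf:Description[@rdf:about] > literal property
    elements. Anything nested deeper is ignored, and rdf:resource references
    are not resolved (such properties come back as empty strings).
    """

    def __init__(self) -> None:
        self.result: Dict[str, Dict[str, str]] = {}
        self._depth = 0
        self._subject = ""
        self._prop = ""
        self._text: List[str] = []
        # With namespace processing, expat reports names as "URI local";
        # removing the separator yields the full RDF property URI.
        self._parser = expat.ParserCreate(namespace_separator=" ")
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.CharacterDataHandler = self._chars

    def parse(self, data: bytes) -> Dict[str, Dict[str, str]]:
        self._parser.Parse(data, True)
        return self.result

    def _start(self, name, attrs):
        self._depth += 1
        if self._depth == 2:  # rdf:Description
            attrs = {k.replace(" ", ""): v for k, v in attrs.items()}
            self._subject = attrs.get(RDF + "about", "")
            self.result.setdefault(self._subject, {})
        elif self._depth == 3:  # property element
            self._prop = name.replace(" ", "")
            self._text = []

    def _chars(self, data):
        if self._depth == 3:
            self._text.append(data)  # expat may deliver text in several chunks

    def _end(self, name):
        if self._depth == 3:
            self.result[self._subject][self._prop] = "".join(self._text)
        self._depth -= 1


def build_sample() -> bytes:
    """Generate a tiny RDF/XML document with ElementTree."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": "file:///music/purple_rain.mp3"})
    ET.SubElement(desc, f"{{{XS}}}title").text = "Purple Rain"
    ET.SubElement(desc, f"{{{XS}}}artist").text = "Prince"
    return ET.tostring(root)


if __name__ == "__main__":
    print(FlatRdfReader().parse(build_sample()))
```

Tracking only element depth keeps the handler logic trivial; supporting nested descriptions or cross-references between files would require a real subject stack and is left out on purpose.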