2007/8/6, Evgeny Egorochkin <<a href="mailto:phreedom.stdin@gmail.com">phreedom.stdin@gmail.com</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Thursday 02 August 2007 21:31:35 Mikkel Kamstrup Erlandsen wrote:<br>> I've been collecting comments on the Xesam spec lately, and I think we have<br>> enough now to do an update round of the spec proposal.<br>
><br>> Proposed changes can be found here:<br>> <a href="http://wiki.freedesktop.org/wiki/XesamSearchUpdates">http://wiki.freedesktop.org/wiki/XesamSearchUpdates</a><br><br>> 2. Type Attribute in Query Language
<br><br>I agree with Jamie that usually people will just enumerate categories in which<br>to search, without any complex criteria.<br><br>My concern with <query category="xesam:Audio"> is that it may not play nicely
<br>with services, which may use a different class structure/trees. It would be<br>nice to also have a more generic way to specify categories, by treating them<br>as regular fields.<br><br>One of possible ways here is to treat query attrs as a shortcut, while letting
<br>implementations also support (at their discretion) the more generic approach.</blockquote><div><br>Ok, here's my take - I think it should address your concerns too. Let the category/source attributes be the official, blessed way of selecting the categories and sources to query. Each attribute takes a comma-separated list of categories or sources, e.g.:
<br><br> <query source="xesam:ArchiveItem, xesam:File" category="xesam:Audio"><br> <contains><br>
<field name="xesam:title"/><br>
<string>purple rain</string><br>
</contains><br> </query><br><br>Then add, as query language extensions, a category- and a source selector. You would have to check for them through the session property vendor.extensions, like you have to do with proximity- and regexp selectors currently.
<br><br>The source and category extensions could be selectors for maximum flexibility, so that they are used like:<br><br><query><br> <and><br> <category name="xesam:Audio"/><br> <source name="xesam:File"/>
<br> <source name="xesam:ArchiveItem"/><br> <contains><br> <field name="xesam:title"/><br> <string>purple rain</string><br> </contains><br> </and>
<br></query><br><br>But let me stress that category and source selectors would be *optional* extensions. This means that they should not cause parse errors, just be ignored if you do not support them (and people should never use them unless they query them via the
vendor.extensions property).<br><br>It might be a good idea to prefix all extensions with "x-". That way people grabbing code snippets off the web can clearly see whether query extensions are involved. I.e. the category and source elements above would be called <x-category> and <x-source>.
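<br><br>A minimal client-side sketch of the scheme above, in Python. It assumes that vendor.extensions yields a list of extension names and that the selectors are advertised as "x-category" and "x-source"; neither detail is settled in the spec, and build_query is a hypothetical helper. It uses the selector form only when both extensions are advertised and otherwise falls back to the blessed attributes:

```python
import xml.etree.ElementTree as ET

def build_query(title, categories, sources, extensions):
    """Return query XML, using selector extensions only if advertised.

    `extensions` stands in for the value of the vendor.extensions session
    property; the names "x-category" and "x-source" are assumptions.
    """
    query = ET.Element("query")

    # The actual search criterion, shared by both forms.
    contains = ET.Element("contains")
    ET.SubElement(contains, "field", name="xesam:title")
    ET.SubElement(contains, "string").text = title

    if {"x-category", "x-source"} <= set(extensions):
        # Optional extension form: selectors inside an <and> group.
        group = ET.SubElement(query, "and")
        for cat in categories:
            ET.SubElement(group, "x-category", name=cat)
        for src in sources:
            ET.SubElement(group, "x-source", name=src)
        group.append(contains)
    else:
        # Blessed form: comma-separated attributes on <query>.
        query.set("source", ", ".join(sources))
        query.set("category", ", ".join(categories))
        query.append(contains)

    return ET.tostring(query, encoding="unicode")

print(build_query("purple rain", ["xesam:Audio"],
                  ["xesam:ArchiveItem", "xesam:File"], extensions=[]))
```

A client written this way never sends the selectors to an engine that has not advertised them, which is the usage rule proposed above.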
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Also, if no category is specified, there must be some defaults, however I feel
<br>they are hard to establish:<br><br>One user might by default want to search files only. Another user might prefer<br>to search both files and archives. A typical case is an extensive music library<br>in archives, stored this way for easy P2P sharing, while the user uses plug-ins
<br>to listen to music directly in archives.<br><br>Defaults for categories should be probably user-configurable and/or<br>implementation-specific. Client apps are best off stating categories<br>explicitly.</blockquote><div>
<br>Yes, I agree. I think the defaults should be up to the search engine, but the advice should simply be "everything that makes sense". The defaults are not really for users; they are more for devs wanting to hack together a quick search-enabled UI.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> 4. Problems With Hit Data<br><br>We have the following potential solutions:
<br><br> aas] works except for ambiguity in strings<br><br> aav] have to use a workaround, like treating one of unused datatypes as a<br>null value<br><br> aa(bv)] b is a null flag. v is the value if applicable
<br><br><br>I'd prefer unambiguous ones for the sake of completeness.<br><br>Cases where this may be important:<br>1) password: none or unknown<br>2) software flags: none or unknown<br>Also empty value is often used to specify default, which is not unknown.
</blockquote><div><br>I think we should stick with "aav" and say that the value is null if it contains a zero byte. Ie NULL == V(b=0).<br> </div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Additionally, we might want to consider blob streaming interface as per<br>Jamie's suggestion.<br><br>Also, I'd like to eventually revisit thumbnail stuff. I don't think current<br>thumbnail spec and vfs/kio can
e.g. provide thumbnails for stuff embedded<br>into documents. At least our solution is going to be more generic and also<br>easier to use for clients. But this is not top priority.</blockquote><div><br><br>I agree that these are things we will eventually need. Can you put a note on
<a href="http://wiki.freedesktop.org/wiki/XesamIteration2">http://wiki.freedesktop.org/wiki/XesamIteration2</a> ?<br></div><br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> 5. Session Properties search.blocking and search.live<br><br>It's good if bindings actually make it easy to go async.</blockquote><div><br>Yes. It is easy to handle this in the bindings, as I know for a fact from my work on libxesam-glib. The way I will do it is to make everything async by default. If the client tries something that forces a synchronous operation over the wire, I will log a warning (which can be suppressed via a flag).
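<br><br>The async-by-default idea can be sketched roughly as follows. This is an illustration in Python, not the libxesam-glib API: the Search class, its method names, and the warn_on_sync flag are all hypothetical, and the backend is any callable that delivers hits to a callback. A real binding would block on the wire in the synchronous path; here the stand-in backend simply calls back immediately.

```python
import logging

log = logging.getLogger("xesam.sketch")

class Search:
    """Hypothetical binding object: async by default, sync with a warning."""

    def __init__(self, backend, warn_on_sync=True):
        self._backend = backend          # callable(count, callback)
        self._warn_on_sync = warn_on_sync

    def get_hits_async(self, count, callback):
        # Preferred path: hand the hits to the caller's callback.
        self._backend(count, callback)

    def get_hits_sync(self, count):
        # Discouraged path: warn unless the client opted out via the flag.
        if self._warn_on_sync:
            log.warning("synchronous hit retrieval forces a blocking call")
        hits = []
        self._backend(count, hits.extend)
        return hits

def fake_backend(count, callback):
    # Stand-in for the search engine; delivers hits immediately.
    callback(["hit%d" % i for i in range(count)])

search = Search(fake_backend)
search.get_hits_async(2, print)
```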
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> 6. Introduce search.readonce<br><br>Not clear just how sizable the benefit is. At least introducing it won't hurt.
</blockquote><div><br>Right, I will try and punk some devs to get the estimates.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> 7. Rename CountHits to GetHitCount<br><br>No problem.<br><br>> 8. search.maxhits<br><br>Proposed by Jos, so I don't interfere.<br><br>> 9. Split out vendor.fieldnames<br><br>Could be hard to implement. I'm not sure it's always easy to enumerate all
<br>supported fields and classes due to plug-in architecture of most analyzers.</blockquote><div><br>I would suspect that it is not that hard to get this info. In most cases it should amount to reading the field defs in a Lucene index or scanning the table names in a DB. But some impls might be special, I will investigate.
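<br><br>For the database case, a rough sketch of what "scanning the table names" could look like, assuming an SQLite-backed index that keeps one table per field. That layout is purely an assumption for illustration, and list_field_tables is a hypothetical helper:

```python
import sqlite3

def list_field_tables(conn):
    """Return the names of all tables in an SQLite database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

# Toy index with one table per field (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "xesam:title" (doc INTEGER, value TEXT)')
conn.execute('CREATE TABLE "xesam:author" (doc INTEGER, value TEXT)')
print(list_field_tables(conn))
```

Engines whose analyzers add fields at run time through plug-ins would still need some way to register those fields, which is the difficulty raised in the quoted text.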
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> 10. Use uint Instead of Int Where Applicable<br><br>Ok.<br><br>>XML/INI
<br><br>Agreed to install both via converters.</blockquote><div><br>I think this is a really bad idea if the only reason for it is that we couldn't agree. If it is because it makes life easier for everybody else too, then I'm more positive, but I do not think this is the case.
<br></div><br>I was actually pro-RDF/XML until I tried writing a SAX parser for RDF/XML... Does anybody know of an Expat-based parser for RDF/XML?<br><br>Cheers,<br>Mikkel<br></div>