2007/9/17, Anders Rune Jensen <<a href="mailto:anders@iola.dk">anders@iola.dk</a>>:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On 9/17/07, Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com</a>> wrote:<br>> 2007/9/17, Mikkel Kamstrup Erlandsen <<a href="mailto:mikkel.kamstrup@gmail.com">mikkel.kamstrup@gmail.com
</a>>:<br>> > 2007/9/17, Anders Rune Jensen <<a href="mailto:anders@iola.dk">anders@iola.dk</a>>:<br>> ><br>> > > Hi<br>> > ><br>> > > I've been reading the Xesam Query Language specification and I have a
<br>> questions:<br>> > ><br>> > > 1) How do I get all tags for all files? Preferable as list<name,<br>> > > count>. The same is true for other attributes such as mime-types.<br>> >
<br>> ><br>> > You are right that there is no obvious way to do this atm. I think the<br>> best solution is to simply add a new selector called "any", and use it like<br>> the following.<br>> >
<br>> > Firstly note that the ontology does not yet implement Tags as first class<br>> objects, but this is on Evgenys todo I believe. Since this is not ready yet<br>> consider the following an example only.<br>
> ><br>> > Assume that the Tag objects have a field xesam:tagName. The return type for<br>> GetHits is controlled via the session property hit.fields, so if you set<br>> this to [xesam:tagName] and do the query
<br>> ><br>> > <request xmlns="<br>> <a href="http://freedesktop.org/standards/xesam/1.0/query">http://freedesktop.org/standards/xesam/1.0/query</a>"><br>> > <query content="xesam:Tag">
<br>> > <any/><br>> > </query><br>> > </request><br>> ><br>> > GetHits will give you all known Tag names. If you want the count for<br>> each tag I see two solutions. Either the ontology should define a field
<br>> xesam:memberCount or something like that, or you create a new search for<br>> each tag like<br>> ><br>> > <request<br>> xmlns="<a href="http://freedesktop.org/standards/xesam/1.0/query">
http://freedesktop.org/standards/xesam/1.0/query</a>"><br>> > <query content="xesam:Tag"><br>> > <equals><br>> > <field name="xesam:tagName"/><br>
> > <string>TAGNAME_HERE</string><br>> > </equals><br>> > </query><br>> > </request><br>> ><br>> > and simply issue GetHitCount on each search and then close it. This should
<br>> actually be fairly efficient on most backends.<br>> ><br>> > It should be noted that tagging and tag management as such might be hard<br>> to implement completely in the search API. The search API is mainly
<br>> targeted at "search" :-) In XESAM iteration 2 we will focus on a metadata<br>> management API where such things might be more natural.<br>><br>> I just realised that my answer above does not answer your question about
<br>> mime types and other similar things you might want to list.<br>><br>> I think what you ask for is a way to request all different values for a<br>> given field. I'm not 100% sure that all implementations can support this
<br>> effectively. Lucene does not support this AFAIK.<br>><br>> Another thing is the use case... In the case of mime types why can't you use<br>> the conventional ways to get a list of known mime types? Why would you
<br>> generally want to list all known values of a field? I'm thinking some<br>> clustered browsing or something...<br><br>The use case would be something like a music program where you would<br>like to know what audio types are stored on the hd (mp3, ogg, flac
<br>etc.) and then add only those as buttons so the user can for example<br>select only flac files. Also it could be nice to know how many flac<br>files there were before clicking the flac button. In particular<br>because the result list might be limited to a certain number of
<br>results (and because getting the total number (just the count) of flac<br>files this way is inefficient ;-)).</blockquote><div><br>In this case I think it is perfectly OK to spawn a search for each supported audio mime type, issue only a GetHitCount call on each search, and then close it (if we are talking about fewer than 10 or so). That should be quite fast on all backends since there is no real data transmission going on. Moreover, if the hit data is only fetched lazily (as in Lucene, and Xapian I believe), then the overhead of requesting a few hit counts is really small. I don't know how fast Tracker would be here, but I suspect plenty fast.
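<br><br>A rough sketch of the idea in Python, for illustration only. The FakeSearcher class is a hypothetical in-memory stand-in for a real Xesam search service (it is not the D-Bus API), and the content category xesam:Audio and the field name xesam:mimeType are assumptions on my part. The query follows the same equals pattern as the tag example quoted above: one search per mime type, a single hit count request, then close.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XESAM_QUERY_NS = "http://freedesktop.org/standards/xesam/1.0/query"


def build_equals_query(content, field, value):
    """Build an <equals> query shaped like the one quoted in this thread."""
    return (
        f'<request xmlns="{XESAM_QUERY_NS}">'
        f"<query content={quoteattr(content)}>"
        f"<equals><field name={quoteattr(field)}/>"
        f"<string>{escape(value)}</string></equals>"
        f"</query></request>"
    )


class FakeSearcher:
    """Hypothetical stand-in for a search service; only counts mime types."""

    def __init__(self, indexed_mime_types):
        self._indexed = list(indexed_mime_types)
        self._open = {}  # search handle -> hit count
        self._next_handle = 0

    def new_search(self, query_xml):
        # Pull the value of the single <string> operand out of the query.
        root = ET.fromstring(query_xml)
        ns = f"{{{XESAM_QUERY_NS}}}"
        wanted = root.find(f"{ns}query/{ns}equals/{ns}string").text
        handle = f"search-{self._next_handle}"
        self._next_handle += 1
        self._open[handle] = sum(1 for m in self._indexed if m == wanted)
        return handle

    def get_hit_count(self, handle):
        return self._open[handle]

    def close_search(self, handle):
        del self._open[handle]


def count_per_mime_type(searcher, mime_types):
    """Spawn one search per mime type, ask only for the count, then close."""
    counts = {}
    for mime in mime_types:
        query = build_equals_query("xesam:Audio", "xesam:mimeType", mime)
        handle = searcher.new_search(query)
        try:
            counts[mime] = searcher.get_hit_count(handle)
        finally:
            # Always close the search so no handles are left open.
            searcher.close_search(handle)
    return counts
```

A music program could then create buttons only for the mime types with a non-zero count and show the count next to each one. No hits are ever fetched, only the counts.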
<br></div><br></div>Cheers,<br>Mikkel<br>