Shared-mime checking order

David Faure dfaure at trolltech.com
Mon Oct 15 07:59:10 PDT 2007


Hi,

Wow it's been a month already... been too busy to be able to answer before, sorry about that.

On Thursday 20 September 2007, Alexander Larsson wrote:
> On Wed, 2007-09-19 at 21:53 +0200, David Faure wrote:
> > On Tuesday 18 September 2007, Alexander Larsson wrote:
> > Agreed. Do we also agree that this handling of multiple glob matches can be done right away
> > inside glob-matching? No need to delay that to the "If several globs matches" resolution
> > (after sniffing), IMHO.
> > 
> > So the new algorithm would be the one described by Alexander, with something like this prepended:
> > Glob-matching should prefer derived mimetype over base mimetype, and longer matches
> > over shorter ones. However if two globs of the same length match the file, and the two
> > matches are not related in the inheritance tree, then we have a "glob conflict", which
> > will be resolved below.
> > "If several globs matches" in Alexander's algorithm really becomes "In case of a glob conflict", 
> > i.e. two or more mimetypes with the same glob (like *.doc or *.ogg).
> > 
> > [Well, technically you could invent patterns like foo.* and *.doc, so that foo.doc matches both and
> > you don't have a "longer match", but this is really a border case (and would simply be handled
> > as a "glob conflict" too).]
> 
> I think this sounds fine to me. There is only one more thing that I
> think needs to be resolved. What mimetype do we pick on a glob conflict
> if we only know the name (i.e. if we can't sniff)? Should we add a
> priority thing? Use the order in the files?

Good point. A priority would be the best way to ensure consistent results. E.g. we can
probably all agree that ftp://bar/foo.doc should have a msword icon, because it's
just more common than "text files named .doc".
It's going to be confusing reading the xml spec though, if it has both priorities for
magic and completely unrelated priorities for extensions, and worse, the extensions'
priority is only used in case of conflicts between extensions...
Any thoughts? I would be ok with <glob pattern="*.doc" conflictPriority="10"/>
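
To make the proposal concrete, here is a minimal Python sketch of name-only matching: the longest matching pattern wins, and a conflict priority only breaks ties between patterns of the same length. The glob table, its priority values and the function name are illustrative assumptions, not what freedesktop.org.xml or any implementation contains.

```python
from fnmatch import fnmatchcase

# (pattern, mimetype, conflict priority). Illustrative values only:
# the priorities are assumptions, not taken from freedesktop.org.xml.
GLOBS = [
    ("*.doc", "text/plain", 5),
    ("*.doc", "application/msword", 10),
    ("*.txt", "text/plain", 5),
    ("README*", "text/x-readme", 5),
]

def mimetype_from_name(filename, globs=GLOBS):
    """Pick a mimetype from the file name alone (no sniffing)."""
    matches = [g for g in globs if fnmatchcase(filename, g[0])]
    if not matches:
        return None
    # Longer patterns win over shorter ones.
    longest = max(len(pattern) for pattern, _, _ in matches)
    candidates = [g for g in matches if len(g[0]) == longest]
    # Glob conflict: several patterns of the same length match,
    # so the conflict priority decides.
    return max(candidates, key=lambda g: g[2])[1]
```

With this table, foo.doc resolves to application/msword through the priority, while README.txt never reaches the priority step because README* is the longer match.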

> If we add priority to the glob tag (with some default if it's not set) we
> might be able to handle this in a backwards compat way by having the
> priority affect the sort order in "globs" and "mime.cache".

This assumes that implementations read globs linearly and stop at the first match.
But in KDE I parse this into a hash (that is globally shared among processes, via a file on disk)...
Hmm, OK, even then I could do this correctly by making it a multihash
with meaningful order in the list of values for a given key... This does get tricky,
but I see no other way to handle conflicting globs without magic indeed.
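
A rough sketch of the multihash idea in Python, assuming each glob entry carries a numeric priority: every pattern maps to a list of mimetypes, ordered so that the first value is the one to use when the conflict cannot be resolved by sniffing. The function name and entry layout are hypothetical, and this is not the actual KDE code.

```python
from collections import defaultdict

def build_glob_multihash(entries):
    """Map each pattern to its mimetypes, highest priority first.

    entries is an iterable of (pattern, mimetype, priority) tuples.
    sorted() is stable, so entries with equal priority keep file order.
    """
    table = defaultdict(list)
    for pattern, mimetype, _priority in sorted(entries, key=lambda e: -e[2]):
        table[pattern].append(mimetype)
    return dict(table)
```

A lookup then takes the first element of the list for a pattern, which gives the same answer as reading a priority-sorted "globs" file linearly and stopping at the first match.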

> > My problem is that I can't test the subclass case, README* is the only
> > case of a glob match that has a * but not as the first character, so
> > it's the only one that can give conflicts...
> > So after implementing "take longest match", I see no way of testing
> > "take subclass", since in the case of README.txt it is the longest
> > match anyway... I could use canned data, but it also
> > means that we might not have a use case for it at the moment :)
> 
> I can't think of any case where it's needed either, so maybe we should
> drop that to lower complexity.

Agreed.
However I just had a case where "take subclass" might be needed:
when the *magic* conflicts. Try "<!--foo--><html>bar</html>": this should
be detected as text/html, but it's detected as application/xml here because
both mimetypes have <match value="&lt;!--" type="string" offset="0"/>
and they have the same priority for that magic rule! (50)
I believe this is a bug in freedesktop.org.xml: the rules for html should have higher
priority than the rules for xml. But I guess we could also say
"the xml is fine, we just need to pick the subclass when conflicting magic rules
match". I wouldn't like that though, since it would make the implementation
more complex.

Do you agree with making the magic rules for xml priority 40?
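
To illustrate the tie, here is a simplified Python sketch that reduces each magic rule to a single string match at offset 0 (real magic rules are richer, and the rule lists are assumptions for illustration). It returns every mimetype whose matching rule has the top priority, so more than one result means a magic conflict.

```python
def sniff(data, rules):
    """Return all mimetypes whose matching rule has the highest priority.

    rules is a list of (mimetype, priority, magic bytes at offset 0).
    """
    matched = [(priority, mime) for mime, priority, magic in rules
               if data.startswith(magic)]
    if not matched:
        return []
    best = max(priority for priority, _ in matched)
    return [mime for priority, mime in matched if priority == best]

DATA = b"<!--foo--><html>bar</html>"

# Both rules at priority 50, as described above.
SAME_PRIORITY = [("application/xml", 50, b"<!--"), ("text/html", 50, b"<!--")]
# The xml rule lowered to 40, as proposed.
XML_LOWERED = [("application/xml", 40, b"<!--"), ("text/html", 50, b"<!--")]
```

With SAME_PRIORITY both mimetypes come back and the outcome depends on the implementation; with XML_LOWERED only text/html remains, with no need for a "take subclass" rule.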

> > OK. Does anyone know which other implementations of shared-mime-info
> > are around?
> > xdgmime was mentioned, I don't know who knows its code well enough to
> > modify it once we all agree on the spec changes, but the first
> > question is: who else needs to approve those spec changes? Rox, I
> > assume? Thomas Leonard CC'ed (see thread
> > for more info, we covered "preferring globs over contents" before
> > talking about glob conflicts).
> 
> I can try to handle xdgmime. It's what is used in Gnome.

Excellent!
Please let me know how it goes, we seem to all agree now on the main changes.

-- 
David Faure, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

