Shared-mime checking order

David Faure dfaure at trolltech.com
Wed Sep 19 12:53:58 PDT 2007


On Tuesday 18 September 2007, Alexander Larsson wrote:
> On Tue, 2007-09-18 at 11:18 +0200, Patryk Zawadzki wrote:
> > On 9/18/07, Alexander Larsson <alexl at redhat.com> wrote:
> > > On Tue, 2007-09-18 at 00:51 +0200, David Faure wrote:
> > > > On Tuesday 28 August 2007, Alexander Larsson wrote:
> > > > > If several globs matches, and sniffing fails, or doesn't help:
> > > > >   fall back to the first glob match
> > > > >   (maybe we should do something better here?)
> > > >
> > > > Hmm, I just found the case of "README.txt", which could either be "text/plain" due to *.txt
> > > > or "text/x-readme" due to README*. Which one should we pick? The second pattern "looks"
> > > > more specific to my eyes so it should probably win, but how should we quantify that?
> > > > Should we take the longest pattern?
> > >
> > > Yeah, this is tricky. I think the longest pattern is the traditional way
> > > to solve things like that. It will probably work well enough for us.
> > 
> > Isn't it enough to check whether one of them is a subclass of the
> > other? If so, pick the more specific one.
> 
> That only works in the case of subclasses, though, which might not always
> be the case. It seems right to use it when it's possible, though.
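The two tie-break ideas in the quoted discussion (longest pattern vs. subclass check) can be compared on README.txt with a small Python sketch. The glob table and the PARENTS inheritance map below are hypothetical stand-ins for real shared-mime-info data, and matching is case-sensitive via fnmatchcase, which is an assumption for simplicity.

```python
from fnmatch import fnmatchcase

# Hypothetical data for illustration; real tables come from shared-mime-info.
GLOBS = [("*.txt", "text/plain"), ("README*", "text/x-readme")]
PARENTS = {"text/x-readme": ["text/plain"]}  # mimetype -> direct parents (assumed)

def longest_glob_match(filename, globs):
    """Idea 1: among matching globs, pick the mimetype with the longest pattern."""
    matches = [(p, m) for p, m in globs if fnmatchcase(filename, p)]
    if not matches:
        return None
    # max() keeps the first of several equally long patterns.
    return max(matches, key=lambda pm: len(pm[0]))[1]

def is_subclass(child, ancestor, parents):
    """True if `ancestor` is a (transitive) parent of `child`."""
    stack, seen = list(parents.get(child, [])), set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False

def more_specific(a, b, parents):
    """Idea 2: return the derived type if a and b are related, else None."""
    if is_subclass(a, b, parents):
        return a
    if is_subclass(b, a, parents):
        return b
    return None  # unrelated: the subclass rule cannot decide

print(longest_glob_match("README.txt", GLOBS))                 # text/x-readme
print(more_specific("text/plain", "text/x-readme", PARENTS))   # text/x-readme
```

For README.txt both ideas agree; the subclass rule returns None for unrelated types, which is exactly the limitation raised in the reply above.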

Agreed. Do we also agree that this handling of multiple glob matches can be done right away
inside glob-matching? There is no need to defer it to the "If several globs matches" resolution
step (after sniffing), IMHO.

So the new algorithm would be the one described by Alexander, with something like this prepended:
Glob-matching should prefer a derived mimetype over its base mimetype, and longer matches
over shorter ones. However, if two globs of the same length match the file, and the two
matches are not related in the inheritance tree, then we have a "glob conflict", which
will be resolved below.
"If several globs matches" in Alexander's algorithm really becomes "In case of a glob conflict",
i.e. two or more mimetypes with the same glob (like *.doc or *.ogg).
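A minimal Python sketch of the prepended glob step, assuming one reading of the rules: pattern length is compared first, and the derived-over-base rule is applied only among equal-length matches. Whatever survives is either a single winner or a "glob conflict" handed on to the rest of the algorithm. All mimetypes named "hypothetical" and the PARENTS map are made-up canned data, not real shared-mime-info entries; matching is case-sensitive (an assumption).

```python
from fnmatch import fnmatchcase

def is_subclass(child: str, ancestor: str, parents: dict[str, list[str]]) -> bool:
    """True if `ancestor` is a (transitive) parent of `child`."""
    stack, seen = list(parents.get(child, [])), set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False

def resolve_globs(filename: str, globs: list[tuple[str, str]],
                  parents: dict[str, list[str]]) -> set[str]:
    """Apply glob-level tie-breaking and return the surviving mimetypes.

    One element: a clear winner. Several: a "glob conflict", left for the
    later resolution step (e.g. content sniffing). Empty: no glob matched.
    """
    matches = [(p, m) for p, m in globs if fnmatchcase(filename, p)]
    if not matches:
        return set()
    # Rule 1: longer patterns beat shorter ones.
    best = max(len(p) for p, _ in matches)
    candidates = {m for p, m in matches if len(p) == best}
    # Rule 2: among equal-length matches, a derived type beats its base type.
    return {m for m in candidates
            if not any(o != m and is_subclass(o, m, parents) for o in candidates)}

# Hypothetical glob table and inheritance data, for illustration only.
GLOBS = [
    ("*.txt", "text/plain"),
    ("README*", "text/x-readme"),
    ("*.log", "text/plain"),
    ("*.log", "text/x-hypothetical-log"),
    ("*.doc", "application/msword"),
    ("*.doc", "application/x-hypothetical-doc"),
    ("foo.*", "text/x-hypothetical-foo"),
]
PARENTS = {
    "text/x-readme": ["text/plain"],
    "text/x-hypothetical-log": ["text/plain"],
}

print(resolve_globs("README.txt", GLOBS, PARENTS))  # prints {'text/x-readme'}
```

With this canned data, server.log exercises the "take subclass" rule (two equal-length *.log globs, one derived from the other), report.doc yields a two-way glob conflict, and foo.doc matches foo.* and *.doc at the same length, so it also ends up as a conflict rather than a "longer match".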

[Well, technically you could invent patterns like foo.* and *.doc, so that foo.doc matches both and
neither is a "longer match", but this is really a border case (and would simply be handled
as a "glob conflict" too).]

My problem is that I can't test the subclass case: README* is the only glob that has a * somewhere
other than as the first character, so it's the only one that can give conflicts...
So after implementing "take longest match", I see no way of testing "take subclass",
since in the case of README.txt it is the longest match anyway... I could make up canned test data,
but it also means that we might not have a use case for it at the moment :)

==

OK. Does anyone know which other implementations of shared-mime-info are around?
xdgmime was mentioned; I don't know who knows its code well enough to modify
it once we all agree on the spec changes. But the first question is: who else needs
to approve those spec changes? Rox, I assume? Thomas Leonard CC'ed (see the thread
for more info; we covered "preferring globs over contents" before talking about glob conflicts).

Thanks for the feedback. The KDE glob-matching code now takes the longest match :)

-- 
David Faure, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

