Shared-mime checking order

David Faure dfaure at trolltech.com
Wed Aug 22 16:27:32 PDT 2007


On Friday 27 July 2007, Sanel Zukan wrote:
> Thank you for replies.
> 
> > Yeah, I also found that too, when checking my chemical MIME types list.
> > Seems, priorities of "50" are enough for magic patterns. Should the spec
> > be adjusted? What do you (people in general) think about this? I mean,
> > the spec was written to have a standardized way to handle things. That
> > doesn't mean, that things cannot be improved :) So is it time to update
> > the spec? I would really like to see GNOME and KDE [1] (and other like
> > rox-filer, ...) detecting the file types with the same success (of
> > course, there are some false positives with the way of GNOME's
> > implementation too - so there is place for improvement :)).
> 
> Yes; I'm also very interested to see unified detection, even if that
> detection for corner cases shows to be wrong.
> [...]
> > BCCing David Faure
> 
> Thanks; it would be really nice to see other implementations too :-)

The KDE 4 implementation follows the spec as much as possible, i.e. the algorithm (in KMimeType::findByUrl) is roughly:
1) find from mode_t if set (leads to inode/*)
2) try high-priority (>80) magic rules for local files
3) try to find out by looking at the extension if any [except on protocols where extensions are unreliable, like HTTP]
4) try low-priority magic rules for local files
5) otherwise use protocol-based heuristics for some protocols (e.g. KDE's "man:" is always HTML, or
for protocols that allow listing directories like FTP or FISH, a URL which ends with '/' is an inode/directory, etc.)

There's also a "fast mode" for that code to disable magic matching and only use 1), 3) and 5).
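
To make the ordering concrete, here is a minimal Python sketch of the five steps plus the "fast mode" switch. It is not the KMimeType code: the function names, the tiny rule tables, the protocol sets and the application/octet-stream fallback are all assumptions made up for illustration.

```python
import stat
from dataclasses import dataclass
from typing import Optional


@dataclass
class MagicRule:
    mime: str
    priority: int
    offset: int
    pattern: bytes


# Hypothetical sample data, not taken from the real shared-mime-info database.
MAGIC_RULES = [
    MagicRule("application/pdf", 90, 0, b"%PDF-"),
    MagicRule("text/x-python", 50, 0, b"#!/usr/bin/python"),
]
GLOBS = {".pdf": "application/pdf", ".txt": "text/plain", ".py": "text/x-python"}
UNRELIABLE_EXTENSION_PROTOCOLS = {"http", "https"}  # assumption
LISTABLE_PROTOCOLS = {"ftp", "fish"}                # assumption


def match_magic(data: bytes, high: bool) -> Optional[str]:
    """Return the first rule of the requested priority class that matches."""
    for rule in MAGIC_RULES:
        is_high = rule.priority > 80
        end = rule.offset + len(rule.pattern)
        if is_high == high and data[rule.offset:end] == rule.pattern:
            return rule.mime
    return None


def match_extension(path: str) -> Optional[str]:
    """Look up the extension of the last path component, if it has one."""
    name = path.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    if dot <= 0:
        return None
    return GLOBS.get(name[dot:].lower())


def find_mime(protocol: str, path: str, mode: Optional[int] = None,
              data: Optional[bytes] = None, fast_mode: bool = False) -> str:
    # 1) mode_t, if known, identifies non-regular files (inode/*)
    if mode is not None:
        if stat.S_ISDIR(mode):
            return "inode/directory"
        if stat.S_ISFIFO(mode):
            return "inode/fifo"
    # Magic is only tried for local files, and never in fast mode.
    use_magic = not fast_mode and protocol == "file" and data is not None
    # 2) high-priority magic
    if use_magic:
        mime = match_magic(data, high=True)
        if mime:
            return mime
    # 3) extension, unless the protocol makes extensions unreliable
    if protocol not in UNRELIABLE_EXTENSION_PROTOCOLS:
        mime = match_extension(path)
        if mime:
            return mime
    # 4) low-priority magic
    if use_magic:
        mime = match_magic(data, high=False)
        if mime:
            return mime
    # 5) protocol-based heuristics
    if protocol == "man":
        return "text/html"
    if protocol in LISTABLE_PROTOCOLS and path.endswith("/"):
        return "inode/directory"
    return "application/octet-stream"  # assumed fallback
```

With this sketch, a local file named report.txt that starts with "%PDF-" comes out as application/pdf in normal mode and as text/plain in fast mode, which is exactly the "PDF magic before extension" behaviour discussed below.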

I haven't heard complaints about the actual mimetype detection logic yet, but KDE 4 isn't widely
used yet, so I cannot report actual user feedback yet ;)
But I definitely like that we are able to model "look for PDF magic before looking at the extension"
because this was missing in the kde3 mimetype system and forced us to implement some hacks
[we did the opposite solution, marking some known extensions as 'unreliable' and doing magic 
on those too, but that was a bit convoluted].

So, I like it as it is, at the moment.
The only thing I'm missing is a "native extension" for each mimetype, i.e. which extension to
suggest when saving with a given mimetype. I suppose I could pick the first one but order
doesn't matter currently, and also there's the case where we shouldn't mention extensions
for matching (see below). So I would like an explicit "preferred extension" for each mimetype
(but if there's exactly one glob then it can implicitly be taken as the preferred extension,
to avoid redundancy in the simple case).
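
A small sketch of the rule proposed above, in Python. The function name and the idea of passing the explicit value as a separate argument are assumptions for illustration; the spec defines no such field at this point.

```python
from typing import Optional, Sequence


def preferred_extension(globs: Sequence[str],
                        explicit: Optional[str] = None) -> Optional[str]:
    """Pick the extension to suggest when saving with a given mimetype."""
    # An explicitly declared preferred extension always wins.
    if explicit is not None:
        return explicit
    # With exactly one simple "*.ext" glob, that glob is implicitly preferred.
    if len(globs) == 1 and globs[0].startswith("*."):
        ext = globs[0][1:]
        if not any(c in ext for c in "*?["):
            return ext
    # Several globs (order is not significant) or none: nothing to suggest.
    return None
```

For example, a type with the single glob "*.pdf" yields ".pdf", while a type with both "*.jpg" and "*.jpeg" yields nothing unless a preferred extension is declared explicitly.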

Alexander Larsson wrote:
> Of course, we always need to sniff in a some cases anyway: 
> * multiple extensions match
I think the freedesktop xml file should make sure that this doesn't happen.
If an extension can be used for two kinds of files (e.g. *.rpm) then the rules shouldn't
mention the extension at all, or at least there should be high-priority magic rules to
detect such files by magic (but I think skipping the extensions and using low-prio magic
is a better idea, since it's more efficient in that it's less often done).
Otherwise, if we only have extensions, the rule is useless: we wouldn't know which of the two mimetypes to pick.
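
Such conflicts could be caught mechanically when the database is built. The following Python sketch lists every glob claimed by more than one mimetype; the function name and the sample table are hypothetical and only illustrate the check.

```python
from collections import defaultdict
from typing import Dict, List


def ambiguous_globs(mime_globs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each glob claimed by several mimetypes to the sorted list of claimants."""
    owners = defaultdict(set)
    for mime, globs in mime_globs.items():
        for glob in globs:
            owners[glob].add(mime)
    return {glob: sorted(mimes) for glob, mimes in owners.items() if len(mimes) > 1}


# Illustrative sample data only.
sample = {
    "application/x-rpm": ["*.rpm"],
    "audio/x-pn-realaudio-plugin": ["*.rpm"],
    "application/pdf": ["*.pdf"],
}
conflicts = ambiguous_globs(sample)
```

Every entry in the result is a glob that cannot decide the type on its own, so it should either be dropped from the rules or be backed by magic.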

> Gnome currently doesn't look at the priorities at all I believe.
Ouch. Is it planned to change that? Non-standard behavior defeats the purpose of a standard :)

-- 
David Faure, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

