Shared-mime checking order

David Faure dfaure at trolltech.com
Fri Aug 24 06:39:06 PDT 2007


On Friday 24 August 2007, Alexander Larsson wrote:
> On Thu, 2007-08-23 at 01:27 +0200, David Faure wrote:
> > The KDE 4 implementation follows the spec as much as possible, i.e. the algorithm (in KMimeType::findByUrl) is roughly
> > 1) find from mode_t if set (leads to inode/*)
> > 2) try high-priority (>80) magic rules for local files
> > 3) try to find out by looking at the extension if any [except on protocols where extensions are unreliable, like HTTP]
> > 4) try low-priority magic rules for local files,
> > 5) otherwise use protocol-based heuristics for some protocols (e.g. KDE's "man:" is always HTML, or
> > for protocols that allow listing directories like FTP or FISH, a URL which ends with '/' is an inode/directory, etc.)
> 
> Really? Since there are rules with priority > 80 this means you always
> have to load the first block of the file when detecting mimetype. This
> is awfully slow, since seek times on disks are bad, and are not getting
> any better.
Right.
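
For illustration, the five-step order quoted above could be sketched roughly like this in Python. This is not the KMimeType code: the rule tables, the find_mimetype name, the "head" argument (the first block of the file, already read by the caller) and the final application/octet-stream fallback are all assumptions made up for the example.

```python
import os
import stat

HIGH_PRIORITY = 80  # magic rules above this priority beat the extension

# Hypothetical sample rules: (priority, offset, pattern, mimetype).
MAGIC_RULES = [
    (90, 0, b"%PDF-", "application/pdf"),
    (50, 0, b"\x89PNG", "image/png"),
]
# Hypothetical sample globs, reduced to plain extensions.
EXTENSIONS = {".pdf": "application/pdf", ".png": "image/png", ".txt": "text/plain"}
UNRELIABLE_EXTENSIONS = {"http"}   # protocols where extensions are not trusted
LISTABLE_PROTOCOLS = {"ftp", "fish"}

INODE_TYPES = [
    (stat.S_ISDIR, "inode/directory"),
    (stat.S_ISFIFO, "inode/fifo"),
    (stat.S_ISSOCK, "inode/socket"),
    (stat.S_ISCHR, "inode/chardevice"),
    (stat.S_ISBLK, "inode/blockdevice"),
]


def match_magic(head, high):
    """Best-priority matching rule among the high (>80) or low (<=80) rules."""
    best = None
    for priority, offset, pattern, mimetype in MAGIC_RULES:
        if (priority > HIGH_PRIORITY) != high:
            continue
        if head[offset:offset + len(pattern)] != pattern:
            continue
        if best is None or priority > best[0]:
            best = (priority, mimetype)
    return best[1] if best else None


def find_mimetype(name, protocol="file", mode=None, head=None, fast=False):
    # 1) mode_t, if set (regular files fall through to the next steps)
    if mode is not None:
        for predicate, mimetype in INODE_TYPES:
            if predicate(mode):
                return mimetype
    # Magic is only tried for local files, and never in fast mode.
    sniff = not fast and protocol == "file" and head is not None
    # 2) high-priority magic
    if sniff:
        found = match_magic(head, high=True)
        if found:
            return found
    # 3) extension
    if protocol not in UNRELIABLE_EXTENSIONS:
        found = EXTENSIONS.get(os.path.splitext(name)[1].lower())
        if found:
            return found
    # 4) low-priority magic
    if sniff:
        found = match_magic(head, high=False)
        if found:
            return found
    # 5) protocol-based heuristics
    if protocol == "man":
        return "text/html"
    if protocol in LISTABLE_PROTOCOLS and name.endswith("/"):
        return "inode/directory"
    return "application/octet-stream"  # assumed fallback
```

With fast=True only steps 1), 3) and 5) run, so a PDF named report.txt comes back as text/plain in fast mode and as application/pdf otherwise.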

> I don't think this is realistic for e.g. a file manager. It's just too slow.
It's not too slow in practice, because in KDE we delay the full mimetype
determination until after showing the directory contents with fast mimetype
determination (no magic).

> A solution that mainly looks at extensions, but that then tries to
> sniff for "problematic" (i.e. unknown/missing) extensions could work, but
> not sniffing all files.
Right, in the case of the file manager directory listing, we do not sniff all files...
Only those where the mimetype couldn't be found from the extension...
This makes me realize that this isn't 100% spec compliant indeed.

> > There's also a "fast mode" for that code to disable magic matching and only use 1), 3) and 5).
> 
> That's useful, but how do you expose these different types to the user?
The fast mode is used for the initial directory listing, for instance.
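
A minimal sketch of that two-pass listing, assuming a hypothetical extension table and a caller-supplied sniff function (neither is the actual KDE code): the first pass uses extensions only, and the delayed pass sniffs just the entries the first pass could not identify.

```python
import os

# Hypothetical sample table; a real one comes from the shared-mime database.
EXTENSIONS = {".png": "image/png", ".txt": "text/plain"}
UNKNOWN = "application/octet-stream"  # assumed fallback


def fast_mimetype(name):
    """Extension-only lookup: no I/O on the file itself."""
    return EXTENSIONS.get(os.path.splitext(name)[1].lower())


def list_directory(names, sniff):
    """Return (listing, pending): the final types and the names that were sniffed."""
    # Pass 1: fast determination, good enough to show the directory at once.
    listing = {name: fast_mimetype(name) for name in names}
    # Pass 2 (delayed): sniff only the entries the extension could not settle.
    pending = [name for name, mimetype in listing.items() if mimetype is None]
    for name in pending:
        listing[name] = sniff(name) or UNKNOWN
    return listing, pending
```

Note that a file whose extension is known never reaches the sniffer here, so a high-priority magic rule cannot override its extension.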

> > So, I like it as it is, at the moment.
> > The only thing I'm missing is a "native extension" for each mimetype, i.e. which extension to
> > suggest when saving with a given mimetype. I suppose I could pick the first one but order
> > doesn't matter currently, and also there's the case where we shouldn't mention extensions
> > for matching (see below). So I would like an explicit "preferred extension" for each mimetype
> > (but if there's exactly one glob then it can simply be treated as the preferred extension,
> > to avoid redundancy in the simple case).
> 
> That would be nice.

Good to hear. Does anyone have time to add this to the spec? (I definitely don't, until at least next year ;-)

> A typical example is *.pcf, which is both a font type and a Cisco VPN
> description file. The latter isn't currently in the xdg shared mime db,
> but we have a patch in Fedora. I don't see how leaving the extension
> knowledge out of the db is any better than a conflict. It just means we
> know less, but we can still make a decision in the same way.
That's true. It just kills my nice binary search from the extension to the mimetype :-))
But you're right, we should rather implement proper conflict resolution than
hope for no conflicts like we did in KDE 3.
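
One possible shape for such conflict resolution, as a rough sketch only: keep the extension table sorted but allow duplicate keys, so a binary search returns a range of candidates, and sniff only when that range holds more than one entry. The table contents, the mimetype name used for the Cisco file, and the "first candidate wins if sniffing is inconclusive" rule are assumptions for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample table; sorted, with duplicate extensions allowed.
GLOB_TABLE = sorted([
    ("pcf", "application/x-font-pcf"),
    ("pcf", "application/x-cisco-vpn-settings"),  # illustrative name
    ("png", "image/png"),
    ("txt", "text/plain"),
])
GLOB_KEYS = [extension for extension, _ in GLOB_TABLE]


def candidates_for(extension):
    """Binary search for the whole run of entries sharing this extension."""
    lo = bisect_left(GLOB_KEYS, extension)
    hi = bisect_right(GLOB_KEYS, extension)
    return [mimetype for _, mimetype in GLOB_TABLE[lo:hi]]


def resolve(extension, sniff):
    """sniff is a zero-argument callable; it is only called on a conflict."""
    candidates = candidates_for(extension)
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # no conflict, no I/O
    sniffed = sniff()
    # Assumed tie-break: fall back to the first candidate if magic is no help.
    return sniffed if sniffed in candidates else candidates[0]
```

The common case (one mimetype per extension) stays a pure lookup; only the conflicting extensions pay for reading the file.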

> I don't believe the efficiency argument here. An extension match is a
> pure cpu thing on the order of nanoseconds. A sniffing is an i/o thing
> on the order of milliseconds (or much more if your disks are busy),
> especially with slow laptop drives. That's many thousands of times slower.
Err, I totally agree. I think you misunderstood me: I didn't say that
conflicting extensions would be an efficiency problem, if handled
correctly.

> > > Gnome currently doesn't look at the priorities at all I believe.
> > Ouch. Is it planned to change that? Non-standard behavior defeats the purpose of a standard :)
> Well, we do look at it when sniffing (i.e. a higher prio magic matches
> before a lower prio), but not when deciding sniffing vs extension
> mapping. This is since we want to avoid the slow sniffing as much as
> possible. To do the comparison we'd have to sniff to see which magic
> rule (if any) matched.

I understand what you mean... I'm not sure what the best behavior is though;
your mail made me realize that I did skip "high-magic-beats-extension" in the
delayed-mimetype-determination code (even though the core mimetype code can do that),
and re-enabling that would mean a lot of sniffing (every file would need to be rechecked
in the post-listing mimetype redetermination...)... Hmm.

-- 
David Faure, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

