Shared-mime checking order

David Faure dfaure at trolltech.com
Mon Sep 17 13:16:16 PDT 2007


On Tuesday 28 August 2007, Alexander Larsson wrote:
> On Fri, 2007-08-24 at 16:44 +0200, Alexander Larsson wrote:
> > This is my main problem with hi-priority sniffing. It either causes very
> > bad performance behaviour in the file manager, or it adds user-visible
> > confusion as to the type of some files.
> > 
> > I personally prefer to drop the hi-prio sniffing, and use sniffing only
> > on conflicts and on extension match failure. This way you get only one,
> > well defined, usable everywhere, canonical type (well, there is also the
> > first-scan "fast mimetype", but you never open a file based on that). It
> > also means that any user problem with file types is solvable by the user
> > (just rename the problematic file).
Agreed.

> Here is what I just implemented for gvfs:
> 
> If only one glob matches, use that
> 
> If no glob matches, sniff and use that
> 
> If several globs match, and sniffing gives a result we do:
>   if sniffed prio >= 80, use sniffed type
>   for glob_match in glob_matches:
>      if glob_match is subclass or equal to sniffed_type, use glob_match
>
> If several globs match, and sniffing fails, or doesn't help:
>   fall back to the first glob match 
OK.

>   (maybe we should do something better here?)
Can't think of any further heuristic, actually. Apart from using the "is text vs is binary"
heuristic (already in the spec) to choose between a text-like and a binary-like format,
but this is just one case.
(and I thought we needed that until I realized that for the msword case we have
x-ole-storage which is much better than just "is not text"; but maybe there's another
case where we don't have a useful base mimetype).
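
For other implementors, the order quoted above could be sketched roughly like
this. It is only an illustration, not the gvfs or KDE code: the names
choose_mime_type, sniff and is_subclass_or_equal are made up for the example,
and returning None when there is neither a glob match nor a sniffing result is
an assumption, since that case is not discussed in this thread.

```python
from typing import Callable, Optional, Sequence, Tuple

# (mime type, priority of the magic rule that matched), or None if nothing matched
Sniffed = Optional[Tuple[str, int]]


def choose_mime_type(
    glob_matches: Sequence[str],
    sniff: Callable[[], Sniffed],
    is_subclass_or_equal: Callable[[str, str], bool],
) -> Optional[str]:
    """Pick a type from the glob matches, sniffing only when they are ambiguous."""
    # Exactly one glob match: trust it and never open the file.
    if len(glob_matches) == 1:
        return glob_matches[0]

    # Zero or several glob matches: this is the only point where we sniff.
    sniffed = sniff()

    if not glob_matches:
        # Assumption: None stands for "unknown"; the thread does not cover this.
        return sniffed[0] if sniffed is not None else None

    if sniffed is not None:
        sniffed_type, priority = sniffed
        if priority >= 80:
            return sniffed_type
        for glob_type in glob_matches:
            if is_subclass_or_equal(glob_type, sniffed_type):
                return glob_type

    # Sniffing failed or did not help: fall back to the first glob match.
    return glob_matches[0]
```

Passing sniff as a callable keeps the file untouched in the single-match case,
which is the property that matters for a file manager listing a directory.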

> This algorithm only sniffs when there is some uncertainty with the
> extension matching (thus, it's usable for a file manager). 
Yes.

And actually I think this is much more correct, not just faster.
With the current spec, if I create an "information.txt" file that says
"The tag to use for SMIL data is <smil>", then it would be detected as application/smil
because of this high-priority magic rule:
    <magic priority="80">
      <match type="string" value="&lt;smil" offset="0:256"/>
    </magic>
We can't have that :) If it's information.txt then it's a text file for sure :)
And if it has an unknown extension or none then we'll have a false positive here,
but that's harder to fix. In any case this looks like a fragile rule for an 80-priority rule...
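
To make the false positive concrete, here is a small sketch of that one magic
rule applied to the text above. It assumes that offset="0:256" means the string
may start at any offset from 0 to 256 inclusive; magic_string_match, old_order
and new_order are hypothetical helpers written for this example, and they
hard-code the single *.txt glob and the single smil rule rather than reading a
real mime database.

```python
def magic_string_match(data: bytes, value: bytes, first: int, last: int) -> bool:
    """True if value starts at some offset in [first, last] of data."""
    # bytes.find needs the whole match inside the slice, hence last + len(value).
    return data.find(value, first, last + len(value)) != -1


def old_order(filename: str, data: bytes) -> str:
    # High-priority (>= 80) magic is checked before the globs.
    if magic_string_match(data, b"<smil", 0, 256):
        return "application/smil"
    return "text/plain" if filename.endswith(".txt") else "application/octet-stream"


def new_order(filename: str, data: bytes) -> str:
    # A single glob match wins outright; the content is not looked at.
    if filename.endswith(".txt"):
        return "text/plain"
    if magic_string_match(data, b"<smil", 0, 256):
        return "application/smil"
    return "application/octet-stream"


content = b"The tag to use for SMIL data is <smil>"
print(old_order("information.txt", content))
print(new_order("information.txt", content))
```

The last call in new_order also shows the remaining problem: the same content
in a file with no known extension still sniffs as application/smil.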

> Do you think this would be ok for kde to use too? It's quite similar to
> what you said the kde4 file manager does already.
It wasn't exactly, but now it is. I finished making all the changes today.

Any other implementor around, who might have input on this before we proceed
to change the spec for good?

-- 
David Faure, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

