Shared-mime checking order

Fri Feb 1 08:55:48 PST 2008

On Mon, 2007-09-17 at 22:16 +0200, David Faure wrote:
> On Tuesday 28 August 2007, Alexander Larsson wrote:
> > On Fri, 2007-08-24 at 16:44 +0200, Alexander Larsson wrote:
> > > This is my main problem with hi-priority sniffing. It either causes very
> > > bad performance behaviour in the file manager, or it adds user-visible
> > > confusion as to the type of some files.
> > > 
> > > I personally prefer to drop the hi-prio sniffing, and use sniffing only
> > > on conflicts and on extension match failure. This way you get only one,
> > > well defined, usable everywhere, canonical type (well, there is also the
> > > first-scan "fast mimetype", but you never open a file based on that). It
> > > also means that any user problem with file types is solvable by the user
> > > (just rename the problematic file).
> Agreed.
> 
> > Here is what I just implemented for gvfs:
> > 
> > If only one glob matches, use that
> > 
> > If no glob matches, sniff and use that
> > 
> > If several globs match, and sniffing gives a result, we do:
> >   if sniffed prio >= 80, use sniffed type
> >   for glob_match in glob_matches:
> >      if glob_match is subclass or equal to sniffed_type, use glob_match
> >
> > If several globs match, and sniffing fails, or doesn't help:
> >   fall back to the first glob match 
> OK.
> 
> >   (maybe we should do something better here?)
> Can't think of any further heuristic, actually. Apart from using the "is text vs is binary"
> heuristic (already in the spec) to choose between a text-like and a binary-like format,
> but this is just one case.
> (and I thought we needed that until I realized that for the msword case we have
> x-ole-storage which is much better than just "is not text"; but maybe there's another
> case where we don't have a useful base mimetype).
> 
> > This algorithm only sniffs when there is some uncertainty with the
> > extension matching (thus, it's usable for a file manager).
> Yes.
> 
> And actually I think this is much more correct, not just faster.
> With the current spec, if I create an "information.txt" file that says
> "The tag to use for SMIL data is <smil>", then it would be detected as application/smil
> because of this high-priority magic rule:
>     <magic priority="80">
>       <match type="string" value="&lt;smil" offset="0:256"/>
>     </magic>
> We can't have that :) If it's information.txt then it's a text file for sure :)
> And if it has an unknown extension or none then we'll have a false positive here,
> but that's harder to fix. In any case this looks like a fragile rule for an 80-priority rule...
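
To make the false positive concrete, here is a minimal sketch of how a
string match with offset="0:256" behaves. The function name match_string
is made up for illustration, and the sketch assumes the offset range is
inclusive and names the positions where the value may start. It is not
the code of any real implementation.

```python
def match_string(data: bytes, value: bytes, start: int, end: int) -> bool:
    """Return True if `value` begins at any offset in [start, end].

    Hypothetical helper: assumes the range is inclusive and refers to
    the starting position of the value within the file contents.
    """
    for offset in range(start, end + 1):
        if data[offset:offset + len(value)] == value:
            return True
    return False


# Contents of the plain-text file from the example above.
text = b"The tag to use for SMIL data is <smil>\n"

# The rule fires on ordinary prose, because "<smil" only has to appear
# somewhere in the first 256 bytes.
print(match_string(text, b"<smil", 0, 256))
```

Any text file that merely mentions the tag early on satisfies the rule,
which is why it looks too weak for priority 80.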

We should have better magic instead, I guess. People get miffed when
their files are detected solely by the file extension, e.g. when loading
a playlist from a crappy website that uses PHP, you end up with
a /tmp/file.php which is actually a playlist and should be handled as
such.
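
For reference, the lookup order quoted above can be sketched roughly as
follows. This is only an illustration of the steps as described in the
mail, not the gvfs code. The names guess_type, sniff and is_subclass are
hypothetical, sniffing is passed in as a callable so that it only runs
when the globs leave some uncertainty, and returning None when nothing
matches at all is an assumption (the mail does not say what happens
then).

```python
from typing import Callable, Optional, Sequence, Tuple

# A sniff result is (mime_type, priority), or None if sniffing fails.
SniffResult = Optional[Tuple[str, int]]


def guess_type(
    glob_matches: Sequence[str],
    sniff: Callable[[], SniffResult],
    is_subclass: Callable[[str, str], bool],
) -> Optional[str]:
    """Pick a MIME type following the order described in the mail."""
    # Exactly one glob matches: use it, and do not sniff at all.
    if len(glob_matches) == 1:
        return glob_matches[0]

    sniffed = sniff()

    # No glob matches: use the sniffed type (None if sniffing fails;
    # that fallback is an assumption of this sketch).
    if not glob_matches:
        return sniffed[0] if sniffed is not None else None

    # Several globs match and sniffing gave a result.
    if sniffed is not None:
        sniffed_type, priority = sniffed
        if priority >= 80:
            return sniffed_type
        for glob_match in glob_matches:
            if glob_match == sniffed_type or is_subclass(glob_match, sniffed_type):
                return glob_match

    # Sniffing failed or did not help: fall back to the first glob match.
    return glob_matches[0]


def no_subclasses(child: str, parent: str) -> bool:
    # Placeholder subclass relation for the examples below.
    return False


# information.txt: one glob match, so the "<smil" magic is never consulted.
print(guess_type(["text/plain"], lambda: ("application/smil", 80), no_subclasses))

# /tmp/file.php holding a playlist: the single glob match still wins.
print(guess_type(["application/x-php"], lambda: None, no_subclasses))
```

The second call shows the case raised here: with a single glob match the
content is never looked at, so a playlist saved as file.php keeps the
type given by its extension.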