Shared-mime checking order

David Faure dfaure at trolltech.com
Mon Oct 15 12:42:04 PDT 2007


On Thursday 20 September 2007, Daniel Leidert wrote:
> On Wednesday, 19.09.2007, at 22:15 +0200, David Faure wrote:
> > On Tuesday 18 September 2007, Daniel Leidert wrote:
> > > But I would like to raise a different issue: Let's take the above
> > > example and say that I define a very specific MIME type
> > > 
> > >   <mime-type type="text/x-mytest-one">
> > >     <comment>My test case one</comment>
> > >     <glob pattern="README.txt"/>
> > >     <sub-class-of type="text/plain"/>
> > >     <magic priority="100">
> > >       <match value="mystring" type="string" offset="0"/>
> > >     </magic>
> > >   </mime-type>
> > > 
> > > which MUST have the string "mystring" at offset 0.
> > "MUST" is wrong. magic is about having "hints" about what the file might be,
> > there is never a strong rule that the file MUST have this magic.
> 
> The libmagic implementation is based on the fact that something MUST be
> found for a file to be of a particular type. I know several file types that MUST
> have a special string/pattern. I don't understand your point.

  <mime-type type="text/html">
    <comment>HTML document</comment>
    <magic priority="50">
      <match value="&lt;!DOCTYPE HTML" type="string" offset="0:64"/>
      ...

All files starting with "<!DOCTYPE HTML" can be assumed to be text/html,
but the reverse is NOT true: not every text/html file starts with <!DOCTYPE HTML,
or with <h1>, or with <html>, or with <script>, etc.

A magic rule is a hint, not a "MUST".
There are HTML files out there that do not match any of the magic rules we have
for text/html.

The same goes for "a file that starts with /* is a text/x-csrc file": a file
doesn't HAVE TO start with /* to be a C file. Here again there are even other
possible magic values for the same offset, so "MUST" is definitely wrong.
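
To make the "hint, not MUST" point concrete, here is a minimal sketch in Python.
It is not how any real implementation works: the rule table and the function
name are made up for illustration, and the offset range 0:64 of the real
text/html rule is simplified to a fixed offset 0.

```python
# Hypothetical magic rules: (mime type, offset, expected bytes).
# Simplification: every rule is checked at one fixed offset only.
MAGIC_RULES = [
    ("text/html", 0, b"<!DOCTYPE HTML"),
    ("text/x-csrc", 0, b"/*"),
]


def magic_hints(data):
    """Return the types whose magic matches the data.

    A match suggests a type. An empty result rules nothing out:
    the file can still be HTML or C without matching any rule.
    """
    return [
        mime
        for mime, offset, value in MAGIC_RULES
        if data[offset:offset + len(value)] == value
    ]


if __name__ == "__main__":
    print(magic_hints(b"<!DOCTYPE HTML PUBLIC>"))  # magic matches text/html
    print(magic_hints(b"<h1>Still HTML</h1>"))     # no match, yet still HTML
    print(magic_hints(b"int main(void) {}"))       # no match, yet still C
```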

I do see your point though, it would be about adding a required="true" to <match>
in order to implement a new feature of "this magic is required", not about changing
the meaning of the existing <match> lines.
I would agree with this new feature provided that someone goes through the existing xml file
and adds the required attribute where it makes sense. However, we should still do magic
only when the glob fails, not in all cases.

> $ touch example.nb
> $ gnomevfs-info -s example.nb 
> Name              : example.nb
> Type              : Regular
> MIME type         : application/mathematica
> [snip]
> 
> This is never-ever a Mathematica file. It's simply an empty text/plain file.
Sure. That's a user error though. If you don't want it to be detected as Mathematica,
don't give it the extension .nb.
As Alex said (and I agree very much), the fact that we give priority to globs is a feature,
not a bug: it gives us control over the mimetype detection. Otherwise there would be
no way for the user to change the type of a file when it's wrongly detected (you and I
both know that magic detection is never 100% correct).
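
The order I am describing (globs first, magic only when no glob matches) can be
sketched roughly like this in Python. The tables, the function name and the
application/octet-stream fallback are assumptions for illustration only, not
the behaviour of gnome-vfs or of any other library.

```python
from fnmatch import fnmatchcase

# Hypothetical, tiny stand-ins for the real database.
GLOBS = [("*.nb", "application/mathematica"), ("*.html", "text/html")]
MAGIC = [("text/html", b"<!DOCTYPE HTML")]


def detect_glob_first(filename, data):
    """Return a mime type, giving globs priority over magic."""
    # 1. A glob match wins outright, so renaming a file changes its type.
    for pattern, mime in GLOBS:
        if fnmatchcase(filename, pattern):
            return mime
    # 2. Magic is consulted only when no glob matched.
    for mime, value in MAGIC:
        if data.startswith(value):
            return mime
    # 3. Assumed fallback when nothing matched at all.
    return "application/octet-stream"


if __name__ == "__main__":
    # The empty example.nb case: the extension decides, content is not read.
    print(detect_glob_first("example.nb", b""))
    # No known extension: magic gets its chance.
    print(detect_glob_first("page", b"<!DOCTYPE HTML>"))
```

With this order the user can always override a wrong guess by renaming the
file, which would not be possible if magic ran first.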

> I can prepare many more examples - so can you. I originally
> discovered this issue when I added support for a file type that was
> using the .ent extension. Unfortunately, .ent extensions are also often
> used for entity collections (instead of .dtd). Nautilus/gnome-vfs
> thought that the entity collection files were of the chemical file
> type that used the .ent extension too. The problem always appears for
> file types not found in the database, but using an extension found in
> the database (and of course being of the same generic type).
And the problem is solvable now, by associating the extension with both mimetypes.
*Then* magic detection will be done, to resolve that conflict; this is what this thread
has mostly been about :)
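
A rough Python sketch of that conflict resolution, for the .ent case. The two
mime type names, the magic strings and the "first candidate wins when no magic
matches" tie-break are all placeholders I made up for the example, not the
entries in the real database.

```python
from fnmatch import fnmatchcase

# Hypothetical database: the same extension is registered for two types.
GLOBS = [
    ("*.ent", "chemical/x-example"),
    ("*.ent", "application/x-example-entities"),
    ("*.txt", "text/plain"),
]
# Hypothetical magic, one leading byte string per type.
MAGIC = {
    "chemical/x-example": b"HEADER",
    "application/x-example-entities": b"<!ENTITY",
}


def resolve_by_glob_then_magic(filename, data):
    """Use magic only to choose among types that share a matching glob."""
    candidates = [m for pattern, m in GLOBS if fnmatchcase(filename, pattern)]
    if len(candidates) == 1:
        return candidates[0]  # unambiguous glob: magic is not consulted
    for mime in candidates:   # conflict: magic picks among the candidates
        value = MAGIC.get(mime)
        if value is not None and data.startswith(value):
            return mime
    # Assumed tie-break: first registered candidate, or None if no glob hit.
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(resolve_by_glob_then_magic("defs.ent", b"<!ENTITY nbsp '&#160;'>"))
    print(resolve_by_glob_then_magic("1abc.ent", b"HEADER    PROTEIN"))
```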

-- 
David Faure, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

