Masking in the MIME magic spec

David Faure faure at kde.org
Fri Apr 19 09:42:55 PDT 2013


On Tuesday 19 March 2013 13:57:04 Thomas Kluyver wrote:
> On 19 March 2013 13:28, David Faure <faure at kde.org> wrote:
> > The other would be to write code that detects the cases where the database
> > has values such that (value & mask) != value, and fixing the database to
> > specify (value & mask) as value from now on. This would allow
> > implementations to avoid having to mask the value at runtime, which would
> > lead to a minor speedup (and to the spec being correct after all).
> > Such code would be easy to write, as part of any of the existing
> > implementations, I would think.
> 
> Yes, I think that sounds reasonable, although of course implementations
> will need to support the existing data for some time, even if newer
> versions of shared-mime-info fix that. 

I'm not sure I follow. I'm talking about fixing the shared-mime-info data to
have more useful expected values; this won't break existing implementations
at all.

What I meant by "Such code" and "the existing implementations" was adding a
check to one implementation and using it to detect the weird expected values.
But apparently you've already done that, by manual inspection.

You're right, though: removing the masking of the value from the
implementations cannot be done for quite some time, even if we adjust the data
today. Still, at some point this will be useful :)
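To make the runtime difference concrete, here is a minimal C sketch for a
32-bit rule. The helper names (rule_value_needs_sanitizing,
rule_matches_today, rule_matches_sanitized) are hypothetical and do not come
from update-mime-database or any existing reader; the sketch only assumes a
rule is a (value, mask) pair compared against data read from the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for a 32-bit magic rule (value, mask).
 * Names are illustrative, not taken from any real implementation. */

/* True when the expected value has bits set outside the mask,
 * i.e. the case the proposed database check would detect. */
static bool rule_value_needs_sanitizing(uint32_t value, uint32_t mask)
{
    return (value & mask) != value;
}

/* What readers must do today: the stored value may carry stray bits,
 * so it is masked at runtime along with the file data. */
static bool rule_matches_today(uint32_t data, uint32_t value, uint32_t mask)
{
    return (data & mask) == (value & mask);
}

/* What a reader could do once the database guarantees
 * (value & mask) == value: only the file data needs masking. */
static bool rule_matches_sanitized(uint32_t data, uint32_t value, uint32_t mask)
{
    /* Only valid for sanitized rules; catch misuse in debug builds. */
    assert(!rule_value_needs_sanitizing(value, mask));
    return (data & mask) == value;
}
```

A reader can only switch from the second form to the third once it no longer
has to handle databases generated before the fix, which is why the runtime
masking has to stay for a while.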

> The downside is that
> update-mime-database is written in C, and as I found yesterday, I'm lousy
> at fixing C code. (Aside: this is an occasionally used script where
> performance isn't that important - would it make sense to write it in
> Python rather than C?)

Not my code, so I can't comment on that. But IMHO let's not start a language
flamewar: it's there and it works.

> I've just inspected the values I have. There aren't many rules using masks
> at all. Of those that are, 5 need the mask applied, in all cases because
> they use a placeholder character where the mask has a null byte.
> 
> - application/x-core, application/x-sharedlib and
> application/vnd.adobe.photoshop use spaces
> - image/bmp uses lowercase 'x'
> - application/vnd.corel-draw uses an uppercase 'X'

Ah, so this leads to more readable magic than using '\000' in the value field.
But indeed, update-mime-database could take care of sanitizing the value in 
the generated output.
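A minimal sketch of that sanitizing step for string values, assuming the
value and mask are byte arrays of the same length. sanitize_string_value is a
hypothetical helper, not the code in the attached patch: every byte is ANDed
with its mask byte, so a placeholder such as a space, 'x' or 'X' under a 0x00
mask byte is written out as a null byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite a string magic value in place so that the
 * generated output stores (value & mask) instead of the readable placeholder
 * used in the XML. A NULL mask means every byte is significant. */
static void sanitize_string_value(unsigned char *value,
                                  const unsigned char *mask, size_t len)
{
    assert(value != NULL || len == 0);
    if (mask == NULL)
        return;                 /* no mask: nothing to sanitize */
    for (size_t i = 0; i < len; i++)
        value[i] &= mask[i];    /* bytes under a 0x00 mask byte become 0 */
}
```

Applied when writing the generated magic file, this keeps the readable
placeholders in the XML while the binary output satisfies
(value & mask) == value.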

OK, I did it for int values too, which caught one more case:
  <mime-type type="image/x-sigma-x3f">
        <match value="0x00FF00FF" type="little32" offset="4" mask="0xFF00FF00"/>
I wondered whether this was intended, since the FF bytes in the value field
are masked out and so mean nothing...
OK, http://www.photofo.com/downloads/x3f-raw-format.pdf says this is correct.
Since value & mask is 0, the rule simply requires the masked bytes to be zero;
the goal is to match a version number such as 0x00010003, i.e. version 1.3.
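The arithmetic behind this rule can be checked with a short C sketch. It
hard-codes the image/x-sigma-x3f rule quoted above and assumes a little32
value is read as four bytes, least significant first; the function names are
hypothetical. Because 0x00FF00FF & 0xFF00FF00 is 0, the rule requires the
bytes at file offsets 5 and 7 to be zero and ignores offsets 4 and 6, which
is what a version word like 0x00010003 (stored as 03 00 01 00) satisfies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader for a "little32" magic value: four bytes,
 * least significant byte first. */
static uint32_t read_little32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical check for the image/x-sigma-x3f rule:
 * type="little32" offset="4" value="0x00FF00FF" mask="0xFF00FF00". */
static bool x3f_version_rule_matches(const unsigned char *buf, size_t len)
{
    const size_t offset = 4;
    const uint32_t value = 0x00FF00FFu;
    const uint32_t mask = 0xFF00FF00u;

    assert(buf != NULL || len == 0);
    if (len < offset + 4)
        return false;  /* too short to contain the 32-bit word */

    uint32_t data = read_little32(buf + offset);
    /* value & mask == 0, so the FF bytes of value never take part:
     * the masked bytes (file offsets 5 and 7) must simply be zero. */
    return (data & mask) == (value & mask);
}
```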

And I did it for strings too (I'm not a C/glib programmer either; I'm more of
a C++/Qt guy, so this should be reviewed by glib people) ;)

Attached is the diff (after hex-dumping) of the generated magic files.

-- 
David Faure, faure at kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5
-------------- next part --------------
A non-text attachment was scrubbed...
Name: magic.diff
Type: text/x-patch
Size: 3761 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/xdg/attachments/20130419/38490ad0/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: update-mime-database.c.diff
Type: text/x-patch
Size: 1858 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/xdg/attachments/20130419/38490ad0/attachment-0001.bin>


More information about the xdg mailing list