Masking in the MIME magic spec
Thomas Kluyver
thomas at kluyver.me.uk
Sun Apr 21 04:43:41 PDT 2013
Thanks, David, that all makes sense to me. I'll ensure that the next
version of PyXDG applies the mask to both values, until we can be confident
that it will see the updated version of the magic database.
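
To make "applies the mask to both values" concrete, here is a minimal sketch. It is not PyXDG's actual code: the helper name masked_match and the sample bytes are made up for illustration.

```python
def masked_match(data: bytes, value: bytes, mask: bytes) -> bool:
    """Compare data against value, ignoring bits that are clear in mask.

    Hypothetical helper, not part of PyXDG. The mask is applied to BOTH
    the bytes read from the file and the expected value, so placeholder
    characters in the value (where the mask byte is 0x00) are ignored.
    """
    if len(mask) != len(value):
        raise ValueError("mask and value must have the same length")
    if len(data) < len(value):
        return False
    # zip() stops after len(value) bytes, so trailing file data is ignored.
    return all((d & m) == (v & m) for d, v, m in zip(data, value, mask))


# Made-up rule: two significant bytes, then two don't-care bytes
# written as the placeholder 'x' in the value.
value = b"ABxx"
mask = b"\xff\xff\x00\x00"
print(masked_match(b"AB\x01\x02", value, mask))  # True
print(masked_match(b"AC\x01\x02", value, mask))  # False
```

Masking only the file data would compare b"AB\x00\x00" against b"ABxx" and never match, which is why the value needs masking too while old databases are still around.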
Thomas
On 19 April 2013 17:42, David Faure <faure at kde.org> wrote:
> On Tuesday 19 March 2013 13:57:04 Thomas Kluyver wrote:
> > On 19 March 2013 13:28, David Faure <faure at kde.org> wrote:
> > > The other would be to write code that detects the cases where the
> > > database has values such that (value & mask) != value, and fixing the
> > > database to specify (value & mask) as value from now on. This would
> > > allow implementations to avoid having to mask the value at runtime,
> > > which would lead to a minor speedup (and to the spec being correct
> > > after all).
> > > Such code would be easy to write, as part of any of the existing
> > > implementations, I would think.
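
The check described above could look roughly like this. It is only a sketch: the function names and the two sample rules are hypothetical, and a real implementation would take its rules from the parsed magic database rather than a hard-coded list.

```python
def needs_fix(value: bytes, mask: bytes) -> bool:
    """Return True if (value & mask) != value for any byte."""
    return any((v & m) != v for v, m in zip(value, mask))


def normalized(value: bytes, mask: bytes) -> bytes:
    """Return (value & mask), the form the database would store instead."""
    return bytes(v & m for v, m in zip(value, mask))


# Hypothetical rules as (mime type, value, mask); not real database entries.
rules = [
    ("example/placeholder", b"ABxx", b"\xff\xff\x00\x00"),
    ("example/clean", b"AB\x00\x00", b"\xff\xff\x00\x00"),
]

for mime, value, mask in rules:
    if needs_fix(value, mask):
        print(mime, "should store", normalized(value, mask))
```

With these two made-up rules, only example/placeholder is reported, because its 'x' bytes sit where the mask is zero.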
> >
> > Yes, I think that sounds reasonable, although of course implementations
> > will need to support the existing data for some time, even if newer
> > versions of shared-mime-info fix that.
>
> I don't see that point. I'm talking about fixing the shared-mime-info data
> to have more useful expected values, this won't break existing
> implementations at all.
>
> What I meant by "Such code" and "the existing implementations" was to add a
> check in one implementation and use that to detect the weird expected
> values.
> But you've already done that apparently, by manual inspection.
>
> You're right though, removing the masking of the value in the
> implementations cannot be done for quite some time, even if we adjust the
> data today.
> Still, at some point this will be useful :)
>
> > The downside is that
> > update-mime-database is written in C, and as I found yesterday, I'm lousy
> > at fixing C code. (Aside: this is an occasionally used script where
> > performance isn't that important - would it make sense to write it in
> > Python rather than C?)
>
> Not my code, I can't comment on that. But IMHO let's not start a language
> flamewar. It's there and it works.
>
> > I've just inspected the values I have. There aren't many rules using
> > masks at all. Of those that are, 5 need the mask applied, in all cases
> > because they use a placeholder character where the mask has a null byte.
> >
> > - application/x-core, application/x-sharedlib and
> > application/vnd.adobe.photoshop use spaces
> > - image/bmp uses lowercase 'x'
> > - application/vnd.corel-draw uses an uppercase 'X'
>
> Ah, so this leads to more readable magic than using '\000' in the value
> field. But indeed, update-mime-database could take care of sanitizing the
> value in the generated output.
>
> OK, done for int values too, which caught one more case:
> <mime-type type="image/x-sigma-x3f">
> <match value="0x00FF00FF" type="little32" offset="4"
> mask="0xFF00FF00"/>
> I wonder if it's intended, i.e. the FF in the value field mean nothing...
> OK, http://www.photofo.com/downloads/x3f-raw-format.pdf says this is
> correct.
> The goal is to catch a version number like 0x00010003, for 1.3.
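
The arithmetic for this rule can be checked in a few lines. The value, mask, offset and the 0x00010003 example come from the message above; the function names and the 8-byte sample buffer are made up, and little32 is assumed to mean a little-endian unsigned 32-bit integer.

```python
import struct

VALUE = 0x00FF00FF
MASK = 0xFF00FF00
OFFSET = 4


def match_data_masked_only(buf: bytes) -> bool:
    """Mask only the file data, as a literal reading of the spec suggests."""
    (data,) = struct.unpack_from("<I", buf, OFFSET)
    return (data & MASK) == VALUE


def match_both_masked(buf: bytes) -> bool:
    """Mask both the file data and the expected value."""
    (data,) = struct.unpack_from("<I", buf, OFFSET)
    return (data & MASK) == (VALUE & MASK)


# Made-up buffer: 4 filler bytes, then version 1.3 stored as little32.
sample = b"\x00" * 4 + struct.pack("<I", 0x00010003)

print(match_data_masked_only(sample))  # False
print(match_both_masked(sample))       # True
```

Since VALUE & MASK is 0, the rule only matches at all when the value is masked as well; the FF bytes in the value mark the positions that are ignored.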
>
> And done for strings too (I'm not a C/glib programmer either, I'm rather a
> C++/Qt guy, so this should be reviewed by glib people) ;)
>
> Attached is the diff (after hex-dumping) of the generated magic files.
>
> --
> David Faure, faure at kde.org, http://www.davidfaure.fr
> Working on KDE, in particular KDE Frameworks 5
>