Case insensitive mimetype matching edge case

David Faure faure at kde.org
Wed Aug 19 13:56:36 PDT 2009


On Wednesday 19 August 2009, Bastien Nocera wrote:
> On Wed, 2009-08-19 at 21:53 +0200, David Faure wrote:
> > On Wednesday 19 August 2009, Alexander Larsson wrote:
> > > On Wed, 2009-08-19 at 10:02 +0200, David Faure wrote:
> > > > On Wednesday 19 August 2009, Alexander Larsson wrote:
> > > > > Ugh. Additionally we have to extend the mime.cache format more.
> > > > > Maybe we can solve this with a hack. What about this:
> > > > >
> > > > > All case insensitive globs are converted to lower case in the globs
> > > > > file. Glob lookup is done by first matching the real filename
> > > > > against the globs, then (on failure) convert the name to lower case
> > > > > and try again. This will result in a case insensitive match except
> > > > > for globs marked as case sensitive that have at least one uppercase
> > > > > character.
> > > > >
> > > > > We can't do case-sensitive matching of only-lowercase globs, but we
> > > > > don't currently have any example of this in the databases.
> > > >
> > > > But I do want to do one of those, to solve bug 22634: I want
> > > >    <glob pattern="core"/> to be case-sensitive="true".
> > > >
> > > > How about a different hack:
> > > > we generate in globs2 two lines, in case of case-sensitive:
> > > > 50:text/x-c++src:*.C
> > > > 50:text/x-c++src:*.C:cs
> > > > Old parsers will create an entry for "*.C:cs", which will probably
> > > > never match any real file, so no big deal, while new parsers will
> > > > take the second line as an indication that the *.C glob (parsed one
> > > > line above) should be understood to be case sensitive.
> > >
> > > Hmmm. I like this one. Sounds good to me. But let's make it extensible
> > > while we're doing it, i.e. have a comma-separated list of flags with
> > > "cs" being one known one. Unknown flags are ignored, anything after
> > > another : is ignored.
> >
> > Good idea.
> > I made the changes in the spec, in the definition of the two mimetypes,
> > and in update-mime-database.c (for parsing, and globs2 generation).
> > Please find patch attached (I can commit if you're ok with it).
> >
> > I included a suggested format change for the mime.cache file, but
> > I'll have to let you implement that part, I don't know all the details
> > about the suffix tree etc. Same for the xdgmime implementation.
> >
> > I like that this is going to improve performance, too: no need to do the
> > two-step glob matching anymore (case insensitive + case sensitive), it
> > will now be one -or- the other, for a given glob.
>
> Do you have the equivalent xdgmime changes as well?

No, my C hacking is not that good (and I thought it was dependent on the
mime.cache change, but now I see it probably is not).

Can I let you do it? It should be "as simple as" parsing the flags field,
splitting it at commas, and replacing the two-step glob matching with
a single-step glob matching (either case sensitive or case insensitive).
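
To make the idea concrete, here is a rough sketch in Python (not the xdgmime
C code, and not the KDE implementation). It assumes the globs2 line layout
discussed above, weight:mimetype:glob[:flags], with a comma-separated flags
field, unknown flags ignored, and anything after a further colon ignored.
The function names, the sample lines, the application/x-core entry, and the
tie-break in favour of case-sensitive globs are choices made up for this
illustration only.

```python
import fnmatch

# Sample globs2-style lines, including the "two lines" compatibility hack.
SAMPLE = [
    "50:text/x-c++src:*.C",
    "50:text/x-c++src:*.C:cs",
    "50:text/x-csrc:*.c",
    "50:text/plain:*.txt",
    "50:application/x-core:core:cs,someflag:ignored",
]


def parse_globs2(lines):
    """Return {(mimetype, pattern): (weight, case_sensitive)}."""
    globs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # malformed line, skip it
        weight, mimetype, pattern = int(fields[0]), fields[1], fields[2]
        # Fourth field: comma-separated flags. Later fields are ignored.
        flags = set(fields[3].split(",")) if len(fields) > 3 else set()
        key = (mimetype, pattern)
        already_cs = globs.get(key, (0, False))[1]
        # A flagged line marks the glob parsed earlier as case sensitive.
        globs[key] = (weight, already_cs or "cs" in flags)
    return globs


def match_filename(globs, filename):
    """Single-step matching: each glob is tested in exactly one mode."""
    best = None
    lowered = filename.lower()
    for (mimetype, pattern), (weight, case_sensitive) in globs.items():
        if case_sensitive:
            hit = fnmatch.fnmatchcase(filename, pattern)
        else:
            hit = fnmatch.fnmatchcase(lowered, pattern.lower())
        # Assumption of this sketch: on equal weight, a case-sensitive
        # glob wins over a case-insensitive one.
        rank = (weight, case_sensitive)
        if hit and (best is None or rank > best[0]):
            best = (rank, mimetype)
    return best[1] if best else None
```

With the sample lines, "main.C" resolves through the case-sensitive *.C
glob, "main.c" only matches *.c, and "Core" matches nothing because the
"core" glob is flagged cs.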

I'm doing exactly that in the KDE implementation as we speak.

-- 
David Faure, faure at kde.org, sponsored by Qt Software @ Nokia to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

