Case insensitive mimetype matching edge case
faure at kde.org
Fri Aug 21 02:59:43 PDT 2009
On Friday 21 August 2009, Alexander Larsson wrote:
> On Wed, 2009-08-19 at 21:53 +0200, David Faure wrote:
> > On Wednesday 19 August 2009, Alexander Larsson wrote:
> > > On Wed, 2009-08-19 at 10:02 +0200, David Faure wrote:
> > > > On Wednesday 19 August 2009, Alexander Larsson wrote:
> > > > > Ugh. Additionally we have to extend the mime.cache format more.
> > > > > Maybe we can solve this with a hack. What about this:
> > > > >
> > > > > All case insensitive globs are converted to lower case in the globs
> > > > > file. Glob lookup is done by first matching the real filename
> > > > > against the globs, then (on failure) convert the name to lower case
> > > > > and try again. This will result in a case insensitive match except
> > > > > for things marked as case sensitive that have at least one uppercase
> > > > > character.
> > > > >
> > > > > We can't do case-sensitive matching of only-lowercase globs, but we
> > > > > don't currently have any example of this in the databases.
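Here is a minimal sketch of the fallback Alexander proposes. It assumes a hypothetical in-memory glob list (`GlobEntry`, `match_globs()` and `lookup_with_fallback()` are illustrative names, not xdgmime code) in which case-insensitive patterns have already been lowercased, and it uses POSIX `fnmatch()` for matching. The real filename is matched first, and a lowercased copy is tried only if that fails:

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical glob entry. Case-insensitive patterns are assumed to be
 * stored lowercase; case-sensitive ones keep their original case. */
typedef struct {
    const char *pattern;
    const char *mime_type;
} GlobEntry;

/* Return the MIME type of the first glob that matches name as-is
 * (fnmatch with flags 0 is case-sensitive), or NULL. */
static const char *match_globs(const GlobEntry *globs, size_t n,
                               const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (fnmatch(globs[i].pattern, name, 0) == 0)
            return globs[i].mime_type;
    return NULL;
}

/* Match the real filename; on failure retry with a lowercased copy. */
static const char *lookup_with_fallback(const GlobEntry *globs, size_t n,
                                        const char *name)
{
    char lower[256];
    size_t len = strlen(name);
    const char *result = match_globs(globs, n, name);

    if (result != NULL || len >= sizeof lower)
        return result;          /* matched, or too long for this sketch */
    for (size_t i = 0; i <= len; i++)   /* also copies the final NUL */
        lower[i] = (char)tolower((unsigned char)name[i]);
    return match_globs(globs, n, lower);
}
```

Take illustrative entries `*.C` (case-sensitive) and `*.c`, `*.txt`, `core` (stored lowercase). With these, `main.C` and `main.c` resolve to different types and `README.TXT` still matches `*.txt`. However, `CORE` also matches `core` on the second pass. That is the limitation described above: an all-lowercase glob cannot be made case-sensitive with this hack.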
> > > >
> > > > But I do want to do one of those, to solve bug 22634: I want
> > > > <glob pattern="core"/> to be case-sensitive="true".
> > > >
> > > > How about a different hack:
> > > > we generate in globs2 two lines, in case of case-sensitive:
> > > > 50:text/x-c++src:*.C
> > > > 50:text/x-c++src:*.C:cs
> > > > Old parsers will create an entry for "*.C:cs", which will probably
> > > > never match any real file, so no big deal, while new parsers will
> > > > take the second line as an indication that the *.C glob (parsed one
> > > > line above) should be understood to be case sensitive.
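The parser side of this two-line scheme might look like the sketch below. `Glob2Entry` and `parse_globs2_line()` are hypothetical names, not the update-mime-database.c code. The sketch assumes that a line whose fourth field is exactly `cs`, and whose mime type and glob repeat the previous entry, only annotates that entry. An old parser that treats everything after the second `:` as the pattern would instead store `*.C:cs` as a glob of its own, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed globs2 entry. */
typedef struct {
    int weight;
    char mime_type[128];
    char pattern[128];
    bool case_sensitive;
} Glob2Entry;

/* Parse one "weight:mimetype:glob[:flags]" line. entries must have room
 * for count + 1 elements. Returns the new number of entries. */
static size_t parse_globs2_line(const char *line, Glob2Entry *entries,
                                size_t count)
{
    char buf[512];
    char *fields[4] = { NULL, NULL, NULL, NULL };
    size_t nfields = 0;
    char *p = buf;

    if (line[0] == '#' || strlen(line) >= sizeof buf)
        return count;                   /* comment or oversized line */
    strcpy(buf, line);
    buf[strcspn(buf, "\n")] = '\0';     /* drop the trailing newline */

    /* Split on ':' into at most four fields. */
    while (nfields < 4) {
        char *colon;
        fields[nfields++] = p;
        if (nfields == 4 || (colon = strchr(p, ':')) == NULL)
            break;
        *colon = '\0';
        p = colon + 1;
    }
    if (nfields < 3)
        return count;                   /* malformed line */

    bool cs = nfields == 4 && strcmp(fields[3], "cs") == 0;

    /* New-parser rule: a flagged line repeating the previous entry's
     * mime type and glob only annotates that entry. */
    if (nfields == 4 && count > 0 &&
        strcmp(entries[count - 1].mime_type, fields[1]) == 0 &&
        strcmp(entries[count - 1].pattern, fields[2]) == 0) {
        if (cs)
            entries[count - 1].case_sensitive = true;
        return count;
    }

    Glob2Entry *e = &entries[count];
    e->weight = atoi(fields[0]);
    snprintf(e->mime_type, sizeof e->mime_type, "%s", fields[1]);
    snprintf(e->pattern, sizeof e->pattern, "%s", fields[2]);
    e->case_sensitive = cs;
    return count + 1;
}
```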
> > >
> > > Hmmm. I like this one. Sounds good to me. But let's make it extensible
> > > when we're doing it, i.e. have a comma-separated list of flags with
> > > "cs" being one known one. Unknown flags are ignored, and anything after
> > > a further ':' is ignored.
> > Good idea.
> > I made the changes in the spec, in the definition of the two mimetypes,
> > and in update-mime-database.c (for parsing, and globs2 generation).
> > Please find the patch attached (I can commit it if you're OK with it).
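On the generation side, the two-line scheme proposed above comes down to emitting the flagged duplicate only for case-sensitive globs. This sketch formats into a caller-supplied buffer. `format_globs2_entry()` is a hypothetical name, and rejecting `:` inside a field is an assumption of the sketch, not something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format one glob as a plain line that every parser understands, followed
 * by a ":cs" duplicate only when the glob is case-sensitive. Returns the
 * number of characters written, or -1 if the entry can't be represented
 * or buf is too small. */
static int format_globs2_entry(char *buf, size_t size, int weight,
                               const char *mime_type, const char *pattern,
                               bool case_sensitive)
{
    int n;

    /* ':' separates fields, so this sketch rejects it inside a field. */
    if (strchr(mime_type, ':') != NULL || strchr(pattern, ':') != NULL)
        return -1;
    if (case_sensitive)
        n = snprintf(buf, size, "%d:%s:%s\n%d:%s:%s:cs\n",
                     weight, mime_type, pattern, weight, mime_type, pattern);
    else
        n = snprintf(buf, size, "%d:%s:%s\n", weight, mime_type, pattern);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```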
> We must also mention that if a case-sensitive glob matches, it has
> priority over a case-insensitive match; otherwise the *.c vs *.C match
> will not work.
I think it's simpler to make *.c case-sensitive as well, so that we don't need
to care about priorities. That's what I did in the most recent patch I sent to
this list, and in my implementation; it works fine.
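The priority problem, and why David's workaround avoids it, can be shown by counting how many globs match a name. This sketch assumes a hypothetical `Glob` struct with a per-glob `case_sensitive` flag. It emulates case-insensitive matching by lowercasing both pattern and name before calling POSIX `fnmatch()`, which avoids the GNU-only `FNM_CASEFOLD` extension.

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical glob with its own case-sensitivity flag. */
typedef struct {
    const char *pattern;
    const char *mime_type;
    bool case_sensitive;
} Glob;

/* Lowercase src into dst, truncating to size - 1 characters. */
static void to_lower_copy(char *dst, const char *src, size_t size)
{
    size_t i = 0;
    for (; src[i] != '\0' && i + 1 < size; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
    dst[i] = '\0';
}

static bool glob_matches(const Glob *g, const char *name)
{
    char pat[128], low[256];

    if (g->case_sensitive)
        return fnmatch(g->pattern, name, 0) == 0;
    /* Case-insensitive: compare lowercased pattern and name. */
    to_lower_copy(pat, g->pattern, sizeof pat);
    to_lower_copy(low, name, sizeof low);
    return fnmatch(pat, low, 0) == 0;
}

/* Return how many globs match name; *first receives the first match. */
static size_t count_matches(const Glob *globs, size_t n, const char *name,
                            const char **first)
{
    size_t hits = 0;

    *first = NULL;
    for (size_t i = 0; i < n; i++) {
        if (glob_matches(&globs[i], name)) {
            if (hits == 0)
                *first = globs[i].mime_type;
            hits++;
        }
    }
    return hits;
}
```

With `*.C` case-sensitive and `*.c` case-insensitive, `main.C` matches both globs, so some priority rule is needed. With both globs case-sensitive, each of `main.C` and `main.c` matches exactly one.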
> > I included a suggested format change for the mime.cache file, but
> > I'll have to let you implement that part, I don't know all the details
> > about the suffix tree etc. Same for the xdgmime implementation.
> I don't think the mime.cache changes are quite right. The literals
> are stored sorted and looked up with a binary search. We can't apply the
> flag once we've found the match, as we won't match without the flag.
> Rather, we have to add an additional CaseInsensitiveLiteralList. Also,
> we'd have to specify that the elements in this list are stored in lower
> case and sorted, so that a case-insensitive bsearch works.
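The two literal lists Alexander describes could be searched as in this sketch. It uses hypothetical in-memory arrays rather than the on-disk cache layout. The case-sensitive list is sorted with `strcmp()` and searched with the exact name. The case-insensitive list is stored lowercase and sorted, so the lowercased name can be found with the same `bsearch()`. `Literal`, `lookup_literal()` and the test data are illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical literal entry; each list is sorted by strcmp(). */
typedef struct {
    const char *literal;
    const char *mime_type;
} Literal;

/* bsearch() passes the key first, then a pointer to an array element. */
static int cmp_literal(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const Literal *)elem)->literal);
}

static const char *find_literal(const Literal *list, size_t n,
                                const char *name)
{
    const Literal *hit = bsearch(name, list, n, sizeof *list, cmp_literal);
    return hit ? hit->mime_type : NULL;
}

/* Exact lookup in the case-sensitive list, then the lowercased name in
 * the case-insensitive list (whose entries are stored lowercase). */
static const char *lookup_literal(const Literal *cs, size_t n_cs,
                                  const Literal *ci, size_t n_ci,
                                  const char *name)
{
    char lower[256];
    size_t i;
    const char *mime = find_literal(cs, n_cs, name);

    if (mime != NULL)
        return mime;
    for (i = 0; name[i] != '\0' && i + 1 < sizeof lower; i++)
        lower[i] = (char)tolower((unsigned char)name[i]);
    if (name[i] != '\0')
        return NULL;            /* name too long for this sketch's buffer */
    lower[i] = '\0';
    return find_literal(ci, n_ci, lower);
}
```

Because the case-insensitive entries are already lowercase, their sort order is stable under lowercasing the key, which is what makes the binary search valid.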
OK... this is exactly why I didn't touch mime.cache; I don't know enough
about it. Can I let you do the changes there?
> The suffix tree has the same issue as the literals, so we need two
> trees: one case-sensitive and one case-insensitive.
Well, there are only three case-sensitive globs right now, so you could also
just go for a linear list for those ;-)
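David's alternative could be organised as in this sketch. The few case-sensitive globs sit in a linear list scanned with `fnmatch()`, so only the case-insensitive suffixes need a tree. Here, a plain array compared from the end of the name stands in for the reverse suffix tree, and the longest matching suffix wins. The types, the `lookup()` function and the MIME data are illustrative assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical case-sensitive glob, kept in a short linear list. */
typedef struct { const char *glob; const char *mime_type; } CsGlob;

/* Hypothetical case-insensitive suffix entry; suffix stored lowercase. */
typedef struct { const char *suffix; const char *mime_type; } CiSuffix;

/* Compare name against a lowercase suffix, walking backwards from the
 * last character as a reverse suffix tree lookup would. */
static int ends_with_ci(const char *name, const char *lower_suffix)
{
    size_t n = strlen(name), s = strlen(lower_suffix);
    if (s > n)
        return 0;
    for (size_t i = 1; i <= s; i++)
        if (tolower((unsigned char)name[n - i]) !=
            (unsigned char)lower_suffix[s - i])
            return 0;
    return 1;
}

static const char *lookup(const CsGlob *cs, size_t n_cs,
                          const CiSuffix *ci, size_t n_ci, const char *name)
{
    /* 1. Linear scan of the few case-sensitive globs. */
    for (size_t i = 0; i < n_cs; i++)
        if (fnmatch(cs[i].glob, name, 0) == 0)
            return cs[i].mime_type;

    /* 2. Case-insensitive suffixes; the longest match wins. */
    const char *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < n_ci; i++) {
        size_t len = strlen(ci[i].suffix);
        if (len > best_len && ends_with_ci(name, ci[i].suffix)) {
            best = ci[i].mime_type;
            best_len = len;
        }
    }
    return best;
}
```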
David Faure, faure at kde.org, sponsored by Qt Software @ Nokia to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).