Case insensitive mimetype matching edge case

Alexander Larsson alexl at redhat.com
Wed Aug 19 00:22:10 PDT 2009


On Tue, 2009-08-18 at 19:42 +0200, David Faure wrote:
> On Tuesday 18 August 2009, Bastien Nocera wrote:
> > On Tue, 2009-08-18 at 15:25 +0200, David Faure wrote:
> > > On Wednesday 05 August 2009, David Faure wrote:
> > > > Seems to me that we should instead introduce an attribute for
> > > > case-sensitivity: <glob pattern="*.C" case-sensitive="true"/>
> > > > and do everything else case-insensitively.
> > >
> > > This would also fix bug
> > > https://bugs.freedesktop.org/show_bug.cgi?id=22634 because we could say
> > > that
> > >      <glob pattern="core"/>
> > > should be case-sensitive="true" as well, so that it doesn't match files
> > > named "Core", which are definitely no core dumps.
> > >
> > > Any reason against case-sensitive="true"?
> >
> > Probably not, except status quo. Feel free to fix.
> 
> OK, I'm starting to work on this.
> 
> One problem is again the generated globs2 file. It has to contain this
> attribute too, in some form, but any change in the format of this file breaks 
> existing parsers.
> So either we need a globs3 file (*), or we need a hack like using comments:
> # case-sensitive=true
> 50:text/x-c++src:*.C
> 
> (*) if we go for a globs3 file instead, adding another 26K of bloat even on 
> small devices :-), I suggest we make this more extensible for future changes 
> so that we don't need a globs4... We could specify that parsers should ignore 
> everything after the last ":" they know about. That is, they should be ready 
> for
> 50:text/x-c++src:*.C:[anything here]
> where they expect
> 50:text/x-c++src:*.C
> 
> I guess for now the globs3 format would be
> 50:text/x-c++src:*.C:cs
> (cs == case sensitive) and
> 50:text/x-csrc:*.c
> 
> but which could later be extended to
> 50:text/x-c++src:*.C:cs:[future extension]
> 50:text/x-csrc:*.c::[future extension]
> (note the empty "cs" field)
> 
> What do you think? Extensible format in globs3, comment-based hack
> (probably a bit ugly to implement), or do you have another idea?

Ugh. On top of that, we would also have to extend the mime.cache format.
Maybe we can solve this with a hack instead. What about this:

All case-insensitive globs are converted to lower case in the globs
file. Glob lookup is done by first matching the real filename against
the globs and then, on failure, converting the name to lower case and
trying again. This results in a case-insensitive match, except for
globs marked as case-sensitive that have at least one uppercase
character.

We can't do case-sensitive matching of only-lowercase globs, but we
don't currently have any example of this in the databases.
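
To make the idea concrete, here is a rough Python sketch of the two-pass
lookup. It is not shared-mime-info or xdgmime code: the function names,
the (mimetype, pattern, case_sensitive) tuples and the sample entries
are made up for illustration, and weights and longest-match rules are
ignored (first match wins).

```python
import fnmatch


def build_globs(entries):
    """Build the glob list as it would be written to the globs file.

    entries: iterable of (mimetype, pattern, case_sensitive) tuples
    (a hypothetical in-memory form of the <glob> elements).
    Case-insensitive patterns are stored lower-cased; case-sensitive
    patterns are stored unchanged.
    """
    globs = []
    for mimetype, pattern, case_sensitive in entries:
        if not case_sensitive:
            pattern = pattern.lower()
        globs.append((pattern, mimetype))
    return globs


def match_once(globs, name):
    """Return the first mimetype whose pattern matches name exactly
    (fnmatchcase never folds case), or None."""
    for pattern, mimetype in globs:
        if fnmatch.fnmatchcase(name, pattern):
            return mimetype
    return None


def lookup(globs, filename):
    """Two-pass lookup: the real name first, then the lower-cased name."""
    result = match_once(globs, filename)
    if result is None:
        result = match_once(globs, filename.lower())
    return result


# Sample entries, assumed for illustration only.
GLOBS = build_globs([
    ("text/x-c++src", "*.C", True),
    ("text/x-csrc", "*.c", False),
    ("image/jpeg", "*.JPG", False),
    ("application/x-core", "core", True),
])
```

With these entries, "foo.C" hits the case-sensitive "*.C" glob in the
first pass, while "PHOTO.JPG" only matches after being lower-cased. A
file named "Core" still ends up matching "core" in the second pass,
which is exactly the only-lowercase limitation described above.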



