[patch] filter invalid utf-8 characters from volume labels

Kay Sievers kay.sievers at vrfy.org
Thu Aug 5 15:19:28 PDT 2004


On Thu, 2004-08-05 at 22:24 +0200, David Zeuthen wrote:
> On Thu, 2004-08-05 at 15:54 -0400, Joe Shaw wrote:
> > On Thu, 2004-08-05 at 21:46 +0200, Kay Sievers wrote:
> > > Do I need a licence now? :)
> > 
> > Yes.  You can send your check to the following address...
> > 
> > > Today I've found my first user of this feature. I just got a DVD with an
> > > 'ä' character, stupidly encoded as ISO8859-1 (in 16-bit values):
> > > 
> > >   http://vrfy.org/projects/hal/invalid-unicode.png
> > 
> > Not bad, although it'd be nice if it actually converted it.  But like
> > Sjoerd said, it's basically a guess.
> > 
> > How do others feel about an encoding setting that we always try to
> > convert from?  It'd probably be ISO-8859-15 for the majority of distros out
> > there, but people could set it to whatever was most appropriate for
> > them.  Hrmmmm.
> > 
> 
> Sure, when dealing with stuff where we can't determine the encoding it's
> sane to fall back to something. However, specifically for UDF
> filesystems, we may be in luck as the spec
> 
>  http://www.osta.org/specs/pdf/udf201.pdf
> 
> says that the disc stores how things are encoded. I got the link
> from this mail

Yes, but as far as I understand it (volume_id is based on this
document), nothing other than 8- or 16-bit Unicode is allowed for the
volume label (it is an OSTA CS0 string, which is Unicode only).
On my DVD with the ISO8859-1 'ä', the encoding is explicitly specified
as "Compressed Unicode". So we can only guess, and only when we find
invalid Unicode.
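
To make the CS0 point concrete, here is a rough Python sketch of how
such a string could be decoded. It is not the volume_id code. The
function name decode_cs0 is made up for illustration, and it assumes
the caller has already cut the string out of its fixed-size field. The
first byte is the compression ID: 8 means one byte per character, 16
means two bytes per character, big-endian.

```python
def decode_cs0(raw: bytes) -> str:
    """Decode an OSTA CS0 byte string (hypothetical helper, sketch only)."""
    if not raw:
        return ""
    comp_id, payload = raw[0], raw[1:]
    if comp_id == 8:
        # One byte per character: each byte is a code point U+0000..U+00FF,
        # which coincides with ISO8859-1, so "latin-1" decodes it exactly.
        return payload.decode("latin-1")
    if comp_id == 16:
        if len(payload) % 2:
            raise ValueError("16-bit CS0 payload has an odd length")
        # Two bytes per character, big-endian. Invalid sequences such as a
        # lone surrogate raise UnicodeDecodeError (a ValueError subclass).
        return payload.decode("utf-16-be")
    raise ValueError(f"unsupported CS0 compression ID: {comp_id}")


if __name__ == "__main__":
    print(decode_cs0(b"\x08DVD\xe4"))
    print(decode_cs0(b"\x10\x00D\x00V\x00D\x00\xe4"))
```

Nothing here can tell us that a label was really written in some other
8-bit charset. A decode error, or characters that make no sense, is
the only hint we get, and from there on it is guessing.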

Kay