[patch] filter invalid utf-8 characters from volume labels

David Zeuthen david at fubar.dk
Thu Jul 29 09:38:23 PDT 2004


On Thu, 2004-07-29 at 11:05 -0400, Joe Shaw wrote:
> On Wed, 2004-07-28 at 23:26 +0200, Sjoerd Simons wrote:
> > This is one of those cases where I don't really care how it gets fixed, as long
> > as it gets fixed :). Although I don't know whether guessing that it's -15 when
> > it's not valid UTF-8 is better than assuming there is garbage in a UTF-8
> > string.
> 
> I guess that's true; this just isn't a win-win in any case.
> 
> > Any ideas how to solve it more generally? It's possible to put some code into
> > hal_device_set_property_string to ensure that the string value is always valid
> > UTF-8. But that doesn't feel right; on the other hand, ``fixing'' every place
> > where HAL sets a string property with information from the outside is a lot of
> > work and probably error-prone.
> 
> My only objection to validating on string set is that validating UTF-8
> is a very expensive operation.  There's no silver bullet unfortunately;
> the best we can probably do is validate in those fewer cases when we're
> reading data from an external source, and either treat it as invalid
> UTF-8 or try to convert it from some other character encoding.
> Unfortunately, I think it's a manual process.
> 

Yeah, we should validate as early as possible and use what we know about
the source when fetching from it. I think we need to go through the
entire source code to verify that strings are UTF-8 everywhere (and are
treated as such, e.g. when we copy strings around) and make reasonable
guesses when the source doesn't specify an encoding (e.g. ext3 labels).
Getting this fixed is high on my hit list.

Cheers,
David
_______________________________________________
hal mailing list
hal at freedesktop.org
http://freedesktop.org/mailman/listinfo/hal