[patch] filter invalid utf-8 characters from volume labels

Joe Shaw joeshaw at novell.com
Fri Jul 30 09:50:55 PDT 2004


On Wed, 2004-07-28 at 23:26 +0200, Sjoerd Simons wrote:
> This is one of those cases where I don't really care how it gets fixed as long
> as it gets fixed :). Although I don't know if guessing that it's -15 when it's
> not (or invalid) utf-8 is better than assuming there is garbage in a utf-8
> string.
>
> The problem seems very general to hal atm though. One debian user apparently
> has a MS usb mouse which has a 0xAE char (latin1 registered sign) in its
> description[0].
> 
> Any ideas how to solve it more generally? It's possible to put some code into
> hal_device_set_property_string to ensure that the string value is always valid
> utf-8. But that doesn't feel right; on the other hand, ``fixing'' every place
> where hal sets a string property with information from the outside is a lot of
> work and probably error-prone.

Okay, I've pulled a patented Joe Shaw flip-flop, and have committed a
patch which validates string properties as UTF-8 and replaces invalid
sequences with question marks like in your patch.  I could run for
public office with a record like this.

hald will now warn if you pass in invalid UTF-8, which will make it
easier for us to find the entry points where the data should be
sanitized.  It does make property sets slower, but I haven't noticed a
difference and it's ripe for optimization later when we're happy with
the UTF-8 cleanliness.
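
For anyone who wants the gist without reading the diff, here is a minimal
standalone sketch of the replace-with-question-marks approach.  It is not the
committed code: sanitize_utf8() and utf8_seq_len() are names made up for this
example, and it uses plain C rather than glib so that it compiles on its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length (1-4) of the valid UTF-8 sequence that
 * starts at s, or 0 if the bytes at s do not form one.  n is the number
 * of bytes remaining in the string. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;
    unsigned long cp;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                          /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else
        return 0;                          /* stray continuation or 0xF8..0xFF */

    if (len > n)
        return 0;                          /* sequence cut off by end of string */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                      /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates and values past U+10FFFF. */
    if ((len == 2 && cp < 0x80) ||
        (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) ||
        cp > 0x10FFFF)
        return 0;

    return len;
}

/* Hypothetical entry point: overwrite every byte that is not part of a
 * valid UTF-8 sequence with '?', in place.  Returns the number of bytes
 * replaced, so the caller can warn when it is non-zero. */
size_t sanitize_utf8(char *str)
{
    unsigned char *p = (unsigned char *) str;
    size_t remaining = strlen(str);
    size_t replaced = 0;

    while (remaining > 0) {
        size_t len = utf8_seq_len(p, remaining);

        if (len == 0) {
            *p = '?';
            len = 1;
            replaced++;
        }
        p += len;
        remaining -= len;
    }
    return replaced;
}
```

A caller could log a warning whenever the return value is non-zero, which is
the kind of hint that helps track down where unsanitized data gets in.  In
glib-based code, g_utf8_validate() reports the position of the first invalid
byte and could stand in for the hand-written decoder above.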

Thanks,
Joe

_______________________________________________
hal mailing list
hal at freedesktop.org
http://freedesktop.org/mailman/listinfo/hal


