[RFC PATCH:app/xprop] Print UTF8_STRING type as UTF-8 when locale supports it

Yang Zhao yang at yangman.ca
Sun Oct 18 23:38:58 PDT 2009


2009/10/18 James Cloos <cloos at jhcloos.com>:
>>>>>> "Yang" == Yang Zhao <yang at yangman.ca> writes:
>
> Yang> Currently, when an invalid UTF-8 string is detected, an error message is printed
> Yang> instead of the string value.  I don't think this is ideal.  What would be better?
>
> You are correct that printing "<Not a valid UTF-8 string>" is not ideal.
>
> If the utf8 string is invalid, I'd print out the same output that the
> existing version of xprop(1) prints.  Or perhaps print out the valid
> parts with backslash-escaped data for the invalid parts.

I thought about doing the latter, but wanted to get some discussion
going first because it requires significantly more code. I avoided the
former because that behaviour seems non-intuitive ("why do some
strings print fine but others don't?").

After some thought, I think the best solution is to output an error
message, followed by the raw values of the string, i.e.:

  PROPERTY(UTF8_STRING) = <Not valid UTF-8: invalid value range> 0xc3, 0xff, 0xff, ...

This isn't much more work to add, and can be made to identify the type
of error, at least for the first invalid byte.
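
To make the proposed output concrete, here is a rough sketch of a
formatter that writes the error tag followed by the raw byte values.
The function name format_invalid_utf8(), its signature, and the
buffer-based interface are assumptions made for illustration; they are
not taken from the patch or from xprop itself.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of xprop): format an invalid UTF-8
 * property value as an error tag followed by the raw bytes in hex.
 * Returns the number of characters written (excluding the NUL), or -1
 * if the output buffer is too small.
 */
static int
format_invalid_utf8(char *out, size_t outlen, const char *reason,
                    const unsigned char *bytes, size_t nbytes)
{
    size_t used;
    size_t i;
    int n;

    n = snprintf(out, outlen, "<Not valid UTF-8: %s>", reason);
    if (n < 0 || (size_t) n >= outlen)
        return -1;
    used = (size_t) n;

    for (i = 0; i < nbytes; i++) {
        /* First byte follows the tag after a space, later ones after ", ". */
        n = snprintf(out + used, outlen - used, "%s0x%02x",
                     i == 0 ? " " : ", ", (unsigned int) bytes[i]);
        if (n < 0 || (size_t) n >= outlen - used)
            return -1;
        used += (size_t) n;
    }
    return (int) used;
}
```

Called with the reason "invalid value range" and the bytes 0xc3, 0xff,
0xff, this produces the value part of the example line above. The
reason string is passed in by the caller, so the validity check would
decide how the error is worded.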


> Also, it would be good to add a comment explaining the logic used by the
> is_valid_utf8() function, notably including a specification of what it
> is verifying...

Will do.  I also just noticed that the codepoint accumulator should be larger.


> We do need to make a policy decision on how strict the utf8 check
> should be...

Since all xprop does is print the string in a human-friendly
format, simply being maximally strict is sane and avoids bringing in
additional complexity.  The user can already retrieve the raw string
value to find out what is wrong, so there is no loss of functionality.
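
As a rough illustration of what a maximally strict check could look
like, the sketch below rejects bad lead bytes, bad continuation bytes,
truncated sequences, overlong encodings, surrogates, and values above
U+10FFFF. Reading "maximally strict" as the RFC 3629 rules is an
assumption on my part. The enum, the function name utf8_check_strict(),
and the error categories are hypothetical and are not the
is_valid_utf8() from the patch. The accumulator is 32 bits wide because
a code point needs up to 21 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error kinds; the names are illustrative only. */
enum utf8_error {
    UTF8_OK = 0,
    UTF8_BAD_LEAD,          /* byte cannot start a sequence */
    UTF8_BAD_CONTINUATION,  /* expected 10xxxxxx, got something else */
    UTF8_TRUNCATED,         /* string ended in the middle of a sequence */
    UTF8_OVERLONG,          /* encoded in more bytes than necessary */
    UTF8_BAD_RANGE          /* surrogate, or above U+10FFFF */
};

/*
 * Check s[0..len) against the strict rules listed above.  On error,
 * *err_pos (which must not be NULL) is set to the offset of the
 * offending byte, or of the lead byte for whole-sequence errors.
 */
static enum utf8_error
utf8_check_strict(const unsigned char *s, size_t len, size_t *err_pos)
{
    size_t i = 0;

    while (i < len) {
        size_t start = i;
        unsigned char b = s[i++];
        uint32_t cp;    /* code point accumulator */
        uint32_t min;   /* smallest value allowed for this length */
        int extra;      /* continuation bytes still expected */

        if (b < 0x80)
            continue;   /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else { *err_pos = start; return UTF8_BAD_LEAD; }

        while (extra-- > 0) {
            if (i >= len) { *err_pos = start; return UTF8_TRUNCATED; }
            if ((s[i] & 0xC0) != 0x80) {
                *err_pos = i;
                return UTF8_BAD_CONTINUATION;
            }
            cp = (cp << 6) | (uint32_t) (s[i++] & 0x3F);
        }

        if (cp < min) { *err_pos = start; return UTF8_OVERLONG; }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            *err_pos = start;
            return UTF8_BAD_RANGE;
        }
    }
    return UTF8_OK;
}
```

Returning an error kind plus an offset is one way to identify the type
of error for the first invalid byte; the caller can then fall back to
printing the raw values.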


-- 
Yang Zhao
http://yangman.ca

