[systemd-devel] [PATCH v2 1/2] utf8: introduce utf8_escape_non_printable

Wed Dec 3 17:38:00 PST 2014

On Wed, 19.11.14 12:35, David Herrmann (dh.herrmann at gmail.com) wrote:

> > +                        } else {
> > +                                if ((*str < ' ') || (*str >= 127)) {
> > +                                        *(s++) = '\\';
> > +                                        *(s++) = 'x';
> > +                                        *(s++) = hexchar((int) *str >> 4);
> > +                                        *(s++) = hexchar((int) *str);
> > +                                } else
> > +                                        *(s++) = *str;
> > +
> > +                                str += 1;
> 
> This part is wrong. You cannot rely on ``*str'' to be the correct
> Unicode value for the character. utf8_is_printable() returns false
> also for multi-byte UTF8 characters. By taking it unmodified, it will
> include the UTF8 management bits, which we really don't want here.
> 
> If you really want this, I'd prefer if you decode each UTF8 character,
> and if it is non-printable you print "\uABCD" or "\UABCDWXYZ" (like
> C++ does) as a 6-byte or 10-byte sequence. Other characters are just
> printed normally.

I have now committed the proposed patch, but then changed the code to
iterate through all bytes of the unichar and escape each byte
individually. This form of escaping should be safe and compatible
with C-style escaping (which \u isn't really...). Hope this makes sense.
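
To illustrate the per-byte approach, here is a minimal sketch. It is not
the committed code. The names escape_non_printable(), hex_digit(),
valid_sequence_length() and is_printable() are made up for this example,
and the printability check is a deliberate simplification (only C0/C1
control characters and DEL count as non-printable), not what
utf8_is_printable() does. Invalid or truncated sequences are escaped one
byte at a time, and overlong encodings are not rejected; both are
assumptions of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map the low 4 bits of x to a lowercase hex digit. */
static char hex_digit(int x) {
        static const char table[] = "0123456789abcdef";
        return table[x & 15];
}

/* Length of the UTF-8 sequence starting at p, or 0 if the lead byte is
 * invalid or a continuation byte is missing. The terminating NUL is not a
 * continuation byte, so this never reads past the end of the string. */
static size_t valid_sequence_length(const unsigned char *p) {
        size_t n, i;

        if (p[0] < 0x80)
                n = 1;
        else if ((p[0] & 0xe0) == 0xc0)
                n = 2;
        else if ((p[0] & 0xf0) == 0xe0)
                n = 3;
        else if ((p[0] & 0xf8) == 0xf0)
                n = 4;
        else
                return 0;

        for (i = 1; i < n; i++)
                if ((p[i] & 0xc0) != 0x80)
                        return 0;

        return n;
}

/* Simplified stand-in for a real printability check (assumption):
 * C0 controls, DEL and C1 controls (U+0080..U+009F) are non-printable. */
static bool is_printable(const unsigned char *p, size_t len) {
        if (len == 1)
                return p[0] >= 0x20 && p[0] < 0x7f;
        if (len == 2 && p[0] == 0xc2 && p[1] <= 0x9f)
                return false;
        return true;
}

/* Returns a newly allocated string in which every byte of each
 * non-printable character is written as \xNN. Caller frees the result. */
char *escape_non_printable(const char *str) {
        const unsigned char *p = (const unsigned char *) str;
        char *out, *s;

        /* Worst case: every input byte becomes four output bytes. */
        out = s = malloc(strlen(str) * 4 + 1);
        if (!out)
                return NULL;

        while (*p) {
                size_t i, len = valid_sequence_length(p);
                bool printable = len > 0 && is_printable(p, len);

                if (len == 0)
                        len = 1; /* invalid byte: escape it on its own */

                for (i = 0; i < len; i++) {
                        if (printable)
                                *(s++) = (char) p[i];
                        else {
                                *(s++) = '\\';
                                *(s++) = 'x';
                                *(s++) = hex_digit(p[i] >> 4);
                                *(s++) = hex_digit(p[i]);
                        }
                }

                p += len;
        }

        *s = 0;
        return out;
}
```

With this sketch, the two-byte sequence 0xc2 0x85 (U+0085) comes out as
the eight characters \xc2\x85, which a C compiler would read back as the
same two bytes, while a printable multi-byte character such as 0xc3 0xa9
is copied through unchanged.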

Lennart

-- 
Lennart Poettering, Red Hat