[systemd-devel] [PATCH v2 1/2] utf8: introduce utf8_escape_non_printable

Thu Nov 6 17:34:40 PST 2014

On Mon, 03.11.14 15:00, WaLyong Cho (walyong.cho at samsung.com) wrote:

> ---
>  src/shared/utf8.c    | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  src/shared/utf8.h    |  1 +
>  src/test/test-utf8.c | 30 ++++++++++++++++++
>  3 files changed, 118 insertions(+)
> 
> diff --git a/src/shared/utf8.c b/src/shared/utf8.c
> index 9353559..5245604 100644
> --- a/src/shared/utf8.c
> +++ b/src/shared/utf8.c
> @@ -210,6 +210,93 @@ char *utf8_escape_invalid(const char *str) {
>          return p;
>  }
>  
> +char *utf8_escape_non_printable(const char *str) {
> +        char *p, *s;
> +
> +        assert(str);
> +
> +        p = s = malloc(strlen(str) * 4 + 1);
> +        if (!p)
> +                return NULL;
> +
> +        while (*str) {
> +                int len;
> +
> +                len = utf8_encoded_valid_unichar(str);
> +                if (len > 0) {
> +                        if (utf8_is_printable(str, len)) {
> +                                s = mempcpy(s, str, len);
> +                                str += len;
> +                        } else {
> +                                switch (*str) {
> +
> +                                case '\a':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'a';
> +                                        break;
> +                                case '\b':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'b';
> +                                        break;
> +                                case '\f':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'f';
> +                                        break;
> +                                case '\n':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'n';
> +                                        break;
> +                                case '\r':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'r';
> +                                        break;
> +                                case '\t':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 't';
> +                                        break;
> +                                case '\v':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'v';
> +                                        break;
> +                                case '\\':
> +                                        *(s++) = '\\';
> +                                        *(s++) = '\\';
> +                                        break;
> +                                case '"':
> +                                        *(s++) = '\\';
> +                                        *(s++) = '"';
> +                                        break;
> +                                case '\'':
> +                                        *(s++) = '\\';
> +                                        *(s++) = '\'';
> +                                        break;
> +
> +                                default:
> +                                        /* For special chars we prefer octal over
> +                                         * hexadecimal encoding, simply because glib's
> +                                         * g_strescape() does the same */
> +                                        if ((*str < ' ') || (*str >= 127)) {
> +                                                *(s++) = '\\';
> +                                                *(s++) = octchar((unsigned char) *str >> 6);
> +                                                *(s++) = octchar((unsigned char) *str >> 3);
> +                                                *(s++) = octchar((unsigned char) *str);
> +                                        } else
> +                                                *(s++) = *str;
> +                                        break;
> +                                }

Hmm, do we really want the "C style" of escaping here? Wouldn't the
"\x style" of escaping be more appropriate?

If the "C style" of escaping is appropriate, then we should find a way
to unify this case block between cescape() and this call, i.e. split
it out in a new call, maybe called:

   char* cescape_one(char c, char *buf);

That call would take the char to escape, plus a pointer to the buf
where to place the escaped version, and return a pointer that points
into the buffer right after where the escaped version was written. 

That way cescape() and your new call could each escape one character
like this:

     s = cescape_one(*str, s);

Do you follow what I mean?
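
A minimal sketch of what such a split could look like, assuming the
proposed cescape_one() signature above. The octchar() here is a local
stand-in for the existing helper, and cescape_sketch() is a hypothetical
caller that only illustrates the loop; neither is taken from the tree:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for octchar(): maps the low three bits to '0'..'7'. */
static char octchar(int x) {
        return (char) ('0' + (x & 7));
}

/* Writes the C-style escaped form of c to buf (at most 4 bytes, not
 * NUL-terminated) and returns a pointer just past the last byte written. */
static char *cescape_one(char c, char *buf) {
        assert(buf);

        switch (c) {
        case '\a': *(buf++) = '\\'; *(buf++) = 'a';  break;
        case '\b': *(buf++) = '\\'; *(buf++) = 'b';  break;
        case '\f': *(buf++) = '\\'; *(buf++) = 'f';  break;
        case '\n': *(buf++) = '\\'; *(buf++) = 'n';  break;
        case '\r': *(buf++) = '\\'; *(buf++) = 'r';  break;
        case '\t': *(buf++) = '\\'; *(buf++) = 't';  break;
        case '\v': *(buf++) = '\\'; *(buf++) = 'v';  break;
        case '\\': *(buf++) = '\\'; *(buf++) = '\\'; break;
        case '"':  *(buf++) = '\\'; *(buf++) = '"';  break;
        case '\'': *(buf++) = '\\'; *(buf++) = '\''; break;

        default:
                /* Control characters and bytes >= 127 become three octal digits. */
                if ((unsigned char) c < ' ' || (unsigned char) c >= 127) {
                        *(buf++) = '\\';
                        *(buf++) = octchar((unsigned char) c >> 6);
                        *(buf++) = octchar((unsigned char) c >> 3);
                        *(buf++) = octchar((unsigned char) c);
                } else
                        *(buf++) = c;
                break;
        }

        return buf;
}

/* Hypothetical caller: escapes a whole string one char at a time.
 * Each input byte expands to at most 4 output bytes. */
static char *cescape_sketch(const char *str) {
        char *r, *t;

        assert(str);

        r = t = malloc(strlen(str) * 4 + 1);
        if (!r)
                return NULL;

        for (; *str; str++)
                t = cescape_one(*str, t);

        *t = 0;
        return r;
}
```

With something like this, the non-printable branch of
utf8_escape_non_printable() would shrink to one cescape_one() call per
byte of the sequence, and cescape() would keep only its loop.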

Lennart

-- 
Lennart Poettering, Red Hat