[systemd-devel] [PATCH v2 1/2] utf8: introduce utf8_escape_non_printable
Lennart Poettering
lennart at poettering.net
Thu Nov 6 17:34:40 PST 2014
On Mon, 03.11.14 15:00, WaLyong Cho (walyong.cho at samsung.com) wrote:
> ---
> src/shared/utf8.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> src/shared/utf8.h | 1 +
> src/test/test-utf8.c | 30 ++++++++++++++++++
> 3 files changed, 118 insertions(+)
>
> diff --git a/src/shared/utf8.c b/src/shared/utf8.c
> index 9353559..5245604 100644
> --- a/src/shared/utf8.c
> +++ b/src/shared/utf8.c
> @@ -210,6 +210,93 @@ char *utf8_escape_invalid(const char *str) {
>          return p;
>  }
>
> +char *utf8_escape_non_printable(const char *str) {
> +        char *p, *s;
> +
> +        assert(str);
> +
> +        p = s = malloc(strlen(str) * 4 + 1);
> +        if (!p)
> +                return NULL;
> +
> +        while (*str) {
> +                int len;
> +
> +                len = utf8_encoded_valid_unichar(str);
> +                if (len > 0) {
> +                        if (utf8_is_printable(str, len)) {
> +                                s = mempcpy(s, str, len);
> +                                str += len;
> +                        } else {
> +                                switch (*str) {
> +
> +                                case '\a':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'a';
> +                                        break;
> +                                case '\b':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'b';
> +                                        break;
> +                                case '\f':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'f';
> +                                        break;
> +                                case '\n':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'n';
> +                                        break;
> +                                case '\r':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'r';
> +                                        break;
> +                                case '\t':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 't';
> +                                        break;
> +                                case '\v':
> +                                        *(s++) = '\\';
> +                                        *(s++) = 'v';
> +                                        break;
> +                                case '\\':
> +                                        *(s++) = '\\';
> +                                        *(s++) = '\\';
> +                                        break;
> +                                case '"':
> +                                        *(s++) = '\\';
> +                                        *(s++) = '"';
> +                                        break;
> +                                case '\'':
> +                                        *(s++) = '\\';
> +                                        *(s++) = '\'';
> +                                        break;
> +
> +                                default:
> +                                        /* For special chars we prefer octal over
> +                                         * hexadecimal encoding, simply because glib's
> +                                         * g_strescape() does the same */
> +                                        if ((*str < ' ') || (*str >= 127)) {
> +                                                *(s++) = '\\';
> +                                                *(s++) = octchar((unsigned char) *str >> 6);
> +                                                *(s++) = octchar((unsigned char) *str >> 3);
> +                                                *(s++) = octchar((unsigned char) *str);
> +                                        } else
> +                                                *(s++) = *str;
> +                                        break;
> +                                }
Hmm, do we really want the "C style" of escaping here? Wouldn't the
"\x style" of escaping be more appropriate?
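For comparison, here is a rough sketch of what the "\x style" could look like: every byte that needs escaping becomes a fixed-width "\xNN" sequence instead of a C mnemonic or an octal triple. All names below (hexchar_sketch(), xescape_one_sketch(), xescape_sketch()) are hypothetical and exist only for this illustration. The loop is deliberately byte-wise and does no UTF-8 validation, so unlike the patch above it would also escape the bytes of printable multi-byte characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a hex-digit helper: maps the low four
 * bits of x to '0'..'9', 'a'..'f'. */
static char hexchar_sketch(int x) {
        static const char table[] = "0123456789abcdef";

        return table[x & 15];
}

/* Hypothetical: writes the four bytes "\xNN" for c at buf and returns
 * a pointer just past them. Does not NUL-terminate. */
static char *xescape_one_sketch(char c, char *buf) {
        *(buf++) = '\\';
        *(buf++) = 'x';
        *(buf++) = hexchar_sketch((unsigned char) c >> 4);
        *(buf++) = hexchar_sketch((unsigned char) c);

        return buf;
}

/* Hypothetical caller: escapes every byte outside printable ASCII,
 * plus the backslash itself, and copies everything else verbatim.
 * Returns a newly allocated string the caller must free(). */
static char *xescape_sketch(const char *str) {
        char *p, *s;

        assert(str);

        /* Worst case: every input byte expands to four output bytes. */
        p = s = malloc(strlen(str) * 4 + 1);
        if (!p)
                return NULL;

        for (; *str; str++) {
                unsigned char u = (unsigned char) *str;

                if (u < ' ' || u >= 127 || u == '\\')
                        s = xescape_one_sketch(*str, s);
                else
                        *(s++) = *str;
        }

        *s = 0;
        return p;
}
```

With this sketch, the three-byte input "a", newline, "b" becomes the six characters a\x0ab. The fixed width keeps the worst-case buffer size at four output bytes per input byte, the same bound the patch already allocates for.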
If the "C style" of escaping is appropriate, then we should find a way
to unify this case block between cescape() and this call, i.e. split
it out in a new call, maybe called:
char* cescape_one(char c, char *buf);
That call would take the char to escape, plus a pointer to the buf
where to place the escaped version, and return a pointer that points
into the buffer right after where the escaped version was written.
That way both cescape() and your new call could escape one character
like this:
s = cescape_one(*str, s);
Does that make sense?
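A minimal sketch of such a helper, assuming it simply takes over the switch block from the patch quoted above. octchar_sketch() is a hypothetical stand-in for the octchar() helper the patch calls, and cescape_sketch() is a hypothetical caller added only to show the calling pattern; neither is the actual systemd code. The sketch casts to unsigned char before the range check, so bytes >= 128 take the octal path regardless of whether plain char is signed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for octchar(): maps the low three bits of x
 * to '0'..'7'. */
static char octchar_sketch(int x) {
        return (char) ('0' + (x & 7));
}

/* Writes the escaped form of c at buf (at most four bytes) and returns
 * a pointer just past the last byte written. Does not NUL-terminate. */
static char *cescape_one(char c, char *buf) {
        switch (c) {

        case '\a': *(buf++) = '\\'; *(buf++) = 'a';  break;
        case '\b': *(buf++) = '\\'; *(buf++) = 'b';  break;
        case '\f': *(buf++) = '\\'; *(buf++) = 'f';  break;
        case '\n': *(buf++) = '\\'; *(buf++) = 'n';  break;
        case '\r': *(buf++) = '\\'; *(buf++) = 'r';  break;
        case '\t': *(buf++) = '\\'; *(buf++) = 't';  break;
        case '\v': *(buf++) = '\\'; *(buf++) = 'v';  break;
        case '\\': *(buf++) = '\\'; *(buf++) = '\\'; break;
        case '"':  *(buf++) = '\\'; *(buf++) = '"';  break;
        case '\'': *(buf++) = '\\'; *(buf++) = '\''; break;

        default:
                /* Octal for control bytes and everything >= 127 */
                if ((unsigned char) c < ' ' || (unsigned char) c >= 127) {
                        *(buf++) = '\\';
                        *(buf++) = octchar_sketch((unsigned char) c >> 6);
                        *(buf++) = octchar_sketch((unsigned char) c >> 3);
                        *(buf++) = octchar_sketch((unsigned char) c);
                } else
                        *(buf++) = c;
                break;
        }

        return buf;
}

/* Hypothetical caller showing the pattern: one call per input byte,
 * with the write cursor threaded through the return value.
 * Returns a newly allocated string the caller must free(). */
static char *cescape_sketch(const char *str) {
        char *p, *s;

        assert(str);

        p = s = malloc(strlen(str) * 4 + 1);
        if (!p)
                return NULL;

        for (; *str; str++)
                s = cescape_one(*str, s);

        *s = 0;
        return p;
}
```

Returning the advanced pointer means the caller never needs to know how many bytes one escape produced, so the same helper slots into a plain byte loop as well as into the non-printable branch of a UTF-8-aware loop.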
Lennart
--
Lennart Poettering, Red Hat