[PATCH] charset: fix mm_charset_get_encoded_len

Fri Aug 11 21:54:39 UTC 2017

On Fri, 2017-08-11 at 13:27 -0700, Ben Chan wrote:
> The while loop in mm_charset_get_encoded_len() iterates through each
> valid UTF-8 encoded character in the given NULL-terminated UTF-8
> string.
> It uses g_utf8_find_next_char() to find the position of the next
> character. In case, g_utf8_find_next_char() returns NULL, it tries to
> find the end (i.e. the NULL character) of the string.
> 
> This patch fixes the following issues in the while loop:
> 
> 1. The loop uses both 'next' and 'end' to track the position of the
> next
>    character in the string.
> 
>    When g_utf8_find_next_char() returns a non-NULL value, 'next' is
>    essentially the same as 'end'.
> 
>    When g_utf8_find_next_char() returns NULL, 'next' remains NULL
> while
>    'end' is adjusted to track the end of the string (but is done
>    incorrectly as described in #2). After the 'p = next' assignment,
> the
>    'while (*p)' check results in a NULL dereference. 'p' should thus
> be
>    set to 'end' instead of 'next'.
> 
>    'next' is thus redundant and can be removed.

Agreed.

> 2. When g_utf8_find_next_char() returns NULL and 'end' is adjusted to
>    track the end of string, the 'while (*end++)' loop stops when
> finding
>    the NULL character, but 'end' is advanced past the NULL character.
>    After the 'p = end' assignment, the 'while (*p)' check results in
> a
>    NULL dereference.

Yep.

>    'while (*++end)' should be used instead given that 'p' doesn't
> point
>    to a NULL character when 'end = p' happens. 'end' will be updated
> to
>    point to the NULL character. After the 'p = end' assignment (fixed
> in
>    #1), the 'while (*p)' check will properly stop the loop.
> ---
> Hi Dan and Aleksander,
> 
> I was a bit confused by the purpose of 'next' and 'end', and the part
> where it
> skips through the end of the string when it fails to find the next
> UTF-8
> encoded character. The patch is based on what I guess the intention
> of the
> code. Please let me know if I've missed anything.

I think I wrote this code.  I'm now confused about the purpose of end
and next.  You're not alone.

I think we just get lucky here; perhaps there's always another byte at
the end of the string that doesn't cause a segfault due to padding or
memory allocation weirdness or something.

> Also, We should try enumerate some test cases for it. Maybe in a
> follow-up patch.

I suspect we'd catch it under valgrind.  NetworkManager has valgrind
test support, and while it's a bit painful sometimes (because we have
to update the false-positive filter regularly due to glib/GObject
changes) and it makes the tests go a lot slower, it definitely can
catch this kind of thing.

One more comment below...

> Thanks,
> Ben
> 
>  src/mm-charsets.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mm-charsets.c b/src/mm-charsets.c
> index 191789d7..7ba31aa5 100644
> --- a/src/mm-charsets.c
> +++ b/src/mm-charsets.c
> @@ -595,7 +595,7 @@ mm_charset_get_encoded_len (const char *utf8,
>                              MMModemCharset charset,
>                              guint *out_unsupported)
>  {
> -    const char *p = utf8, *next;
> +    const char *p = utf8;
>      guint len = 0, unsupported = 0;
>      SubsetEntry *e;
>  
> @@ -618,17 +618,17 @@ mm_charset_get_encoded_len (const char *utf8,
>  
>          c = g_utf8_get_char_validated (p, -1);
>          g_return_val_if_fail (c != (gunichar) -1, 0);
> -        end = next = g_utf8_find_next_char (p, NULL);
> +        end = g_utf8_find_next_char (p, NULL);
>          if (end == NULL) {
>              /* Find the end... */
> 
Could you update the comment to say "find the string-terminating NULL"
or something like that?

>              end = p;
> -            while (*end++);
> +            while (*++end);
>          }
>  
>          if (!e->func (c, p, (end - p), &clen))
>              unsupported++;
>          len += clen;
> -        p = next;
> +        p = end;
>      }
>  
>      if (out_unsupported)