[PATCH v2] charset: fix mm_charset_get_encoded_len
Aleksander Morgado
aleksander at aleksander.es
Tue Aug 15 08:02:39 UTC 2017
On 12/08/17 00:17, Ben Chan wrote:
> The while loop in mm_charset_get_encoded_len() iterates through each
> valid UTF-8 encoded character in the given NULL-terminated UTF-8 string.
> It uses g_utf8_find_next_char() to find the position of the next
> character. In case, g_utf8_find_next_char() returns NULL, it tries to
> find the end (i.e. the NULL character) of the string.
>
> This patch fixes the following issues in the while loop:
>
> 1. The loop uses both 'next' and 'end' to track the position of the next
> character in the string.
>
> When g_utf8_find_next_char() returns a non-NULL value, 'next' is
> essentially the same as 'end'.
>
> When g_utf8_find_next_char() returns NULL, 'next' remains NULL while
> 'end' is adjusted to track the end of the string (but is done
> incorrectly as described in #2). After the 'p = next' assignment, the
> 'while (*p)' check results in a NULL dereference. 'p' should thus be
> set to 'end' instead of 'next'.
>
> 'next' is thus redundant and can be removed.
>
> 2. When g_utf8_find_next_char() returns NULL and 'end' is adjusted to
> track the end of string, the 'while (*end++)' loop stops when finding
> the NULL character, but 'end' is advanced past the NULL character.
> After the 'p = end' assignment, the 'while (*p)' check results in a
> dereference of out-of-bounds pointer.
>
> 'while (*++end)' should be used instead given that 'p' doesn't point
> to a NULL character when 'end = p' happens. 'end' will be updated to
> point to the NULL character. After the 'p = end' assignment (fixed in
> #1), the 'while (*p)' check will properly stop the loop.
> ---
> v2: Updated a comment suggested by Dan
>
Pushed to git master. I'm sending a follow up commit with a related issue.
> src/mm-charsets.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/src/mm-charsets.c b/src/mm-charsets.c
> index 191789d7..fb47b0ae 100644
> --- a/src/mm-charsets.c
> +++ b/src/mm-charsets.c
> @@ -595,7 +595,7 @@ mm_charset_get_encoded_len (const char *utf8,
> MMModemCharset charset,
> guint *out_unsupported)
> {
> - const char *p = utf8, *next;
> + const char *p = utf8;
> guint len = 0, unsupported = 0;
> SubsetEntry *e;
>
> @@ -618,17 +618,17 @@ mm_charset_get_encoded_len (const char *utf8,
>
> c = g_utf8_get_char_validated (p, -1);
> g_return_val_if_fail (c != (gunichar) -1, 0);
> - end = next = g_utf8_find_next_char (p, NULL);
> + end = g_utf8_find_next_char (p, NULL);
> if (end == NULL) {
> - /* Find the end... */
> + /* Find the string terminating NULL */
> end = p;
> - while (*end++);
> + while (*++end);
> }
>
> if (!e->func (c, p, (end - p), &clen))
> unsupported++;
> len += clen;
> - p = next;
> + p = end;
> }
>
> if (out_unsupported)
>
--
Aleksander
https://aleksander.es
More information about the ModemManager-devel
mailing list