[PATCH] charset: fix mm_charset_get_encoded_len
Ben Chan
benchan at chromium.org
Fri Aug 11 20:27:13 UTC 2017
The while loop in mm_charset_get_encoded_len() iterates through each
valid UTF-8 encoded character in the given NULL-terminated UTF-8 string.
It uses g_utf8_find_next_char() to find the position of the next
character. In case, g_utf8_find_next_char() returns NULL, it tries to
find the end (i.e. the NULL character) of the string.
This patch fixes the following issues in the while loop:
1. The loop uses both 'next' and 'end' to track the position of the next
character in the string.
When g_utf8_find_next_char() returns a non-NULL value, 'next' is
essentially the same as 'end'.
When g_utf8_find_next_char() returns NULL, 'next' remains NULL while
'end' is adjusted to track the end of the string (but is done
incorrectly as described in #2). After the 'p = next' assignment, the
'while (*p)' check results in a NULL dereference. 'p' should thus be
set to 'end' instead of 'next'.
'next' is thus redundant and can be removed.
2. When g_utf8_find_next_char() returns NULL and 'end' is adjusted to
track the end of string, the 'while (*end++)' loop stops when finding
the NULL character, but 'end' is advanced past the NULL character.
After the 'p = end' assignment, the 'while (*p)' check results in a
NULL dereference.
'while (*++end)' should be used instead given that 'p' doesn't point
to a NULL character when 'end = p' happens. 'end' will be updated to
point to the NULL character. After the 'p = end' assignment (fixed in
#1), the 'while (*p)' check will properly stop the loop.
---
Hi Dan and Aleksander,
I was a bit confused by the purpose of 'next' and 'end', and the part where it
skips through the end of the string when it fails to find the next UTF-8
encoded character. The patch is based on what I guess the intention of the
code. Please let me know if I've missed anything.
Also, We should try enumerate some test cases for it. Maybe in a follow-up patch.
Thanks,
Ben
src/mm-charsets.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/mm-charsets.c b/src/mm-charsets.c
index 191789d7..7ba31aa5 100644
--- a/src/mm-charsets.c
+++ b/src/mm-charsets.c
@@ -595,7 +595,7 @@ mm_charset_get_encoded_len (const char *utf8,
MMModemCharset charset,
guint *out_unsupported)
{
- const char *p = utf8, *next;
+ const char *p = utf8;
guint len = 0, unsupported = 0;
SubsetEntry *e;
@@ -618,17 +618,17 @@ mm_charset_get_encoded_len (const char *utf8,
c = g_utf8_get_char_validated (p, -1);
g_return_val_if_fail (c != (gunichar) -1, 0);
- end = next = g_utf8_find_next_char (p, NULL);
+ end = g_utf8_find_next_char (p, NULL);
if (end == NULL) {
/* Find the end... */
end = p;
- while (*end++);
+ while (*++end);
}
if (!e->func (c, p, (end - p), &clen))
unsupported++;
len += clen;
- p = next;
+ p = end;
}
if (out_unsupported)
--
2.14.0.434.g98096fd7a8-goog
More information about the ModemManager-devel
mailing list