[PATCH] charset: fix mm_charset_get_encoded_len

Fri Aug 11 22:31:49 UTC 2017

On Fri, Aug 11, 2017 at 2:54 PM, Dan Williams <dcbw at redhat.com> wrote:

> On Fri, 2017-08-11 at 13:27 -0700, Ben Chan wrote:
> > The while loop in mm_charset_get_encoded_len() iterates through each
> > valid UTF-8 encoded character in the given NULL-terminated UTF-8
> > string.
> > It uses g_utf8_find_next_char() to find the position of the next
> > character. In case, g_utf8_find_next_char() returns NULL, it tries to
> > find the end (i.e. the NULL character) of the string.
> >
> > This patch fixes the following issues in the while loop:
> >
> > 1. The loop uses both 'next' and 'end' to track the position of the
> > next
> >    character in the string.
> >
> >    When g_utf8_find_next_char() returns a non-NULL value, 'next' is
> >    essentially the same as 'end'.
> >
> >    When g_utf8_find_next_char() returns NULL, 'next' remains NULL
> > while
> >    'end' is adjusted to track the end of the string (but is done
> >    incorrectly as described in #2). After the 'p = next' assignment,
> > the
> >    'while (*p)' check results in a NULL dereference. 'p' should thus
> > be
> >    set to 'end' instead of 'next'.
> >
> >    'next' is thus redundant and can be removed.
>
> Agreed.
>
> > 2. When g_utf8_find_next_char() returns NULL and 'end' is adjusted to
> >    track the end of string, the 'while (*end++)' loop stops when
> > finding
> >    the NULL character, but 'end' is advanced past the NULL character.
> >    After the 'p = end' assignment, the 'while (*p)' check results in
> > a
> >    NULL dereference.
>
> Yep.
>
> >    'while (*++end)' should be used instead given that 'p' doesn't
> > point
> >    to a NULL character when 'end = p' happens. 'end' will be updated
> > to
> >    point to the NULL character. After the 'p = end' assignment (fixed
> > in
> >    #1), the 'while (*p)' check will properly stop the loop.
> > ---
> > Hi Dan and Aleksander,
> >
> > I was a bit confused by the purpose of 'next' and 'end', and the part
> > where it
> > skips through the end of the string when it fails to find the next
> > UTF-8
> > encoded character. The patch is based on what I guess the intention
> > of the
> > code. Please let me know if I've missed anything.
>
> I think I wrote this code.  I'm now confused about the purpose of end
> and next.  You're not alone.
>
> I think we just get lucky here; perhaps there's always another byte at
> the end of the string that doesn't cause a segfault due to padding or
> memory allocation weirdness or something.
>
> > Also, We should try enumerate some test cases for it. Maybe in a
> > follow-up patch.
>
> I suspect we'd catch it under valgrind.  NetworkManager has valgrind
> test support, and while it's a bit painful sometimes (because we have
> to update the false-positive filter regularly due to glib/GObject
> changes) and it makes the tests go a lot slower, it definitely can
> catch this kind of thing.
>
>
Agreed. I seldom send/receive SMS, so probably haven't hit this code path
under Valgrind or AddressSanitizer. I've been experimenting with clang
static analyzer and got some useful results.... in case you and Aleksander
was wondering why I submitted all these random patches :)  There are some
false positives but I'm happy with the results so far.  Perhaps we could
incorporate Clang tool flow in the Jenkins setup.  The online Coverity scan
also looks interesting, but I haven't got a chance to look into that yet.

> One more comment below...
>
> > Thanks,
> > Ben
> >
> >  src/mm-charsets.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/mm-charsets.c b/src/mm-charsets.c
> > index 191789d7..7ba31aa5 100644
> > --- a/src/mm-charsets.c
> > +++ b/src/mm-charsets.c
> > @@ -595,7 +595,7 @@ mm_charset_get_encoded_len (const char *utf8,
> >                              MMModemCharset charset,
> >                              guint *out_unsupported)
> >  {
> > -    const char *p = utf8, *next;
> > +    const char *p = utf8;
> >      guint len = 0, unsupported = 0;
> >      SubsetEntry *e;
> >
> > @@ -618,17 +618,17 @@ mm_charset_get_encoded_len (const char *utf8,
> >
> >          c = g_utf8_get_char_validated (p, -1);
> >          g_return_val_if_fail (c != (gunichar) -1, 0);
> > -        end = next = g_utf8_find_next_char (p, NULL);
> > +        end = g_utf8_find_next_char (p, NULL);
> >          if (end == NULL) {
> >              /* Find the end... */
> >
> Could you update the comment to say "find the string-terminating NULL"
> or something like that?
>
>
Sure, updated in v2.

> >              end = p;
> > -            while (*end++);
> > +            while (*++end);
> >          }
> >
> >          if (!e->func (c, p, (end - p), &clen))
> >              unsupported++;
> >          len += clen;
> > -        p = next;
> > +        p = end;
> >      }
> >
> >      if (out_unsupported)
> _______________________________________________
> ModemManager-devel mailing list
> ModemManager-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/modemmanager-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/modemmanager-devel/attachments/20170811/db231235/attachment-0001.html>