[Bug 770355] id3v2: Fix parsing extended header and string lists

GStreamer (GNOME Bugzilla) bugzilla at gnome.org
Sat Dec 24 10:06:17 UTC 2016


https://bugzilla.gnome.org/show_bug.cgi?id=770355

Tim-Philipp Müller <t.i.m at zen.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #334107|none                        |needs-work
             status|                            |

--- Comment #9 from Tim-Philipp Müller <t.i.m at zen.co.uk> ---
Comment on attachment 334107
  --> https://bugzilla.gnome.org/attachment.cgi?id=334107
id3v2: Fix splitting strings

Thanks for the patch!

It mostly looks right, but there are some things I don't understand yet, and
the outcome is also not entirely correct yet.

>When parsing null-terminated strings, do not include the
>terminating null byte.  Depending on the encoding used, either
>g_utf8_validate() failed due to this, or worse the call to
>g_utf16_to_utf8() would return 0 items read on an empty (0 length)
>string

This makes sense to me.

> causing it to fail parsing certain tags

Frames you mean here?

> rendering the entire file unplayable.

OOC, why would the file be unplayable if the tag can't be parsed? I tried, and
GStreamer just skips the tag and continues playing the file with your sample
tag.


>--- a/gst-libs/gst/tag/id3v2frames.c
>+++ b/gst-libs/gst/tag/id3v2frames.c
>@@ -1103,9 +1104,20 @@ parse_insert_string_field (guint8 encoding, gchar * data, gint data_size,
>       /* Sometimes we see strings with multiple BOM markers at the start.
>        * In that case, we assume the innermost one is correct. If that fails
>        * to produce valid UTF-8, we try the other endianness anyway */
>-      while (data_size > 2 && find_utf16_bom (data, &data_endianness)) {
>+      while (data_size >= 2 && find_utf16_bom (data, &data_endianness)) {
>         data += 2;              /* skip BOM */
>         data_size -= 2;
>+        have_bom = TRUE;
>+      }
>+
>+      if (data_size < 2) {
>+        field = g_strdup ("");
>+        break;
>+      }
>+
>+      if (have_bom) {
>+        data -= 2;
>+        data_size += 2;
>       }

This last if (have_bom) +/- 2 bytes bit I don't understand. The BOM is already
skipped in the while() loop above, why are we skipping it here again?

Also, with this what happens with your sample tag is that we extract the
string, but we get a BOM converted to UTF-8 at the beginning of the
extracted/converted UTF-8 string, which isn't right.

I think this just skips the terminator after the first BOM and then includes
the BOM for the second string in the conversion so it mostly works by accident.

Without the if (have_bom) block everything works correctly.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


More information about the gstreamer-bugs mailing list