[Bug 607619] [typefind] utf-16 text file mistakenly identified as layer 1 mpeg audio

GStreamer (bugzilla.gnome.org) bugzilla at gnome.org
Tue Sep 20 02:56:04 PDT 2011


https://bugzilla.gnome.org/show_bug.cgi?id=607619
  GStreamer | gst-plugins-base | unspecified

Tim-Philipp Müller <t.i.m> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #196978|none                        |reviewed
             status|                            |

--- Comment #4 from Tim-Philipp Müller <t.i.m at zen.co.uk> 2011-09-20 09:56:00 UTC ---
Review of attachment 196978:
 --> (https://bugzilla.gnome.org/review?bug=607619&attachment=196978)

::: gst/typefind/gsttypefindfunctions.c
@@ +212,3 @@
+   barely recognizing valid MP3 as MP3. If someone wants to decipher
+   the MP3 typefinder and improve it, it is over there |
+                                                       V */

What's the point of this mp3 typefinder rant?

Yes, the mp3 typefinder does go for audio/mpeg caps with a low probability (I
hope) if there's as little as a single sync code at the start. That's
intentional.

Do you have any evidence that it is "barely recognizing valid MP3 as MP3" in
general?

@@ +233,3 @@
+    {4, "\xff\xfe\x00\x00", "utf32le", 10},
+    {4, "\x00\x00\xfe\xff", "utf32be", 20}
+  };

No need for a typedef here, since it's not used anywhere else. Also, can avoid
relocations by doing:

static const struct {
  size_t bomlen;
  const char bom[4];
  const char encoding[8];
  ...
} tester[] = {
  ...
};

@@ +250,3 @@
+
+  /* find a large enough size that works */
+  while (len < (1 << 18)) {

Make a #define for that please, and write it in terms of kB or so, e.g. 8*1024
or whatever.

@@ +268,3 @@
+    converted =
+        g_convert ((const gchar *) data, len, "utf8", tester[n].encoding,
NULL,
+        NULL, &error);

Hrm, I wonder if there's something better we can do here that avoids allocs..
(let's keep in mind that we're going to be running this four times for every
blob of data that's not immediately identified with a 100% probability).

For UTF-32 that's probably not too hard, might need some more work for UTF-16
though given the API provided by glib (basically want something like
g_utf8_get_char_validated() but for UTF-16), I think.

@@ +279,3 @@
+  }
+
+  if (prob >= 0)

greater or *equal* 0 ?

@@ +4653,3 @@
+      vgm_exts, "Vgm\x20", 4, GST_TYPE_FIND_MAXIMUM);
+  TYPE_FIND_REGISTER_START_WITH (plugin, "audio/x-sap", GST_RANK_SECONDARY,
+      sap_exts, "SAP\x0d\x0a" "AUTHOR\x20", 12, GST_TYPE_FIND_MAXIMUM);

Can we do the gst-indent updates in a separate patch?

-- 
Configure bugmail: https://bugzilla.gnome.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
You are the assignee for the bug.


More information about the gstreamer-bugs mailing list