[gst-devel] [gst-cvs] gst-plugins-base: typefind: speed up mxf_type_find over 300 times for worst case scenarios

Edward Hervey bilboed at gmail.com
Thu Oct 22 09:49:50 CEST 2009

On Thu, 2009-10-22 at 07:13 +0200, Sebastian Dröge wrote:
> On Wednesday, 21.10.2009 at 12:05 -0700, Edward Hervey wrote:
> > Module: gst-plugins-base
> > Branch: master
> > Commit: d48d47e68365990d7c66782225f7fddf7efde86e
> > URL:    http://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=d48d47e68365990d7c66782225f7fddf7efde86e
> > 
> > Author: Edward Hervey <bilboed at bilboed.com>
> > Date:   Wed Oct 21 20:44:33 2009 +0200
> > 
> > typefind: speed up mxf_type_find over 300 times for worst case scenarios
> > 
> > * memcmp is expensive and was being abused, reduce calling it by checking
> >   the first byte.
> I expected that the memcmp() would be inlined by the compiler because of
> the fixed length... and then it would be as fast as your change. Any
> idea why the compiler doesn't inline it here? :)

  I'm now totally confused. I just recompiled it and checked the asm...
and it does inline it with a "repz cmpsb" on x86 (which is what you'd
expected).

  The reason why the initial data[i] check speeds things up by an extra
50% seems to be that a "cmp" (of data[i] == 0x06) is faster than the
setup/usage of "rep[z] cmps[b]". With the check in place, "cmps" only
needs to run when the first byte matches, i.e. for roughly 1 byte in
256 on random data.
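
  For reference, here is a minimal sketch of that pattern in plain C.
It is not the code from the commit: find_key() and the 4-byte key are
made up for illustration, and only the leading 0x06 comes from the
discussion above.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative pattern; only the leading 0x06 is taken from the thread. */
static const unsigned char key[4] = { 0x06, 0x0e, 0x2b, 0x34 };

/* Returns the offset of the first occurrence of key in data, or -1. */
static long
find_key (const unsigned char *data, size_t len)
{
  size_t i;

  for (i = 0; i + sizeof (key) <= len; i++) {
    /* Cheap single-byte compare first; memcmp only runs on a candidate. */
    if (data[i] == key[0] && memcmp (data + i, key, sizeof (key)) == 0)
      return (long) i;
  }
  return -1;
}
```

  On data where 0x06 is rare, almost every iteration ends after the
single-byte compare and the memcmp (inlined or not) is never reached.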

  Adding a manual check for the second byte speeds things up only a tiny
bit more (less than 1% compared to checking just the first byte), so I
left it out.

  The biggest overhead is definitely calling the data_scan_ctx methods 1
byte at a time, even if they're inlined. The mp3 typefind function
handles that on its own, which results in a much faster typefinder
despite its complexity.
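
  To illustrate the difference, here is a rough sketch that contrasts
asking the source for a small window at every offset with peeking one
large block and scanning it with a plain loop. Source and source_peek()
are hypothetical stand-ins, not the data_scan_ctx or GstTypeFind API,
and this is not how the mp3 typefinder is actually written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical data source; peek_calls counts accessor invocations. */
typedef struct {
  const unsigned char *base;
  size_t total;
  unsigned long peek_calls;
} Source;

/* Returns a pointer to len bytes at offset, or NULL if out of range. */
static const unsigned char *
source_peek (Source *src, size_t offset, size_t len)
{
  src->peek_calls++;
  if (offset > src->total || len > src->total - offset)
    return NULL;
  return src->base + offset;
}

/* Per-byte style: one accessor call for every offset that is tried. */
static long
scan_per_byte (Source *src, const unsigned char *pat, size_t plen)
{
  size_t off;

  if (plen == 0)
    return -1;
  for (off = 0;; off++) {
    const unsigned char *d = source_peek (src, off, plen);

    if (d == NULL)
      return -1;                /* ran past the end of the data */
    if (d[0] == pat[0] && memcmp (d, pat, plen) == 0)
      return (long) off;
  }
}

/* Bulk style: one accessor call, then a plain loop over the block. */
static long
scan_bulk (Source *src, const unsigned char *pat, size_t plen)
{
  const unsigned char *d = source_peek (src, 0, src->total);
  size_t off;

  if (d == NULL || plen == 0 || plen > src->total)
    return -1;
  for (off = 0; off + plen <= src->total; off++) {
    if (d[off] == pat[0] && memcmp (d + off, pat, plen) == 0)
      return (long) off;
  }
  return -1;
}
```

  Both functions return the same offset, but the bulk variant pays the
accessor overhead once instead of once per byte scanned.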

  FWIW, the other expensive typefinders are:
  * mpeg_video_stream_type
  * mpeg_find_next_header
  * mpeg_sys_type_find
  * h264_video_type_find
  And most of the overhead in those comes from using the
data_scan_ctx methods over a small number of bytes.

FYI, on most modern CPUs, cmp is 7-10 times faster than cmps for a
single mismatching byte (yes, crazy) and the setup is also more
expensive (you need to set up 3 registers, whereas with cmp you're
comparing a constant to a memory location which is already loaded).

> But thanks for noticing and fixing this :)
