pitch element breaks lip sync in Chrome

Juan Navarro juan.navarro at gmx.es
Thu Sep 6 17:11:58 UTC 2018


It seems that the 'pitch' filter (in the soundtouch plugin, 
gst-plugins-bad) introduces some kind of wrong timestamp, clock skew, 
bad sequence, a combination of those, or maybe something different (but 
probably related).

I'm seeing an accumulating delay in Chrome between video and audio in a 
WebRTC call (_not_ using GStreamer's new WebRTC element, yet) where 
the source is sending a video+audio stream, and the audio is filtered 
with the pitch element. Chrome is unable to perform lip sync 
successfully, and for some reason deduces that the audio is lagging 
behind the video (which it actually is not), so it keeps delaying the 
video more and more until the delay reaches Chrome's maximum, 10 seconds.

The net effect of this issue is practically the same as what happened in 
this Chrome bug: 
https://bugs.chromium.org/p/webrtc/issues/detail?id=5456 (note 
'googCurrentDelayMs' and 'googMinPlayoutDelayMs' growing linearly there). 
At that time the cause turned out to be Chrome wrongly using the webcam's 
timestamp, which had a different clock rate than the system's timestamp.
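
As a rough illustration of why a clock rate mismatch shows up as a 
linearly growing delay, here is a minimal Python sketch. It is not 
Chrome's actual algorithm; the function name and the example rates 
(48000 Hz nominal, 48048 Hz actual, i.e. 0.1% skew) are assumptions 
chosen only for the arithmetic.

```python
def apparent_offset_ms(elapsed_s, nominal_rate_hz, actual_rate_hz):
    """Offset between 'timestamp time' and wall-clock time.

    Hypothetical helper: the sender produces ticks at actual_rate_hz,
    while the receiver converts them to time using nominal_rate_hz.
    """
    ticks = elapsed_s * actual_rate_hz           # ticks really produced
    timestamp_time_s = ticks / nominal_rate_hz   # time the receiver infers
    return (timestamp_time_s - elapsed_s) * 1000.0


if __name__ == "__main__":
    # Assumed example: 0.1% skew between the two clocks.
    for t in (0, 60, 120, 180):
        print(t, "s ->", round(apparent_offset_ms(t, 48000, 48048), 1), "ms")
```

With these assumed numbers the offset grows by about 60 ms per minute 
and never settles, which is the same shape as the linearly growing 
delay stats described above.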

But in this case I don't think it's a Chrome bug; I have verified that 
it is caused by GStreamer's 'pitch' filter, by feeding this simple test 
pipeline into my custom WebRTC source element:

... -> (raw audio)
   -> audioconvert -> audioresample -> pitch ->
   -> audioconvert -> audioresample -> WebRTC

(Probably most, if not all, of those audioconvert/audioresample elements 
are not needed; I added them just to be on the safe side.)

This generates the mentioned delay in the video presentation handled by 
Chrome.

However, none of this happens if the 'pitch' element is removed and 
any other is used, e.g. a 'scaletempo' element:

... -> (raw audio)
   -> audioconvert -> audioresample -> scaletempo ->
   -> audioconvert -> audioresample -> WebRTC
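
For anyone wanting to repeat this A/B comparison, the two audio 
branches can be written in gst-launch syntax (elements joined with 
" ! "). The Python sketch below only builds the description string; 
the helper name is hypothetical, and the audio source and the custom 
WebRTC element are left out because they are not specified here.

```python
def audio_chain(filter_element):
    """Return a gst-launch style description of the audio branch.

    Only the elements named in the test pipelines are used; the
    upstream source and the downstream WebRTC element are omitted.
    """
    elements = [
        "audioconvert",
        "audioresample",
        filter_element,   # element under test: 'pitch' or 'scaletempo'
        "audioconvert",
        "audioresample",
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    print(audio_chain("pitch"))
    print(audio_chain("scaletempo"))
```

Keeping everything but the filter identical makes it easier to argue 
that any difference seen in Chrome comes from the filter itself.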

This produces a normal lip sync result in Chrome. Delay (latency) stays 
at around 100 to 150 ms.

I've been reading Google's WebRTC code, trying to find out exactly which 
value is to blame, but working out the correct function chain is 
difficult, and I'm still not sure exactly *what* is confusing Chrome 
into wrongly assuming that the audio is behind the video, when it's not.

I have cherry-picked and applied all commits that touched the file 
'./ext/soundtouch/gstpitch.cc' into a custom-built version of 
gst-plugins-bad, but the issue persists, so it's not a matter of trying 
the latest code (after a long time without changes, the pitch filter 
received some patches in June, so I wanted to test whether those helped...)

The only idea I have is that the pitch element is missing some sequence 
number handling, or something about the pipeline's clock rate... but I'm 
out of ideas.
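
One way to narrow this down would be to compare each audio buffer's 
timestamp with the timestamp implied by the number of samples seen so 
far. The Python sketch below assumes the (timestamp in ns, sample 
count) pairs have already been captured at the filter's output by some 
other means; the function name, the tolerance, and the example data are 
all hypothetical.

```python
def check_audio_timestamps(buffers, sample_rate, tolerance_ns=1_000_000):
    """Report buffers whose timestamp drifts from the sample count.

    buffers: list of (pts_ns, n_samples) tuples in stream order.
    Returns a list of (index, expected_pts_ns, actual_pts_ns, error_ns).
    """
    problems = []
    if not buffers:
        return problems
    first_pts = buffers[0][0]
    samples_so_far = 0
    for index, (pts_ns, n_samples) in enumerate(buffers):
        # Timestamp implied purely by how many samples came before.
        expected = first_pts + samples_so_far * 1_000_000_000 // sample_rate
        error = pts_ns - expected
        if abs(error) > tolerance_ns:
            problems.append((index, expected, pts_ns, error))
        samples_so_far += n_samples
    return problems


if __name__ == "__main__":
    # Made-up data: 480-sample buffers at 48 kHz should be 10 ms apart,
    # but here the timestamps advance by 10.1 ms per buffer.
    skewed = [(i * 10_100_000, 480) for i in range(100)]
    for problem in check_audio_timestamps(skewed, 48000)[:3]:
        print(problem)
```

If the output of 'pitch' showed an error that keeps growing while the 
output of 'scaletempo' did not, that would point at timestamping in the 
element rather than at Chrome.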

Please help :)
