Audio synchronization between individual RTP audio streams

Tim Müller tim at
Fri Nov 5 13:24:45 UTC 2021

Hi Vitaliy,

> We have a system with one RX pipeline running on an RPi4, listening
> to several UDP sources on different ports, and nearly 10 individual
> TXs. All of the devices are located in the same room. The TXs send
> their own RTP streams over Wi-Fi to the RX simultaneously, plus RTCP
> with SR and SDES. After some time, due to network connection issues
> and other factors, these TX audio streams drift out of sync (100 ms
> to 1 s of audio lag).
> We found a similar issue, which says that if there is no RTCP
> connection between RX and TXs, there will be no synchronization. But
> that issue is about hosts that are not located in the same physical
> space.
> Can RTCP usage help prevent audio lag between individual RTP streams
> (maybe feedback with RRs)? Can it be solved on the RX side alone,
> with some manipulation of pipeline elements (rtpbin NTP/RTCP sync
> options), RTP timestamps, or something else?
> Is it even possible to get synchronization within a few milliseconds
> between many RTP audio streams whose sources are physically in one
> room?

The general "problem" with RTP is that the timestamps in the RTP
packets start from a random offset, without any absolute reference or
base. A receiver will typically just record the timestamp of the first
packet, map it to some local zero base time, and interpolate from
there.
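
As a rough sketch of that receiver behaviour (plain Python, all numbers
invented for illustration):

```python
import random

CLOCK_RATE = 48000  # audio RTP clock rate in Hz (e.g. Opus)

# Per RFC 3550 the sender starts from a random 32-bit timestamp offset.
base_rtp_ts = random.randrange(2**32)

def rtp_to_local(rtp_ts, first_rtp_ts):
    """Map an RTP timestamp to local seconds, taking the first
    observed timestamp as the local zero point.
    (32-bit wrap-around handling omitted for brevity.)"""
    return (rtp_ts - first_rtp_ts) / CLOCK_RATE

first = base_rtp_ts
# A packet 480 samples (10 ms at 48 kHz) later maps to t = 0.01 s,
# regardless of the random base offset:
print(rtp_to_local(first + 480, first))  # 0.01
```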

If a sender sends audio and video, or multiple audio streams, the
received streams may initially be out of sync.

Sender report (SR) RTCP packets provide extra information from the
sender to receivers. They contain mappings between RTP timestamps to an
"ntp time". This then provides a receiver with a common time base for
all streams coming from a sender, so it can use that to offset
audio/video streams accordingly and achieve lipsync. A receiver doesn't
necessarily know what these ntp timestamps refer to though, so it can
only use it to sync the incoming streams relatively, but not to map it
to an absolute or local time or clock.
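
A small sketch of how a receiver can use the SR's (NTP time, RTP
timestamp) pair to put two streams from the same sender on one
timeline (plain Python, values invented):

```python
CLOCK_RATE = 48000  # RTP clock rate for both audio streams

def rtp_to_ntp(rtp_ts, sr_rtp_ts, sr_ntp_secs):
    """Convert an RTP timestamp to the sender's NTP timeline using
    the (NTP time, RTP timestamp) pair from the last Sender Report."""
    return sr_ntp_secs + (rtp_ts - sr_rtp_ts) / CLOCK_RATE

# Two streams with different random RTP bases, both described by SRs
# referring to the same sender NTP clock:
audio_a = rtp_to_ntp(rtp_ts=123_456 + 960, sr_rtp_ts=123_456, sr_ntp_secs=1000.0)
audio_b = rtp_to_ntp(rtp_ts=777_000 + 480, sr_rtp_ts=777_000, sr_ntp_secs=1000.0)

# Stream A's packet is 20 ms after its SR, stream B's is 10 ms after
# its SR, so despite unrelated RTP bases the receiver can line them up:
print(audio_a - audio_b)  # ~0.01
```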

Now if you have multiple senders there's another problem: Different
machines will be using different clocks, and clocks drift over time.
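
To put a rough number on that drift (illustrative figures; real
oscillators vary):

```python
# Two free-running sender clocks that differ by 50 ppm, a plausible
# figure for consumer crystal oscillators.
drift_ppm = 50

# Relative error accumulated over one hour, in milliseconds:
drift_ms_per_hour = drift_ppm * 3600 * 1000 / 1_000_000
print(drift_ms_per_hour)  # 180.0
```

So even modest clock error adds up to audible desync within an hour
unless the clocks are disciplined against a common reference.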

So ideally you want to make all devices (or at least all senders) use
a common clock. This can be an NTP clock, a PTP clock, or a
GstNetClock tracking a GstNetTimeProvider on the local network.

Once you've done that, make the senders use that clock for the sender
report NTP time (for bonus points, configure the senders to use
capture time instead of send time for SRs, though how well that works
depends a bit on your hardware and drivers).

Then all senders will effectively use the same clock/time as the
reference for their NTP timestamps, and the receiver can correlate the
streams from the different senders.
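
As a sketch of where those knobs live (the rtpbin properties
`ntp-time-source` and `rtcp-sync-send-time` are real; hosts, ports and
the audio source are placeholders, and note that gst-launch-1.0 cannot
install a custom pipeline clock, so the shared NTP/PTP/net clock
itself still has to be set up in application code):

```shell
# Sender sketch: take the SR "ntp time" from the pipeline clock
# (ntp-time-source=clock-time) and use capture time instead of send
# time for RTCP sync (rtcp-sync-send-time=false).
gst-launch-1.0 rtpbin name=rtpbin ntp-time-source=clock-time rtcp-sync-send-time=false \
    audiotestsrc is-live=true ! opusenc ! rtpopuspay ! rtpbin.send_rtp_sink_0 \
    rtpbin.send_rtp_src_0 ! udpsink host=192.168.0.10 port=5002 \
    rtpbin.send_rtcp_src_0 ! udpsink host=192.168.0.10 port=5003 sync=false async=false
```

On the receiving side, setting `ntp-sync=true` on rtpbin then makes it
use those SR NTP times for inter-stream synchronisation.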

There are also RFCs for RTP header extensions that let senders put NTP
timestamps into each packet they send out, which allows for rapid
synchronisation (no need to wait for a Sender Report). But you still
need the senders to agree on a clock/time reference for this to be
useful.
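
For reference, the 64-bit NTP timestamp such a header extension
(RFC 6051, rapid synchronisation of RTP flows) carries is just
seconds since 1900 in 32.32 fixed point; these illustrative helpers
show the wire format in plain Python:

```python
import struct

NTP_EPOCH_OFFSET = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01

def pack_ntp64(unix_time):
    """Pack a Unix time (float seconds) as a 64-bit NTP timestamp:
    32.32 fixed point, seconds since 1900, big-endian on the wire."""
    ntp = unix_time + NTP_EPOCH_OFFSET
    secs = int(ntp)
    frac = int((ntp - secs) * 2**32)
    return struct.pack(">II", secs & 0xFFFFFFFF, frac & 0xFFFFFFFF)

def unpack_ntp64(data):
    """Inverse of pack_ntp64: 8 wire bytes back to Unix seconds."""
    secs, frac = struct.unpack(">II", data)
    return secs - NTP_EPOCH_OFFSET + frac / 2**32

wire = pack_ntp64(1_000_000_000.5)
print(len(wire))           # 8
print(unpack_ntp64(wire))  # 1000000000.5
```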

I believe if you configure the AVPF RTP profile it will send an SR
immediately at the beginning instead of only sending one after a few
seconds (the regular RTCP reporting interval).
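
If you build the pipeline with gst-launch, that profile can be
selected via rtpbin's `rtp-profile` property (the rest of the pipeline
is elided here):

```shell
# Select the AVPF profile so RTCP, including the initial SR, may be
# sent earlier than the regular reporting interval would allow.
gst-launch-1.0 rtpbin name=rtpbin rtp-profile=avpf ...
```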

Good luck!

More information about the gstreamer-devel mailing list