synchronize two live sources with an offset

Mon Sep 21 22:01:42 UTC 2020

> On Wednesday, 16 September 2020 at 19:32 -0400, Stepan Salenikovich
> wrote:
>> Hi,
>> I'm trying to understand the correct way to synchronize two
>> live sources when one of them may (or may not) start with an offset.
>> In my specific case, audio and video are being captured from one
>> device. However, the initial video frame is not always output
>> immediately; it is only created when something changes on the device,
>> so it's possible to start receiving audio before video.
>> 
>> My pipeline currently looks something like this:
>> 
>> appsrc is-live=true do-timestamp=true block=true \
>> ! h264parse disable-passthrough=true config-interval=-1 \
>> ! queue \
>> ! mp4mux name=mux max-raw-audio-drift=50000000000 \
>>   interleave-time=50000000000 faststart=true fragment-duration=100 \
>> ! appsink wait-on-eos=true \
>> alsasrc device=<device> ! audio/x-raw,channels=2 \
>> ! queue ! audioconvert ! audioresample \
>> ! audiorate tolerance=500000000 \
>> ! fdkaacenc perfect-timestamp=true ! audio/mpeg,mpegversion=4 \
>> ! mux.audio_1
> 
> At first sight, all timestamps should be on the same timescale, and
> therefore in sync.
> 
>> 
>> When the audio and video both come in at the same time, they are
>> synced. But when the video starts with a delay w.r.t. the audio,
>> the resulting mp4 shows that delay as an offset between the audio and
>> the video, i.e. it plays as if the video were supposed to start at the
>> same time as the audio.
> 
> Now, this use case is not well supported by the ISO MP4 format.
> Depending on the player (and browsers are the worst), the handling of
> the EDL (edit list) entry that GStreamer uses to fill the gap may be
> broken. A few years ago, when we decided to support gaps like this, we
> found out that despite the file being valid, only GStreamer and
> QuickTime could play it back correctly.
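A rough sketch of the gap mechanism, assuming it is expressed the standard ISO BMFF way: an edit list (`elst`) entry with `media_time = -1` is an "empty edit" during which the track presents nothing, and a following entry plays the media from its start. Whether mp4mux writes exactly these entries is an assumption, and the helper name and example numbers below are hypothetical. A player that ignores the empty edit would start the video at time 0, which would be consistent with the symptom described above.

```python
from dataclasses import dataclass

NSECOND = 1_000_000_000  # GStreamer timestamps are in nanoseconds


@dataclass
class EditListEntry:
    segment_duration: int        # in movie timescale units (mvhd)
    media_time: int              # in media timescale units (mdhd); -1 = empty edit
    media_rate_integer: int = 1


def build_edit_list(first_pts_ns: int, track_duration_ns: int,
                    movie_timescale: int) -> list[EditListEntry]:
    """Hypothetical helper: edit list for a track whose first sample
    starts first_pts_ns after the start of the movie."""

    def to_movie_units(ns: int) -> int:
        # Nanoseconds -> movie timescale ticks, rounded down.
        return ns * movie_timescale // NSECOND

    entries = []
    if first_pts_ns > 0:
        # Empty edit: this track presents nothing during the initial gap.
        entries.append(EditListEntry(to_movie_units(first_pts_ns), -1))
    # Normal edit: play the media from its first sample (media time 0).
    entries.append(EditListEntry(to_movie_units(track_duration_ns), 0))
    return entries


# Video starting 1.5 s after the audio, 10 s of video, movie timescale 1000:
print(build_edit_list(1_500_000_000, 10_000_000_000, 1000))
# [EditListEntry(segment_duration=1500, media_time=-1, media_rate_integer=1),
#  EditListEntry(segment_duration=10000, media_time=0, media_rate_integer=1)]
```

This is deliberately simplified: it ignores composition offsets (B-frames), which would make the normal edit's `media_time` non-zero.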
> 
> This is of course a guess; inspecting the resulting mp4 would give us a
> bit more insight. If it is acceptable for your use case, I would suggest
> pushing an initial image at time 0; this image will be displayed until
> the first video frame arrives, avoiding the initial gap. For extra
> precision, I would move away from do-timestamp and set the timestamps
> myself (0 for the first buffer, then clock_time - base_time for the
> remaining video frames).
> 
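A minimal sketch of the manual timestamping suggested in the reply, assuming `do-timestamp=false` on the appsrc: the first buffer (the placeholder image) gets PTS 0, and every later buffer gets clock_time - base_time, i.e. its running time. `VideoTimestamper` is a hypothetical helper; the clock is injected as a plain callable so the logic can be checked without GStreamer.

```python
from typing import Callable


class VideoTimestamper:
    """Hypothetical helper producing PTS values (nanoseconds) for a live
    appsrc: 0 for the placeholder image, then clock_time - base_time."""

    def __init__(self, get_clock_time: Callable[[], int], base_time: int) -> None:
        self._get_clock_time = get_clock_time  # returns nanoseconds
        self._base_time = base_time            # nanoseconds
        self._first = True

    def next_pts(self) -> int:
        if self._first:
            # Placeholder image pushed at the very start of the stream.
            self._first = False
            return 0
        # Running time of the frame as it is pushed.
        return self._get_clock_time() - self._base_time
```

In a real pipeline (PyGObject, GStreamer 1.x), once the pipeline is PLAYING one could do `clock = appsrc.get_clock()`, `ts = VideoTimestamper(clock.get_time, appsrc.get_base_time())`, then set `buf.pts = ts.next_pts()` before `appsrc.emit("push-buffer", buf)`. `get_clock()` returns `None` until the element has been given a clock, so it has to be queried after the state change. Since the stream goes through h264parse, the placeholder would have to be an encoded frame (presumably a keyframe); producing it is outside this sketch.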

Oops, I missed the initial reply because I wasn't subscribed to the
mailing list. Thank you.

Is there an easy way to drop all audio data received before the first
video frame? One thing that works better is limiting the size of the
audio queue and making it leaky, e.g.:
queue max-size-time=200000000 leaky=2
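For reference, `max-size-time` is in nanoseconds, so 200000000 is 200 ms, and `leaky=2` is the "downstream" mode, in which a full queue drops its oldest buffers instead of blocking. The toy model below illustrates that behaviour; it is not the element's implementation (it approximates the queue level as the sum of buffer durations, whereas the real queue measures the span between its input and output timestamps), and the class names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    pts: int       # nanoseconds
    duration: int  # nanoseconds


class LeakyDownstreamQueue:
    """Toy model of `queue max-size-time=... leaky=2`."""

    def __init__(self, max_size_time: int) -> None:
        self.max_size_time = max_size_time
        self._buffers: deque[Buffer] = deque()
        self._level = 0      # queued time, nanoseconds
        self.dropped = 0     # number of buffers leaked so far

    def push(self, buf: Buffer) -> None:
        self._buffers.append(buf)
        self._level += buf.duration
        # Leak the oldest buffers until the level fits the limit again.
        while self._level > self.max_size_time and len(self._buffers) > 1:
            old = self._buffers.popleft()
            self._level -= old.duration
            self.dropped += 1

    def pop(self) -> Buffer:
        buf = self._buffers.popleft()
        self._level -= buf.duration
        return buf

    def __len__(self) -> int:
        return len(self._buffers)
```

With a 200 ms limit and 50 ms buffers, only the 4 most recent buffers survive while nothing is pulled from the queue: whatever waits longer than the limit downstream is discarded, which is why the limit cannot be made arbitrarily small.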

But the limit can't be made too short, otherwise data gets dropped later
during encoding.
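One possible alternative to the leaky queue (not proposed in the thread) is to gate the audio explicitly: record the PTS of the first video buffer and drop every audio buffer timestamped before it. The sketch assumes both branches' PTS values are on the same running-time scale, as the first reply expects. `FirstVideoGate` and `attach_gate` are hypothetical names; the pad-probe calls (`Gst.Pad.add_probe`, `Gst.PadProbeType.BUFFER`, `Gst.PadProbeReturn.DROP`) are standard GStreamer API, imported lazily so the gate logic runs without GStreamer installed.

```python
import threading
from typing import Optional


class FirstVideoGate:
    """Keep only audio buffers whose PTS is >= the first video PTS.

    Decisions are per buffer, so an audio buffer straddling the first
    video PTS is dropped whole.
    """

    def __init__(self) -> None:
        # Video and audio probes run on different streaming threads.
        self._lock = threading.Lock()
        self.first_video_pts: Optional[int] = None

    def on_video(self, pts: int) -> None:
        with self._lock:
            if self.first_video_pts is None:
                self.first_video_pts = pts

    def keep_audio(self, pts: int) -> bool:
        with self._lock:
            first = self.first_video_pts
        return first is not None and pts >= first


def attach_gate(video_pad, audio_pad, gate: FirstVideoGate) -> None:
    """Wire the gate into a pipeline with buffer pad probes (PyGObject)."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    def on_video(pad, info):
        gate.on_video(info.get_buffer().pts)
        return Gst.PadProbeReturn.OK

    def on_audio(pad, info):
        if gate.keep_audio(info.get_buffer().pts):
            return Gst.PadProbeReturn.OK
        return Gst.PadProbeReturn.DROP  # discard audio before first video

    video_pad.add_probe(Gst.PadProbeType.BUFFER, on_video)
    audio_pad.add_probe(Gst.PadProbeType.BUFFER, on_audio)
```

Which pads to use is an assumption, e.g. `element.get_static_pad("src")` on h264parse and on an element in the audio branch; placing the audio probe downstream of `audiorate` keeps that element from interfering with the dropped span.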