Instant Replay with GStreamer

Tue Jan 7 14:55:27 PST 2014

Sebastian helped me through this on IRC earlier today. Here's the gist for
anyone who might be following along from home:

 * when the pipeline is reconfigured, the source initiates caps negotiation
 * part of this negotiation involves sending an allocation query
 * the query is serialized, meaning it travels through the pipeline in the
same stream as buffers
 * the source that initiated the query blocks until it receives a response
to its allocation query, but since my app blocks buffers before the query
reaches its destination (the sink?), the source blocks indefinitely

Sebastian's idea was to use a probe to catch and drop the allocation query.
This worked perfectly, but the location the probe needed surprised me.
Adding the probe to the video source's srcpad kept the video source itself
from blocking, but the video *encoder* just downstream of it (with a queue
between, of course) continued to block. When I moved the probe to the video
encoder's srcpad, both elements continued to run fine through the caps
negotiation process even when the pipeline was blocked downstream.
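
For anyone who wants to try this, here is a minimal sketch of such a probe,
assuming the GStreamer 1.x C API. The names drop_allocation_query_cb and
install_allocation_query_drop are placeholders of mine, not code lifted from
my app, and the helper assumes the element has a static pad named "src".
It needs the GStreamer development headers to build.

```c
#include <gst/gst.h>

/* Pad probe callback: drop allocation queries, let every other query pass. */
static GstPadProbeReturn
drop_allocation_query_cb (GstPad *pad, GstPadProbeInfo *info,
    gpointer user_data)
{
  GstQuery *query = GST_PAD_PROBE_INFO_QUERY (info);

  if (GST_QUERY_TYPE (query) == GST_QUERY_ALLOCATION)
    return GST_PAD_PROBE_DROP;

  return GST_PAD_PROBE_OK;
}

/* Hypothetical helper: install the probe on the "src" pad of `element`.
 * Returns the probe id, or 0 if the element has no static "src" pad. */
static gulong
install_allocation_query_drop (GstElement *element)
{
  GstPad *srcpad = gst_element_get_static_pad (element, "src");
  gulong probe_id;

  if (srcpad == NULL)
    return 0;

  /* Only downstream queries are inspected; buffers and events are untouched. */
  probe_id = gst_pad_add_probe (srcpad, GST_PAD_PROBE_TYPE_QUERY_DOWNSTREAM,
      drop_allocation_query_cb, NULL, NULL);

  gst_object_unref (srcpad);
  return probe_id;
}
```

Called with the encoder element, this should make the allocation query fail
right away at the encoder's srcpad instead of leaving it queued behind the
blocked data. Keep the returned id if you want to remove the probe later
with gst_pad_remove_probe().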

I don't really understand the relationship between the caps query and the
allocation query, but I'm guessing that the video encoder is generating an
allocation query in response to having received the caps query. So the
important part is to catch and drop the allocation query as far downstream
as possible, but upstream of the place where the pipeline is blocked.
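
To make that placement rule concrete, here is a toy model in plain C. It is
not GStreamer code: ToyPad, QueryOutcome and toy_send_allocation_query are
made-up names, and it assumes a query simply travels downstream pad by pad
until a probe drops it, a blocked pad stops it, or it runs out of pads. Real
elements may answer a query themselves, so treat this only as a picture of
the ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are GStreamer API. */
typedef enum {
  QUERY_ANSWERED,  /* reached the end of the chain and got a reply */
  QUERY_REFUSED,   /* dropped by a probe; the sender is told "no" at once */
  QUERY_STUCK      /* hit a blocked pad; the sender waits indefinitely */
} QueryOutcome;

typedef struct {
  const char *name;
  bool drops_allocation_query;  /* a probe here drops allocation queries */
  bool blocked;                 /* a blocking probe holds everything here */
} ToyPad;

/* Walk downstream from pads[origin] and report what happens to the query.
 * Assumption: a pad never has both flags set in this model. */
static QueryOutcome
toy_send_allocation_query (const ToyPad *pads, size_t n_pads, size_t origin)
{
  assert (origin <= n_pads);

  for (size_t i = origin; i < n_pads; i++) {
    if (pads[i].drops_allocation_query)
      return QUERY_REFUSED;
    if (pads[i].blocked)
      return QUERY_STUCK;
  }
  return QUERY_ANSWERED;
}
```

Take the chain videosrc:src, encoder:src, queue:src with queue:src blocked.
A probe on videosrc:src refuses a query sent from index 0 but leaves one
sent from index 1 (the encoder) stuck. A probe on encoder:src refuses both,
which matches what I saw.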

-Todd

On Tue, Jan 7, 2014 at 2:17 PM, Todd Agulnick <todd at agulnick.com> wrote:

> I'm getting a bit closer; still hoping that something I'm seeing here
> makes sense to somebody else.
>
> I've got data flowing through my pipeline up to a blocked srcpad on a
> leaky queue. Then a request comes in and I:
>
>  1. create a new bin with a mux and filesink, link it to the queue
>  2. remove the blocking probe on the queue srcpad; data starts to flow
> into the mux and filesink
>  3. meanwhile (I think) the addition of the mux and the filesink has
> triggered allocation strategy negotiations in the pipeline
>  4. the mux and filesink drain some of the data from the queue, then block
> the queue's srcpad again
>  5. the mux and the filesink get eos, and are dropped
>  6. the negotiation started in step (3) above hangs; both upstream threads
> make similar last utterances here:
>
> 0:00:23.211310000 31628 0x7f9ac090d400 LOG                    queue
> gstqueue.c:860:gboolean gst_queue_handle_sink_query(GstPad *, GstObject *,
> GstQuery *):<ringbuffer-queue> queuing query 0x7f9ac28000f0 (allocation)
> 0:00:23.223470000 31628 0x7f9ac08f2140 LOG                    queue
> gstqueue.c:860:gboolean gst_queue_handle_sink_query(GstPad *, GstObject *,
> GstQuery *):<upstream-queue> queuing query 0x7f9ac10018a0 (allocation)
>
> Is this happening because the mux and filesink aren't around long enough
> for the allocation strategy negotiation to complete?
>
> As always, I'd really appreciate any guidance you can offer.
>
> -Todd
>
>
>
>
> On Mon, Jan 6, 2014 at 8:40 PM, Todd Agulnick <todd at agulnick.com> wrote:
>
>> Well, it looks like I was wrong again -- it's becoming a bad habit.
>>
>> My new theory is that there is a race condition that is exposed by the
>> rapid unblocking and re-blocking of the source pad on my queue. There are
>> three threads in my pipeline: A manages the video source, B manages the
>> x264enc, and C manages the mux and sink. The probes are inserted at the
>> boundary between B and C but run entirely within thread C.
>>
>> There are two scenarios. In the scenario that works, the blocking probe
>> is removed, the queue is drained, but more data is needed than is available
>> in the queue, so the whole pipeline keeps chugging along until all the data
>> required has passed through the mux. When all the data required passes
>> through the mux, the blocking probe is reinstated, and threads A & B
>> continue dumping data into the queue whose sink runs in thread B.
>>
>> In the other scenario, the blocking probe is removed from the queue, and
>> the entirety of the data request is serviced by the data available in the
>> queue. In this case, the blocking probe is reinstated essentially
>> instantaneously. And when this happens, threads A & B seem to stall. They
>> can be made to start up again by removing the blocking probe again.
>>
>> The smoking gun here is when I run with GST_DEBUG=GST_SCHEDULING, these
>> are the last log items I see for elements in threads A & B:
>>
>> 0:00:21.803462000 16032 0x7ffdfb0f2140 LOG           GST_SCHEDULING
>> gstpad.c:3762:GstFlowReturn gst_pad_chain_data_unchecked(GstPad *,
>> GstPadProbeType, void *):<upstream-queue:sink> called chainfunction
>> &gst_queue_chain with buffer 0x7ffdff042ea0, returned ok
>> 0:00:21.803472000 16032 0x7ffdfb0f2140 LOG           GST_SCHEDULING
>> gstpad.c:3762:GstFlowReturn gst_pad_chain_data_unchecked(GstPad *,
>> GstPadProbeType, void *):<video-convert:sink> called chainfunction
>> &gst_base_transform_chain with buffer 0x7ffdff042ea0, returned ok
>> 0:00:21.803480000 16032 0x7ffdfb0f2140 LOG           GST_SCHEDULING
>> gstpad.c:3762:GstFlowReturn gst_pad_chain_data_unchecked(GstPad *,
>> GstPadProbeType, void *):<caps-filter:sink> called chainfunction
>> &gst_base_transform_chain with buffer 0x7ffdff042ea0, returned ok
>> 0:00:21.803502000 16032 0x7ffdfb10d400 LOG           GST_SCHEDULING
>> gstpad.c:3756:GstFlowReturn gst_pad_chain_data_unchecked(GstPad *,
>> GstPadProbeType, void *):<video-encoder:sink> calling chainfunction
>> &gst_video_encoder_chain with buffer buffer: 0x7ffdff042ea0, pts
>> 0:00:21.745581333, dts 0:00:21.745581333, dur 0:00:00.033333333, size
>> 3110400, offset 652, offset_end 653, flags 0x0
>>
>>
>> After that, it's complete silence from these elements until I unblock my
>> queue in thread C. The last line above, in particular, is worrisome, as it
>> looks like we call the video encoder but it never returns!
>>
>> I'd really appreciate any advice about how to go about tracking this
>> down. I don't really understand how the activity in thread C could block
>> threads A & B, which is what seems to be happening.
>>
>> -Todd
>>
>>
>>
>>
>> On Mon, Jan 6, 2014 at 3:25 PM, Todd Agulnick <todd at agulnick.com> wrote:
>>
>>> I have this nearly working now (see below for the remaining puzzle) and
>>> wanted to send my thanks to Sebastian, Tim, and Pedro for helping to steer
>>> me in the right direction. I really appreciate your guidance.
>>>
>>> Here's my setup. It's similar to Pedro's.
>>>
>>> videotestsrc is-live=true ! capsfilter ! videorate ! videoconvert !
>>> x264enc ! queue leaky=downstream max-size-bytes=500MB max-size-buffers=0
>>> max-size-time=0 ! bin, where bin is disposed and recreated for each request
>>> and contains mp4mux ! filesink.
>>>
>>> I've got this rigged up inside a server that listens on a socket for
>>> incoming requests which identify a desired time-based segment of the video
>>> stream. In the quiescent state, there's a blocking probe on the leaky
>>> queue's source pad, so data flows all the way into that queue and then,
>>> once the queue is full, old data is dropped.
>>>
>>> When a request comes in, I install a new (non-blocking) probe and remove
>>> the existing blocking probe. Data starts to flow through the leaky queue,
>>> and the new probe's callback inspects the PTS on each video frame, waiting
>>> first for a keyframe that's within the requested window (at which point it
>>> stops dropping frames and instead starts passing them to the mux and sink),
>>> and then for a video frame that is beyond the requested window, at which
>>> point it sends an EOS through the bin to finalize the file; when the EOS
>>> appears on the application bus, the app removes the non-blocking probe,
>>> reinstates the blocking probe, NULLs the bin, removes the bin, and then
>>> sends the result back to the client through the socket and awaits the next
>>> request.
>>>
>>> All of this works like a charm, EXCEPT for the following observed
>>> behavior: when I reinstate the blocking probe on the queue's source pad, if
>>> there is any data in the queue, data stops flowing into the queue. Indeed,
>>> the whole pipeline goes eerily quiet. If, however, the request ends up
>>> draining the queue completely, when I reinstate the blocking probe data
>>> continues to flow into and build up inside the queue.
>>>
>>> I'm about to dig into the queue code to see if I can understand why that
>>> might be happening, but I thought I would ping the experts first to see if
>>> this rings a bell.
>>>
>>> -Todd
>>>
>>>
>>>
>>> On Wed, Jan 1, 2014 at 1:39 PM, Todd Agulnick <todd at agulnick.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jan 1, 2014 at 4:23 AM, Sebastian Dröge <
>>>> sebastian at centricular.com> wrote:
>>>>>
>>>>>
>>>>> You should be able to drop the message from the sync bus handler of the
>>>>> bin too, to prevent it going up in the pipeline hierarchy.
>>>>>
>>>>
>>>> Just to follow up on a conversation that took place on IRC just now:
>>>>
>>>> You can't do this because the GstBin already has a sync bus handler,
>>>> and there can be only one. We talked about possible modifications to GstBin
>>>> to support the desired behavior (bug filed here:
>>>> https://bugzilla.gnome.org/show_bug.cgi?id=721310), but for now as a
>>>> work-around we're going to catch the EOS just upstream of the filesink to
>>>> see if that works.
>>>>
>>>>
>>>
>>>
>>
>