[Spice-devel] [client 2/3] streaming: Remove the video decoder's dependency on SpiceMsgIn messages

Thu Apr 6 11:15:14 UTC 2017

On Wed, 5 Apr 2017, Christophe Fergeau wrote:
[...]
> > +typedef struct SpiceMetaFrame {
> > +    SpiceFrame *frame;
> > +    GstClockTime pts;
> > +    void *sample;
> 
> Why not GstSample *sample?

Eh. This comes from a codebase that has to support... GStreamer 0.10. 
But sure this should just be a GstSample in this incarnation.

> > +} SpiceMetaFrame;
> 
> SpiceGstFrame rather than SpiceMetaFrame?

The Meta came from my attempts to use GstMeta which is unfortunately 
unusable. Then I kept is since it's more meta information about the 
SpiceFrame. But I'm fine with SpiceGstFrame and gstframe.

> > @@ -455,22 +445,22 @@ static gboolean spice_gst_decoder_queue_frame(VideoDecoder *video_decoder,
> >          return FALSE;
> >      }
> >  
> > -    /* ref() the frame_msg for the buffer */
> > -    spice_msg_in_ref(frame_msg);
> > +    /* ref() the frame data for the buffer */
> > +    frame->ref_data(frame->data_opaque);
> >      GstBuffer *buffer = gst_buffer_new_wrapped_full(GST_MEMORY_FLAG_PHYSICALLY_CONTIGUOUS,
> > -                                                    data, size, 0, size,
> > -                                                    frame_msg, &release_buffer_data);
> > +                                                    frame->data, frame->size, 0, frame->size,
> > +                                                    frame->data_opaque, frame->unref_data);
> >  
> >      GST_BUFFER_DURATION(buffer) = GST_CLOCK_TIME_NONE;
> >      GST_BUFFER_DTS(buffer) = GST_CLOCK_TIME_NONE;
> >      GST_BUFFER_PTS(buffer) = gst_clock_get_time(decoder->clock) - gst_element_get_base_time(decoder->pipeline) + ((uint64_t)MAX(0, latency)) * 1000 * 1000;
> >  
> >      g_mutex_lock(&decoder->queues_mutex);
> > -    g_queue_push_tail(decoder->decoding_queue, create_frame(buffer, frame_msg));
> > +    g_queue_push_tail(decoder->decoding_queue, create_meta_frame(frame, buffer));
> 
> I believe this is my main grip with this patch, the somehow disjoint
> SpiceFrame::free and SpiceFrame::unref_data which needs to get called at
> different times, sounds like something we'll get wrong in the future.
> I'm wondering if passing a SpiceGstFrame (aka SpiceMetaFrame) as the
> last arg to gst_buffer_new_wrapped_full() would help here by grouping
> the time we unref the message, and the time we free the SpiceFrame.
> 
> Or does the SpiceGstFrame need to outlive this buffer?

Yes. The GstBuffer is no longer needed once it has been decoded by the 
video codec. The GStreamer video codec is the one that frees frame->data 
through frame->unref_data(), even before we get the decompressed frame.

On the other side of the decoding pipeline we get a GstSample which 
contains our decoded frame and which we match with the corresponding 
SpiceGstFrame structure so we know when and where to display it. So the 
SpiceGstFrame can outlive frame->data by tens of milliseconds.

So we are forced to implement a ref()/unref() API for GStreamer 
compatibility. Fortunately this is more general than a simple free() so 
it should not impose undue restrictions down the line.

If you prefer I could implement a ref()/unref() API on SpiceFrame too to 
be more consistent but it's more complex than a simple free.

> (lifetime of
> everything involved is not clear, the SpiceGstFrame moves from
> decoding_queue to display_queue, and is mostly needed for display_queue
> consumers, I did not check how long the GstBuffer created here stays
> alive compared to a buffer sitting in these queues).

I'll add a comment detailing the process to 
spice_gst_decoder_queue_frame() (since it's responsible for getting the 
ball rolling. But it could also go in the file header).

[...]
> > @@ -166,12 +164,13 @@ static gboolean mjpeg_decoder_decode_frame(gpointer video_decoder)
> >          dest = &(decoder->out_frame[decoder->mjpeg_cinfo.output_scanline * width * 4]);
> >      }
> >      jpeg_finish_decompress(&decoder->mjpeg_cinfo);
> > +    decoder->cur_frame->unref_data(decoder->cur_frame->data_opaque);
> >  
> >      /* Display the frame and dispose of it */
> > -    stream_display_frame(decoder->base.stream, decoder->cur_frame_msg,
> > +    stream_display_frame(decoder->base.stream, decoder->cur_frame,
> >                           width, height, decoder->out_frame);
> > -    spice_msg_in_unref(decoder->cur_frame_msg);
> > -    decoder->cur_frame_msg = NULL;
> > +    decoder->cur_frame->free(decoder->cur_frame);
> > +    decoder->cur_frame = NULL;
> >      decoder->timer_id = 0;
> 
> ->unref_data + ->free can probably be done in a helper for the mjpeg
> decoder?

Yes. It does mean freeing the network message a bit later than strictly 
necessary, that is after the frame has been displayed rather than 
before, but it probably does not matter.

We're already keeping it around for much longer since we only decompress 
the frame at the last millisecond anyway. That is, the timeline is this:

 1) Receive a SpiceMsgIn containing a compressed MJPEG frame 40 ms 
    before it is due to be displayed.
 2) Wrap it in a SpiceFrame for the MJPEG decoder.
 3) mjpeg_decoder_queue_frame() puts the SpiceFrame in decoder->msgq and
    calls mjpeg_decoder_schedule().
 4) mjpeg_decoder_schedule() arranges for mjpeg_decoder_decode_frame() 
    to be called in the main thread 40 ms from now.
 5) 40 ms later it's time to display the frame. Oups it has not be 
    decompressed yet. So do on the double now, blocking the main thread 
    while we're at it.
 6) 4 ms or so later the frame has been decompressed. We no longer need 
    the SpiceMsgIn message any more but still keep it.
 7) Call stream_display_frame() to display the frame, 4 ms late!
 8) Microseconds later (presumably), call free_spice_frame() to free 
    both the SpiceMsgIn message and the SpiceFrame structure.

My main beef is with sitting on the compressed message for 40 ms doing 
nothing with it and then decompressing it only when it's essentially too 
late. But then MJPEG is really fast to decompress so it does not really 
matter and I understand why it's done this way (it's simpler, 
minimises memory usage).

So freeing SpiceMsgIn in step 6 or 8 makes no significant difference.

In contrast, with GStreamer, depending on the codec and the GStreamer 
version and bugs, decoding could take a lot more time, possibly into the 
tens of ms. So GStreamer decoder does things differently. It goes like 
this:

 1) Receive a SpiceMsgIn containing a compressed frame 40 ms 
    before it is due to be displayed.
 2) Wrap it in a SpiceFrame for the decoder.
 3) spice_gst_decoder_queue_frame() pushes the compressed frame 
    proper (frame->data) to the GStreamer pipeline for decoding. But it 
    still needs to know the where and when to display the decoded frame 
    later to it create a SpiceMetaFrame and pushes that to the 
    decoding_queue. And it returns control immediately to the caller.
 4) At some unknown point GStreamer no longer needs the compressed frame 
    data and frees it through frame->unref_data().
 5) Some colorspace conversions later GStreamer calls new_sample() with 
    the decompressed frame. We attach it to the corresponding 
    SpiceMetaFrame structure (which we pop) from the decoding_queue.
 6) But it's not time to display it yet so we attach the 
    decompressed frame to the SpiceMetaFrame, push it to the 
    display_queue and call schedule_frame().
 7) schedule_frame() arranges for display_frame() to be called in what 
    may now be only 30 ms away.
 8) display_frame() pops the first SpiceMetaFrame from the display_queue and 
    calls stream_display_frame() right on time.
 9) display_frame() then frees the SpiceMetaFrame and the SpiceFrame and 
    decompressed frame with it.

So as I was saying earlier the compressed frame is freed in step 4 and 
SpiceGstFrame, SpiceFrame and GstSample in step 9, many milliseconds 
later.

The advantages are that the main thread is not blocked while decoding, 
and displaying the frame is not delayed by the longer and less 
predictable decoding times.

And the drawbacks are that the frames are queued in their decompressed 
form rather than the compressed one (increase memory usage), and the 
code is a bit more complex.

-- 
Francois Gouget <fgouget at codeweavers.com>