[Bug 611157] [RFC] more buffer flags and caps fields in gst-video for 3d video

GStreamer (bugzilla.gnome.org) bugzilla at gnome.org
Wed Apr 24 06:19:28 PDT 2013


https://bugzilla.gnome.org/show_bug.cgi?id=611157
  GStreamer | gst-plugins-base | unspecified

--- Comment #63 from Wim Taymans <wim.taymans at gmail.com> 2013-04-24 13:19:25 UTC ---
These patches are too much to handle and reason about in one go, IMO. Let's
step back and identify 2 cases:

 1) compressed frames coming from a demuxer
 2) uncompressed frames coming from a decoder

For 2) we should use GstVideoMeta metadata on buffers, and in the caps we
have the number of views available. Only stereo is currently defined, and
we assume left is view 0 and right is view 1.
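
To sketch what I mean (the GRAY8 format, the sizes and the use of the
meta's id field as the view number are my assumptions here, not settled
API), a decoder could attach one GstVideoMeta per view to the same buffer:

  #include <gst/video/video.h>

  /* Sketch: pack a stereo pair into one buffer and describe each view
   * with its own GstVideoMeta, numbered through the meta's id field.
   * Assumes two 640x480 GRAY8 images stored back to back. */
  static void
  add_stereo_metas (GstBuffer * buffer)
  {
    gsize offset[GST_VIDEO_MAX_PLANES] = { 0, };
    gint stride[GST_VIDEO_MAX_PLANES] = { 640, };
    GstVideoMeta *meta;

    /* view 0: left eye, starts at the beginning of the buffer */
    meta = gst_buffer_add_video_meta_full (buffer, GST_VIDEO_FRAME_FLAG_NONE,
        GST_VIDEO_FORMAT_GRAY8, 640, 480, 1, offset, stride);
    meta->id = 0;

    /* view 1: right eye, stored right after the left image */
    offset[0] = 640 * 480;
    meta = gst_buffer_add_video_meta_full (buffer, GST_VIDEO_FRAME_FLAG_NONE,
        GST_VIDEO_FORMAT_GRAY8, 640, 480, 1, offset, stride);
    meta->id = 1;
  }

A sink would then look up the views with gst_buffer_get_video_meta_id
(buffer, 0) and gst_buffer_get_video_meta_id (buffer, 1).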

What's not possible with 2) currently?

 - We can't handle separate left/right frames arriving at the decoder
   without decoder support. For this, the decoder needs to accumulate 2
   frames and then place them in the outgoing buffer with GstVideoMeta.
   Do we add this to the video decoder base class? The decoder needs to
   know which frame is left and which is right.
 - We can't do left/right interleaved per pixel, checkerboard, or anything
   that is not a rectangular left/right part of the decoded image. For
   this we would need a new pixel format, or the decoder needs to
   transform it into something we support.
 - We can't flip planes horizontally or vertically. We could add this as
   flags on the metadata; vertical flipping could be done with negative
   strides (sketched below). This would also need support in video sinks
   and other elements. Maybe we would use separate metadata to define the
   transform on a video frame?
 - Something else?
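
To make the stride idea above concrete, here is a sketch of how a
vertical flip could be described with GstVideoMeta alone, by pointing the
offset at the last row and walking backwards with a negative stride. This
assumes elements actually honour negative strides, which today they
generally don't:

  #include <gst/video/video.h>

  /* Sketch only: describe a vertically flipped 640x480 GRAY8 image
   * without copying any data. Row 0 of the flipped image is the last
   * row in memory; the negative stride walks up from there. */
  static void
  add_flipped_meta (GstBuffer * buffer)
  {
    gsize offset[GST_VIDEO_MAX_PLANES] = { 640 * (480 - 1), };
    gint stride[GST_VIDEO_MAX_PLANES] = { -640, };

    gst_buffer_add_video_meta_full (buffer, GST_VIDEO_FRAME_FLAG_NONE,
        GST_VIDEO_FORMAT_GRAY8, 640, 480, 1, offset, stride);
  }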

For 2) to work we need to pass the right info to the decoder, because it
is usually the demuxer that knows the layout etc. of the frames. So we
need a way to transport this info; the usual way is to do that with caps.

I would like a caps field that is a simple string describing the layout,
similar to the colorimetry caps field. The reason is that we don't want to
negotiate N separate fields. Maybe it could also work similarly to how
interlaced content is handled?
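
For illustration, assuming a made-up "stereo-layout" field (the name and
the values are placeholders here, not a proposal for the final caps),
setting and reading such a string field would look like this:

  #include <gst/gst.h>

  /* demuxer side: announce the frame packing as one string field */
  static GstCaps *
  make_stereo_caps (void)
  {
    return gst_caps_new_simple ("video/x-h264",
        "stereo-layout", G_TYPE_STRING, "side-by-side", NULL);
  }

  /* decoder side: read it back, treating a missing field as mono */
  static const gchar *
  get_stereo_layout (GstCaps * caps)
  {
    GstStructure *s = gst_caps_get_structure (caps, 0);
    const gchar *layout = gst_structure_get_string (s, "stereo-layout");

    return layout ? layout : "mono";
  }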

I don't like the idea of passing this info with metadata: our parsers
don't deal with metadata well, and I have no idea if the metadata would
make it to the decoders. It also sounds too complicated for what it is:

 - in separate frames (flags on the buffers define left/right; sketched
   below)
 - in one frame (which portion is left/right, where it is, and how big
   it is)
 - mixed (some frames mono, others stereo, a flag says which it is)
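
As a sketch of the first and third cases, custom flags carved out above
GST_BUFFER_FLAG_LAST could mark each frame; the flag names and the show_*
helpers below are made up for illustration:

  #include <gst/gst.h>

  /* Hypothetical flags; these do not exist in GStreamer today. */
  #define MY_BUFFER_FLAG_RIGHT_VIEW (GST_BUFFER_FLAG_LAST << 0)
  #define MY_BUFFER_FLAG_MONO       (GST_BUFFER_FLAG_LAST << 1)

  /* hypothetical presentation helpers, assumed to exist in the sink */
  static void show_mono (GstBuffer * buffer);
  static void show_left (GstBuffer * buffer);
  static void show_right (GstBuffer * buffer);

  /* sink side: decide how to present a frame from its flags */
  static void
  present_frame (GstBuffer * buffer)
  {
    if (GST_BUFFER_FLAG_IS_SET (buffer, MY_BUFFER_FLAG_MONO))
      show_mono (buffer);       /* mixed stream, this frame is 2D */
    else if (GST_BUFFER_FLAG_IS_SET (buffer, MY_BUFFER_FLAG_RIGHT_VIEW))
      show_right (buffer);
    else
      show_left (buffer);
  }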

> @GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_PROGRESSIVE: Frame sequential type.
> @GST_STEREO_VIDEO_FRAME_TYPE_SEQUENTIAL_ROW_INTERLEAVED: Sequential row
>   interleaved.

What are these? How are frames transported to the decoder in these methods?

I don't like how GstStereoVideoScheme creeps into the API. We should
define an API to express 3D video in GStreamer; how to convert to it from
any other scheme belongs somewhere else and is not related.
