[Bug 611157] video: API to signal stereoscopic and multiview video

GStreamer (bugzilla.gnome.org) bugzilla at gnome.org
Wed Nov 20 09:49:42 PST 2013


https://bugzilla.gnome.org/show_bug.cgi?id=611157
  GStreamer | gst-plugins-base | unspecified

bparker <gst> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gst at fiveforty.net

--- Comment #75 from bparker <gst at fiveforty.net> 2013-11-20 17:49:25 UTC ---
Having worked in the 3D video field (stereoscopic and autostereoscopic) for
some time, I would like to add my input based on what I work with on a daily
(commercial) basis.

Typical 3D video sources I work with are:

  - One file with one video stream, having stereo 3D as either half or full
resolution per eye.
  - One file with multiple video streams (anywhere from 2-9 in my case), either
for stereo or autostereo playback. Last I checked, the qtdemux element was
hard-coded to a maximum of 8 video streams, so we've had to use other
containers in some cases.
  - Two files with one video stream each, e.g. Video_LeftEye.mp4 and
Video_RightEye.mp4, where playback of the two files must be synchronized as if
they were one video file (see the sketch after this list). This is common with
dual-camera (genlocked, of course) video setups.
  - Image sequences (packed and separate files per eye) from a very large/fast
SAN (png, tif, dpx etc.) in the case of in-production video (films/TV etc).
  - Live video feed from one or more cameras, typically using Blackmagic
DeckLink capture cards (the decklinksrc element could be used if it had 3D
support).
  - In-memory sources (video editor, IP stream) that must be played out to a 3D
display (the decklinksink element could be used if it had 3D support).
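
As a rough illustration of the dual-file case above, here is a minimal sketch
in C. It assumes a reasonably recent GStreamer (the compositor element and
pad properties inside gst_parse_launch() strings), 1920-pixel-wide sources and
the hypothetical file names from the list; the right eye is placed next to the
left eye to form a full-width side-by-side frame, and both branches run from
one pipeline clock so they stay frame-synchronized as long as the two files
carry matching timestamps.

#include <gst/gst.h>

int
main (int argc, char **argv)
{
  GstElement *pipeline;
  GstBus *bus;
  GstMessage *msg;
  GError *error = NULL;

  gst_init (&argc, &argv);

  /* Decode both eyes in one pipeline; compositor packs them into a single
   * 3840x1080 side-by-side frame. */
  pipeline = gst_parse_launch (
      "compositor name=mix sink_1::xpos=1920 ! videoconvert ! autovideosink "
      "filesrc location=Video_LeftEye.mp4  ! decodebin ! mix.sink_0 "
      "filesrc location=Video_RightEye.mp4 ! decodebin ! mix.sink_1",
      &error);
  if (pipeline == NULL) {
    g_printerr ("Pipeline construction failed: %s\n", error->message);
    g_clear_error (&error);
    return 1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  /* Block until playback finishes or fails. */
  bus = gst_element_get_bus (pipeline);
  msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      GST_MESSAGE_EOS | GST_MESSAGE_ERROR);
  if (msg != NULL)
    gst_message_unref (msg);
  gst_object_unref (bus);

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return 0;
}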

  Potential input formats of video (a rough sketch of how these layouts could
be signaled in caps follows the two lists below):

  Stereo:
  - Over/under
  - Left/right
  - Frame sequential
  - Horizontal/vertical interlaced (not very important on input side)
  - Checkerboard (not very important on input side)
  - Separate video streams per eye

  Autostereo (multiview):
  - 3x3 matrix packed in one frame. Typical 8- or 9-view displays only deliver
~1/3rd of the panel's native resolution per view, so there isn't really any
quality loss with this packing
  - 2D+depth where the extra views are generated (interpolated) internally,
usually with OpenCV or proprietary algorithms
  - Separate video streams per view
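
To make the proposed signalling concrete, here is one possible shape it could
take. This is a purely hypothetical sketch for the sake of this bug, not an
existing GStreamer API; every identifier and caps field name below is made up.
The idea is one enum covering the packed layouts listed above plus a view
count, both advertised as extra caps fields.

#include <gst/gst.h>

/* Hypothetical packing-layout enum -- none of these names exist today. */
typedef enum {
  GST_3D_LAYOUT_MONO,              /* ordinary 2D */
  GST_3D_LAYOUT_SIDE_BY_SIDE,      /* left/right packed in one frame */
  GST_3D_LAYOUT_OVER_UNDER,        /* top/bottom packed in one frame */
  GST_3D_LAYOUT_FRAME_SEQUENTIAL,  /* alternating buffers, alternating eyes */
  GST_3D_LAYOUT_ROW_INTERLEAVED,
  GST_3D_LAYOUT_COLUMN_INTERLEAVED,
  GST_3D_LAYOUT_CHECKERBOARD,
  GST_3D_LAYOUT_MULTIVIEW_GRID,    /* e.g. 3x3 matrix of views in one frame */
  GST_3D_LAYOUT_2D_PLUS_DEPTH,
  GST_3D_LAYOUT_SEPARATE_STREAMS   /* one stream per view, grouped elsewhere */
} Gst3DLayout;

/* What a demuxer might put on its pad for a half-width side-by-side file;
 * "stereo-layout" and "views" are invented field names. */
static GstCaps *
example_side_by_side_caps (void)
{
  return gst_caps_new_simple ("video/x-raw",
      "format", G_TYPE_STRING, "NV12",
      "width", G_TYPE_INT, 1920,
      "height", G_TYPE_INT, 1080,
      "stereo-layout", G_TYPE_STRING, "side-by-side",
      "views", G_TYPE_INT, 2,
      NULL);
}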

---

  Potential output formats (support varies wildly by display):

  Stereo (half-res per eye):
  - Over/under (most used on active/passive 3DTVs and single/dual projector
systems, recommended for passive since no scaling is necessary in most cases)
  - Left/right (most used on active/passive 3DTVs and single/dual projector
systems, recommended for active)
  - Horizontal/Row interlaced (native format of passive 3DTVs and some
single-view autostereo displays, requires no manual 3D setup on the display)
  - Vertical/Column interlaced (native format of some single-view autostereo
displays)
  - Checkerboard (alternating half-res left/right eye pixels, mostly used on
DLP active 3DTVs)
  - Anaglyph (mainly for viewing on 2D monitors; many different color mixtures
are possible: red/blue, red/cyan, red/green, green/magenta, yellow/blue, each
with options for 100% color, 50% color, other algorithms etc.; see the sketch
after this list)
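
For the anaglyph case, here is a minimal sketch of the simplest full-color
red/cyan mix, assuming two packed RGBA views of identical size; the function
name is illustrative only. The other color mixtures listed above just pick
different channels per eye, and the 100%/50% color options amount to
additional per-channel mixing.

#include <stdint.h>

/* Simplest "full color" red/cyan anaglyph: red channel from the left eye,
 * green and blue from the right eye. Both inputs and the output are packed
 * RGBA frames of identical dimensions. */
static void
make_red_cyan_anaglyph (const uint8_t *left, const uint8_t *right,
    uint8_t *out, int width, int height)
{
  int i, n = width * height;

  for (i = 0; i < n; i++) {
    out[4 * i + 0] = left[4 * i + 0];    /* R from left eye  */
    out[4 * i + 1] = right[4 * i + 1];   /* G from right eye */
    out[4 * i + 2] = right[4 * i + 2];   /* B from right eye */
    out[4 * i + 3] = 0xff;               /* opaque alpha     */
  }
}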

  Stereo (full-res per eye):
  - Frame sequential (alternating left/right eye images, usually at 120 Hz as
it's the native format of NVIDIA 3D Vision monitors; also supported by some
3DTVs and projectors)
  - HDMI 1.4a frame packing, which requires hardware support in the video sink
device/GPU (no manual 3D setup is needed on the display, since the mode is
detected automatically); supported by Blackmagic DeckLink cards (but not by
the decklinksrc/decklinksink elements)
  - "Dual-stream" output (usually HD-SDI only, separate physical cables for
left/right eye), used by professional/medical displays and supported by
Blackmagic DeckLink cards (but not in decklinksrc/sink elements)

  Autostereo (multiview):
  - Completely proprietary and display-dependent. Most use a lenticular lens or
parallax barrier with 5, 8 or 9 views and require a GPU shader to interleave
all required views into one packed frame using special repeating RGB/BGR
patterns. The patterns themselves are also sometimes altered to adjust for
optimal viewing distance, diminish cross-talk etc. but all non-optical
adjustments affect image quality. It would be sufficient to have a GL video
sink element with a custom fragment shader option to support most of these
displays, but without a secondary texture for the repeating pattern, GPU usage
will skyrocket with all the branching/modulo operations needed. Currently most
display vendors I've seen are very reluctant to hand over their pixel pattern
or optical parameters to allow 3rd party video player development.
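
To illustrate the pattern-texture point, here is a hedged sketch of such a
fragment shader, written as a C string the way a GL sink would carry it. The
uniform names, the side-by-side packing of the rendered views and the idea of
storing a per-subpixel view index in a small repeating lookup texture are all
assumptions for illustration, not any vendor's actual pattern.

static const char *interleave_frag_src =
  "#version 120\n"
  "uniform sampler2D u_views;    /* all N views packed left-to-right */\n"
  "uniform sampler2D u_pattern;  /* repeating subpixel -> view map   */\n"
  "uniform float u_num_views;\n"
  "uniform vec2 u_pattern_size;  /* pattern tile size in pixels      */\n"
  "varying vec2 v_texcoord;\n"
  "\n"
  "/* Fetch one color channel of one view at the current position. */\n"
  "float sample_view (float view, float channel) {\n"
  "  vec2 uv = vec2 ((v_texcoord.x + view) / u_num_views, v_texcoord.y);\n"
  "  vec3 texel = texture2D (u_views, uv).rgb;\n"
  "  return channel < 0.5 ? texel.r : (channel < 1.5 ? texel.g : texel.b);\n"
  "}\n"
  "\n"
  "void main () {\n"
  "  /* One texture lookup tells each R, G and B subpixel which view to\n"
  "   * show, replacing per-pixel branching/modulo arithmetic. */\n"
  "  vec2 tile = mod (gl_FragCoord.xy, u_pattern_size) / u_pattern_size;\n"
  "  vec3 idx = floor (texture2D (u_pattern, tile).rgb *\n"
  "      (u_num_views - 1.0) + 0.5);\n"
  "  gl_FragColor = vec4 (sample_view (idx.r, 0.0),\n"
  "                       sample_view (idx.g, 1.0),\n"
  "                       sample_view (idx.b, 2.0),\n"
  "                       1.0);\n"
  "}\n";

A 2D fallback on such a display would then just be a pattern texture in which
every entry points at the same view.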

---

Most stereo 3D displays cannot auto-detect 3D video and switch modes
accordingly (the exceptions are HDMI 1.4a signaling and, in some cases, image
analysis, which some Panasonic models attempt), so the display is usually
forced into a certain 3D mode by the user.

Sometimes it's necessary to throw away one of the eyes and display only the
other, e.g. for a 2D display or a user who is uncomfortable watching 3D.
There should be an option to choose which view (left or right) is used for the
2D image.
Also, if the display is in a 3D-only mode, it is not sufficient to simply show
one eye in fullscreen 2D; two copies of the same eye must be packed as if
actual 3D were being displayed. Example: with a 3DTV manually forced into
left/right mode, 2D images must be displayed as a side-by-side left/left or
right/right image.
There is usually no way to query the current mode from a 3D display, except
for some that use a DB9 serial control port with a proprietary protocol.
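
A minimal sketch of that "same eye twice" fallback, assuming packed RGBA
frames and a display locked into half-width side-by-side mode; the function
name and layout are illustrative only.

#include <stdint.h>
#include <string.h>

/* Fill a side-by-side output frame with two copies of the same eye, for a
 * display locked in left/right 3D mode that should show 2D content. Both
 * buffers are packed RGBA; the chosen eye has already been scaled to half
 * the output width (width/2 x height). */
static void
pack_same_eye_side_by_side (const uint8_t *eye, uint8_t *out,
    int width, int height)
{
  int half_stride = (width / 2) * 4;   /* bytes per half-width row */
  int out_stride = width * 4;
  int y;

  for (y = 0; y < height; y++) {
    const uint8_t *src = eye + y * half_stride;
    uint8_t *dst = out + y * out_stride;

    memcpy (dst, src, half_stride);                /* left half  */
    memcpy (dst + half_stride, src, half_stride);  /* right half */
  }
}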

Some displays render 3D video in reverse (the left eye sees the image meant
for the right eye, etc.), so we must also have an option to "swap eyes". Some
displays have such a swap option themselves.

Mirror-based dual-camera rigs can require horizontal/vertical flipping of one
eye to get a correct image.

Some dual-camera systems have problems with correct horizontal/vertical
convergence, and existing stereo 3D software and 3DTVs offer a convergence
adjustment option to help when this cannot be corrected optically. This would
be a nice option to have, but it is not strictly necessary.
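
As a sketch of what such a convergence adjustment amounts to, assuming packed
RGBA frames: shift one eye horizontally by a signed pixel offset before the
eyes are packed, leaving the uncovered edge black. The function below is
illustrative only.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Horizontal convergence adjustment: shift one eye by `offset` pixels
 * (positive = right, negative = left). Packed RGBA in/out with identical
 * dimensions; the uncovered edge is left black. */
static void
shift_eye_horizontally (const uint8_t *in, uint8_t *out,
    int width, int height, int offset)
{
  int stride = width * 4;
  int shift = abs (offset);
  int copy_bytes = (width - shift) * 4;
  int y;

  if (shift >= width) {
    memset (out, 0, (size_t) stride * height);
    return;
  }

  for (y = 0; y < height; y++) {
    const uint8_t *src = in + y * stride;
    uint8_t *dst = out + y * stride;

    memset (dst, 0, stride);
    if (offset >= 0)
      memcpy (dst + shift * 4, src, copy_bytes);
    else
      memcpy (dst, src + shift * 4, copy_bytes);
  }
}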
