[gst-devel] proposal: support for row-stride in gstreamer
Rob Clark
rob at ti.com
Fri Jul 3 01:43:17 CEST 2009
Hi gstreamer folks,
The following is a proposal for how to add row-stride (and possibly
some related changes) to gstreamer. I gave a couple of possible
examples of where this would be useful, but it is probably not
exhaustive. Please let me know if you see any cases that I missed, or
details that I overlooked, etc.
Use-cases:
----------
+ display hardware with special constraints on image dimensions, for
  example if the output buffer must have dimensions that are a power
  of two
+ zero-copy cropping of an image / video frame (at least for
  interleaved color formats.. more on this later)
One example to think about is rendering onto a 3d surface. In some
cases, graphics hardware could require that the surface dimensions are
a power of 2. In this case, you would want the vsink to allocate a
buffer with a rowstride that is the next larger power of 2 from the
image width.
Another example to think about is video stabilization. In this use
case, you would ask the camera to capture an oversized frame. Your
vstab algorithm would calculate an x,y offset of the stabilized
image. But if the downstream encoder understands rowstride, you do
not need to actually copy the image buffer. Say, just to pick some
numbers, you want your final output to be 640x480, and you want your
oversized frame to be +20% in each dimension (768x576):
+--------+ +-------+ +------+
| camera |---------->| vstab |---------->| venc |
+--------+ width=768 +-------+ width=640 +------+
height=576 height=480
rowstride=768 rowstride=768
In the case of an interleaved color format (RGB, UYVY, etc), you could
simply advance the 'data' pointer in the buffer by
(y*rowstride) + (x*bytes_per_pixel). No memcpy() required. As long as
the video encoder respects the rowstride, it will see the stabilized
frame correctly.
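The zero-copy crop described above can be sketched in C as follows.
This is illustrative only (a plain struct standing in for GstBuffer;
the names are hypothetical), and it assumes rowstride is counted in
bytes, so the x term is scaled by bytes-per-pixel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not GStreamer API): zero-copy crop of an
 * interleaved-format frame by moving the data pointer.  The buffer
 * keeps its original rowstride; only the start pointer and the
 * visible width/height change. */
typedef struct {
    uint8_t *data;      /* points at the first visible pixel */
    int      width;     /* visible width, in pixels */
    int      height;    /* visible height, in pixels */
    int      rowstride; /* bytes from one row to the next */
    int      bpp;       /* bytes per pixel (2 for UYVY) */
} Frame;

/* Crop to a w x h window at (x, y): pure pointer arithmetic, no memcpy. */
static void frame_crop(Frame *f, int x, int y, int w, int h)
{
    f->data += (size_t)y * f->rowstride + (size_t)x * f->bpp;
    f->width  = w;
    f->height = h;
    /* rowstride is deliberately left unchanged: downstream elements
     * must step by rowstride, not width * bpp, to reach the next row. */
}
```

With the numbers from the diagram (UYVY at 2 bytes/pixel, so a
1536-byte stride), the vstab offset becomes a single pointer
adjustment and the encoder sees a 640x480 frame inside the 768x576
allocation.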
Proposal:
---------
In all cases that I can think of, the row-stride will not change
dynamically, so this parameter can be negotiated through caps
negotiation in the same way as image width/height, colorformat, etc.
However, we need to know conclusively that there is no element in the
pipeline that cares about the image format but does not understand
"rowstride", so we cannot use existing type strings (ex. "video/x-raw-
yuv"). And, at least in the cases that I can think of, the video sink
will dictate the row-stride, so upstream caps-renegotiation will be
used to arrive at the final "rowstride" value.
For media types, I propose to continue using existing strings for non-
stride-aware element caps, ex. "video/x-raw-yuv". For stride-aware
elements, they can support a second format, ex. "video/x-raw-yuv-
strided", "image/x-raw-rgb-strided", etc (ie. append "-strided" to
whatever the existing string is). In the case that a strided format
is negotiated, the final negotiated caps must also contain a
"rowstride" entry.
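For concreteness, the final caps negotiated in the vstab example above
might look something like this (hypothetical; the exact field name,
and whether rowstride is counted in bytes or pixels, is part of what
needs to be agreed):

```
video/x-raw-yuv-strided, format=(fourcc)UYVY, width=(int)640,
    height=(int)480, rowstride=(int)1536
```

(1536 = 768 * 2 bytes per UYVY pixel, assuming a byte rowstride.)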
Question: in general, most elements supporting rowstride will have no
constraint on what particular rowstride values are supported. Do they
just list "rowstride=[0-4294967295]" in their caps template? The
video sink allocating the buffer will likely have some constraints on
rowstride, although this will be a function of the width (for example,
round the width up to next power of two).
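For the power-of-two example, the sink's rowstride-from-width function
could be as simple as the following sketch (illustrative only, not an
actual GStreamer API):

```c
#include <assert.h>

/* Illustrative only: a sink requiring power-of-two surfaces could
 * derive its rowstride constraint from the negotiated width by
 * rounding up to the next power of two. */
static unsigned next_pow2(unsigned v)
{
    unsigned p = 1;
    while (p < v)
        p <<= 1;
    return p;
}
```

So a negotiated width of 640 (or the oversized 768) would both round
up to a rowstride of 1024 pixels on such hardware.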
We will implement some sort of GstRowStrideTransform element to
interface between stride-aware and non-stride-aware elements.
Non-Interleaved Color Formats:
------------------------------
So, everything I've said so far about zero-copy cropping works until
you start considering planar/semi-planar color formats which do not
have equal size planes. For example, consider NV12: the offset to
add to the Y plane is (y*rowstride)+x, but the offset to add to the
UV plane is ((y/2)*rowstride)+x. There are only three ways that I can
think of to deal with this (listed in order of my preference):
1) add fields to GstBuffer to pass additional pointers to the other
   color planes within the same GstBuffer
2) add a field(s) to GstBuffer to pass an offset.. either an x,y
   offset, or a single value that is (y*rowstride)+x. Either way,
   the various elements in the pipeline can use this to calculate
   the start of the individual planes of data.
3) pass individual planes of a single image as separate GstBuffer's..
   but I'm not a huge fan of this because now every element needs to
   have some sort of "I've got Y, but I'm waiting for UV" state.
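To make the arithmetic in option 2 concrete, here is a minimal sketch
(names are hypothetical, not GStreamer API) of how an element could
derive both NV12 plane pointers from one buffer plus an x,y crop
offset, given that NV12 uses the same rowstride for both planes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch for option 2: given one contiguous NV12
 * allocation (Y plane first, then interleaved UV) plus an (x, y)
 * crop offset, each element derives the per-plane start pointers. */
typedef struct {
    uint8_t *y;   /* start of the cropped Y plane */
    uint8_t *uv;  /* start of the cropped interleaved UV plane */
} Nv12Planes;

/* data: start of the full NV12 allocation.
 * rowstride: bytes per row (the same for both planes in NV12).
 * full_height: allocated frame height in rows.
 * x, y: crop offset; x must be even so we land on a U byte. */
static Nv12Planes nv12_plane_pointers(uint8_t *data, int rowstride,
                                      int full_height, int x, int y)
{
    Nv12Planes p;
    uint8_t *uv_base = data + (size_t)rowstride * full_height;

    p.y  = data + (size_t)y * rowstride + x;
    /* The UV plane is half height; one UV byte pair covers two pixels
     * horizontally, so the byte offset of pixel column x is still x. */
    p.uv = uv_base + (size_t)(y / 2) * rowstride + x;
    return p;
}
```

The point of option 1 vs option 2 is just where this math lives: with
option 1 the producer runs it once and stores the pointers in the
GstBuffer; with option 2 every consumer runs it from the offset.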
I'm not sure if anyone has any thoughts about which of these three
approaches is preferred. Or any alternative ideas?
BR,
-Rob