[Mesa-dev] reworking pipe_video_decoder / pipe_video_buffer

Wed Nov 16 05:47:28 PST 2011

On 15.11.2011 17:52, Maarten Lankhorst wrote:
> Deleted:
> - begin_frame/end_frame: Was only useful for XvMC, should be folded into flush..

I'm not completely happy with the current interface also, but if you 
remove the state tracker ability to control how many buffers are used, 
which in turn lets the VDPAU and XvMC state tracker use the same 
buffering algorithm. This is quite bad, because for XvMC we are forced 
to use a buffer for each destination surface, while in the VDPAU/VAAPI 
case a much simpler method should be used.

> - set_quant_matrix/set_reference_frames:
>      they should become part of picture_desc,
>      not all codecs deal with it in the same way,
>      and some may not have all of the calls.
That is true, but also intended. The idea behind it is that it is not 
necessary for all codecs to set all buffer types, but instead inform the 
driver that a specific buffer type has changed. This gives the driver 
the ability to know which parameters has changed between calls to 
decode, and so only perform the (probably expensive) update of those 
parameters on the hardware side.

> - set_picture_parameters: Can be passed to decode_bitstream/macroblock
Same as above, take a look at the enum VABufferType  in the VAAPI 
specification (http://cgit.freedesktop.org/libva/tree/va/va.h).
> - set_decode_target: Idem
> - create_decode_buffer/set_decode_buffer/destroy_decode_buffer:
>      Even if a decoder wants it, the state tracker has no business knowing
>      about it.
Unfortunately that isn't true, only VDPAU hasn't a concept of giving the 
application a certain control over the buffers. XvMC has the implicit 
association of internal decode buffers with destination surfaces, and 
VAAPI gives the application both an one shot copy and a map/unmap 
interface to provide buffer content.

When we want to avoid unnecessary coping around of buffer content then 
this interface has to become even more complicated, like new buffer 
types, using create user buffer, mapping/unmapping of to be decoded data 
buffers etc.

> flush is changed to only flush a single pipe_video_buffer,
> this should reduce the state that needs to be maintained for XvMC otherwise.
I can't really see how is that improving the situation, all it does is 
reducing the interface to what is needed for VDPAU, but totally ignoring 
requirements for VAAPI for example.
> Note: internally you can still use those calls, as long as the *public* api
> would be reduced to this, pipe_video_buffer is specific to each driver,
> so you can put in a lot of state there if you choose to do so. For example,
> quant matrix is not used in vc-1, and h264 will need a different
> way for set_reference_frames, see struct VdpReferenceFrameH264. This is why
> I want to remove those calls, and put them into picture_desc..
Correct, those calls need to be extended to support more codecs, but 
just stuffing everything into a single structure also doesn't seems to 
be a solution that will work for a wider view of different state 
trackers or hardware.
> struct pipe_video_buffer I'm less sure about, especially wrt interlacing
> and alignment it needs more thought. height and width are aligned to 64
> on nvidia, and requires a special tiling flag. The compositor will need
> to become aware of interlacing, to be able to play back interlaced videos.
Alignment isn't so much of an issue. Take a look at 
vl_video_buffer_create, it is already aligning the width/height to a 
power of two if the hardware needs that. So it is perfectly legal for a 
video buffer to be larger than the requested size, this is already 
tested with the current color conversion code and should work fine with 
both XvMC and VDPAU.

> I honestly haven't read up on interlacing yet, and haven't found a video
> to test it with, so that part is just speculative crap, and might change
> when I find out more about it.
Regarding deinterlacing I'm also not really sure what we should do about 
it, but all interfaces (XvMC/VDPAU/VAAPI/DXVA) seems to treat surfaces 
as progressive. So I think the output of a decode call should already be 
in a progressive format (or can be at least handled as one for sampling).

> Testing interlaced  videos that decode correctly with nvidia vdpau would help
> a lot to figure out what the proper way to handle interlacing would be, so if
> someone has a bunch that play with nvidia accelerated vdpau&  mplayer correctly,
> could you please link them? ;)
I will try to look around some more, but at least for MPEG2 I haven't 
found any interlaced videos for testing also. I will leave you a note if 
I find something.
> /**
>   * output for decoding / input for displaying
>   */
> struct pipe_video_buffer
> {
>     struct pipe_context *context;
>
>     enum pipe_format buffer_format;
>     // Note: buffer_format may change as a result of put_bits, or call to decode_bitstream
>     // afaict there is no guarantee a buffer filled with put_bits can be used as reference
>     // frame to decode_bitstream
Altering the video buffer format as a result to an application putting 
video content manually into it is something very VDPAU specific. So I 
think it would be better to let the state tracker check if the new 
content matches what is in the buffer currently and if the need arise 
recreate the buffer with a new format.

>     enum pipe_video_chroma_format chroma_format;
>     unsigned width;
>     unsigned height;
>
>     enum pipe_video_interlace_type layout;
>     // progressive
>     // even and odd lines are split
>     // interlaced, top field valid only (half height)
>     // interlaced, bottom field valid only
>     // I'm really drawing a blank what would be sane here, since interlacing has a ton of
>     // codec specific information, and current design doesn't handle it at all..
The concrete memory layout of the buffer is implementation specific, so 
I think things like interlacing and/or special tilling modes are 
something that the driver needs to handle and should not be exposed to 
the state tracker.
>     /**
>      * destroy this video buffer
>      */
>     void (*destroy)(struct pipe_video_buffer *buffer);
>
>     /**
>      * get a individual sampler view for each component
>      */
>     struct pipe_sampler_view **(*get_sampler_view_components)(struct pipe_video_buffer *buffer);
>     // Note: for progressive split in 2 halfs, would probably need up 6...
>     // top Y, bottom Y, top CbCr, bottom CbCr
>
>     /**
>      * get a individual surfaces for each plane
>      */
>     struct pipe_surface **(*get_surfaces)(struct pipe_video_buffer *buffer);
>
>     /**
>      * write bits to a video buffer, possibly altering the format of this video buffer
>      */
>     void (*put_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
>                      void const *const *source_data, uint32_t const *source_pitches);
>
>     /**
>      * read bits from a video buffer
>      */
>     void (*get_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
>                      void *const*source_data, uint32_t const *source_pitches);
> };

Having put_bits and get_bits implementations are actually a bad idea, 
because they assume that the state tracker has a pointer where data is 
stored. But some state trackers (especially VAAPI) needs an map/unmap 
like interface, where the driver is returning a pointer and pitch to 
some linear aligned memory. Like with tilling that sometimes requires 
the buffer to be copied into a linear memory layout, that was at least 
the way we handled it inside gallium so far.

Over all it is good to know that somebody is still looking into it.

Regards,
Christian.