[Mesa-dev] reworking pipe_video_decoder / pipe_video_buffer

Maarten Lankhorst m.b.lankhorst at gmail.com
Wed Nov 16 06:38:15 PST 2011


On 11/16/2011 02:47 PM, Christian König wrote:
> On 15.11.2011 17:52, Maarten Lankhorst wrote:
>> Deleted:
>> - begin_frame/end_frame: Was only useful for XvMC, should be folded into flush..
> I'm not completely happy with the current interface either, but removing the state tracker's ability to control how many buffers are used would in turn force the VDPAU and XvMC state trackers to use the same buffering algorithm. That is quite bad, because for XvMC we are forced to use a buffer for each destination surface, while in the VDPAU/VAAPI case a much simpler method can be used.
I wasn't aware of that; the patch could easily be changed to create only a single
decoder buffer for entrypoint <= bitstream, and otherwise attach one to each
destination surface. I still think the buffer handling should be removed from the
state tracker though, and that would still be possible..
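Roughly what I mean, as a sketch (the enum names here are made up for illustration, not the real gallium ones):

```c
#include <stdbool.h>

/* Hypothetical sketch of the "entrypoint <= bitstream" rule: bitstream-level
 * decoding (VDPAU/VA-API style) can share a single internal decode buffer,
 * while macroblock-level entrypoints (XvMC style) need one buffer per
 * destination surface. */
enum entrypoint {
   ENTRYPOINT_BITSTREAM = 0,
   ENTRYPOINT_IDCT      = 1,
   ENTRYPOINT_MC        = 2,
};

static bool use_single_decode_buffer(enum entrypoint ep)
{
   /* only pure bitstream decoding gets away with one buffer; for
    * IDCT/MC entrypoints a buffer is attached to each surface instead */
   return ep <= ENTRYPOINT_BITSTREAM;
}
```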

>> - set_quant_matrix/set_reference_frames:
>>      they should become part of picture_desc,
>>      not all codecs deal with it in the same way,
>>      and some may not have all of the calls.
> That is true, but also intended. The idea behind it is that it is not necessary for all codecs to set all buffer types; instead the state tracker informs the driver that a specific buffer type has changed. This gives the driver the ability to know which parameters have changed between calls to decode, and so to perform the (probably expensive) update of only those parameters on the hardware side.
This is why I used the flush call for XvMC as a way to signal parameter changes. If you look at
decode_macroblock in g3dvl, begin_frame checks if (dec->current_buffer == target->decode_buffer),
and skips re-flushing state when they match.
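In essence it's just this check (struct and field names follow the mail, not necessarily the real g3dvl source):

```c
/* Minimal sketch of the begin_frame short-circuit: state only needs to
 * be re-uploaded when the decoder switches to a different decode buffer,
 * i.e. a different destination surface in the XvMC case. */
struct decode_buffer { int placeholder; };

struct decoder {
   struct decode_buffer *current_buffer;
};

struct target_surface {
   struct decode_buffer *decode_buffer;
};

static int begin_frame_needs_flush(const struct decoder *dec,
                                   const struct target_surface *target)
{
   /* same buffer as last call -> parameters unchanged, skip the flush */
   return dec->current_buffer != target->decode_buffer;
}
```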

For VA-API, I haven't had the... pleasure(?) of playing around with it, and its
state tracker seems to consist only of stubs.

If the decode_bitstream interface is changed to receive all bitstream buffers at the same time,
there would be no overhead to doing it like this. For a single picture the parameters are supposed
to stay constant, so for vdpau the sane way would be: set picture parameters for the hardware
(which includes EVERYTHING), write all bitstream buffers to a hardware bo, wait until the magic is
done. Afaict there isn't even a sane way to submit only partial buffers, so splitting them up is
just a bunch of overhead for me.
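A hypothetical variant of decode_bitstream along those lines, heavily simplified (the struct and names here are illustrative, not the actual pipe_video_decoder signature):

```c
#include <stddef.h>

/* Sketch: decode_bitstream takes every bitstream buffer of a picture in
 * one call, so the driver can size and fill a single hardware bo up
 * front instead of paying per-call overhead. */
struct bitstream_buf {
   const void *data;
   size_t      size;
};

/* sum the buffer sizes, as a driver would when allocating the bo */
static size_t total_bitstream_size(const struct bitstream_buf *bufs,
                                   unsigned num_bufs)
{
   size_t total = 0;
   for (unsigned i = 0; i < num_bufs; i++)
      total += bufs[i].size;
   return total;
}
```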

nvidia doesn't support va-api; it handles the entire process from picture parameters
to a decoded buffer internally, so it always converts the picture parameters into
something the hardware can understand, every frame.

>> - set_picture_parameters: Can be passed to decode_bitstream/macroblock
> Same as above; take a look at the enum VABufferType in the VAAPI specification (http://cgit.freedesktop.org/libva/tree/va/va.h).
>> - set_decode_target: Idem
>> - create_decode_buffer/set_decode_buffer/destroy_decode_buffer:
>>      Even if a decoder wants it, the state tracker has no business knowing
>>      about it.
> Unfortunately that isn't true; only VDPAU lacks the concept of giving the application a certain control over the buffers. XvMC has the implicit association of internal decode buffers with destination surfaces, and VAAPI gives the application both a one-shot copy and a map/unmap interface to provide buffer content.
> If we want to avoid unnecessary copying of buffer content, then this interface has to become even more complicated: new buffer types, using create user buffer, mapping/unmapping of to-be-decoded data buffers, etc.
There is no guarantee that a user-created video buffer can be used as a reference frame; I honestly don't even know if that's possible.

Furthermore, I don't see any code in state_trackers/va except some stubs, so I can't discuss an interface that isn't even there to begin with.

>> flush is changed to only flush a single pipe_video_buffer,
>> this should reduce the state that needs to be maintained for XvMC otherwise.
> I can't really see how that improves the situation; all it does is reduce the interface to what is needed for VDPAU, while totally ignoring the requirements of VAAPI, for example.
See above.

>> Note: internally you can still use those calls, as long as the *public* api
>> would be reduced to this, pipe_video_buffer is specific to each driver,
>> so you can put in a lot of state there if you choose to do so. For example,
>> quant matrix is not used in vc-1, and h264 will need a different
>> way for set_reference_frames, see struct VdpReferenceFrameH264. This is why
>> I want to remove those calls, and put them into picture_desc..
> Correct, those calls need to be extended to support more codecs, but just stuffing everything into a single structure also doesn't seem to be a solution that will work for a wider range of different state trackers or hardware.
All hardware has its own unique approach with its own api, so it would make sense to have a separate call for decode_va too,
with maybe some bits that tell what changed, so it only has to upload those specific parts..
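Something like this for the "bits that tell what changed" idea (all names made up, just to illustrate the dirty-mask approach):

```c
/* Sketch: the state tracker passes a dirty mask alongside the picture
 * description, and the driver only re-uploads the parameter groups
 * whose bit is set. */
enum picture_dirty_bits {
   PICTURE_DIRTY_PARAMS     = 1 << 0,  /* picture parameters */
   PICTURE_DIRTY_QUANT      = 1 << 1,  /* quant matrix */
   PICTURE_DIRTY_REF_FRAMES = 1 << 2,  /* reference frame list */
};

static unsigned params_to_upload(unsigned dirty, unsigned hw_cares_about)
{
   /* a VDPAU-style driver that re-uploads everything each frame can
    * simply ignore the mask by passing ~0u for hw_cares_about */
   return dirty & hw_cares_about;
}
```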

>> struct pipe_video_buffer I'm less sure about, especially wrt interlacing
>> and alignment it needs more thought. height and width are aligned to 64
>> on nvidia, and requires a special tiling flag. The compositor will need
>> to become aware of interlacing, to be able to play back interlaced videos.
> Alignment isn't so much of an issue. Take a look at vl_video_buffer_create, it is already aligning the width/height to a power of two if the hardware needs that. So it is perfectly legal for a video buffer to be larger than the requested size, this is already tested with the current color conversion code and should work fine with both XvMC and VDPAU.
>> I honestly haven't read up on interlacing yet, and haven't found a video
>> to test it with, so that part is just speculative crap, and might change
>> when I find out more about it.
> Regarding deinterlacing I'm also not really sure what we should do about it, but all interfaces (XvMC/VDPAU/VAAPI/DXVA) seem to treat surfaces as progressive. So I think the output of a decode call should already be in a progressive format (or at least be usable as one for sampling).
No, they present the surfaces as progressive, but that doesn't necessarily mean they are;
nothing would prevent treating GetBitsYCbCr as just a special (slow) case.

Considering how different the approach for vdpau and va-api is,
wouldn't it just make sense to handle va-api separately then?

>> Testing interlaced videos that decode correctly with nvidia vdpau would help
>> a lot to figure out what the proper way to handle interlacing would be, so if
>> someone has a bunch that play with nvidia accelerated vdpau & mplayer correctly,
>> could you please link them? ;)
> I will try to look around some more, but at least for MPEG2 I haven't found any interlaced videos for testing either. I will leave you a note if I find something.
>> /**
>>   * output for decoding / input for displaying
>>   */
>> struct pipe_video_buffer
>> {
>>     struct pipe_context *context;
>>     enum pipe_format buffer_format;
>>     // Note: buffer_format may change as a result of put_bits, or call to decode_bitstream
>>     // afaict there is no guarantee a buffer filled with put_bits can be used as reference
>>     // frame to decode_bitstream
> Altering the video buffer format as a result of an application putting video content manually into it is something very VDPAU-specific. So I think it would be better to let the state tracker check whether the new content matches what is currently in the buffer and, if the need arises, recreate the buffer with a new format.
This is only an optimization; I did it for g3dvl because it's faster when PutBitsYCbCr is used,
i.e. when mplayer is not decoding through vdpau. Eventually all buffers end up with the correct
layout, and it comes down to a few memcpy's. For nvidia the format has to stay NV12, so I can't
change the format in that case..
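The decision put_bits has to make then looks roughly like this (enum values and names are placeholders, not real gallium formats):

```c
/* Sketch of the format-switch optimization: put_bits may retarget the
 * buffer to the incoming format when the hardware allows it (the g3dvl
 * case), but must keep the fixed format when the buffer may later serve
 * as a reference frame (the nvidia NV12 case). */
enum buf_format { FORMAT_NV12, FORMAT_YV12 };

static enum buf_format
put_bits_choose_format(enum buf_format current, enum buf_format incoming,
                       int hw_format_is_fixed)
{
   if (hw_format_is_fixed)
      return current;   /* must stay e.g. NV12 for reference frames */
   return incoming;     /* adopt the caller's layout, saving a conversion */
}
```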

>>     enum pipe_video_chroma_format chroma_format;
>>     unsigned width;
>>     unsigned height;
>>     enum pipe_video_interlace_type layout;
>>     // progressive
>>     // even and odd lines are split
>>     // interlaced, top field valid only (half height)
>>     // interlaced, bottom field valid only
>>     // I'm really drawing a blank what would be sane here, since interlacing has a ton of
>>     // codec specific information, and current design doesn't handle it at all..
> The concrete memory layout of the buffer is implementation-specific, so I think things like interlacing and/or special tiling modes are something the driver needs to handle and should not be exposed to the state tracker.
I don't think so..

 * \defgroup VdpVideoMixer VdpVideoMixer; Video Post-processing \
 *           and Compositing object
 * VdpVideoMixer can perform some subset of the following
 * post-processing steps on video:
 * - De-interlacing
 *   - Various types, with or without inverse telecine
 * - Noise-reduction
 * - Sharpness adjustment
 * - Color space conversion to RGB
 * - Chroma format upscaling to 4:4:4

Regardless of va-api/vdpau/xvmc, those steps need to be performed, for all hardware.
Deinterlacing shouldn't live in hardware-specific bits, and this step could also easily
be extended to handle frames that are stored split into fields..
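If the layout enum from the proposed pipe_video_buffer is kept, the common code could pick source lines per field along these lines; a purely illustrative weave sketch, with made-up names:

```c
/* Sketch: map an output row of a progressive frame back to
 * (field, row-within-field) when even and odd lines are stored
 * separately, as common deinterlacing code would need to do. */
enum interlace_layout {
   LAYOUT_PROGRESSIVE,
   LAYOUT_FIELDS_SPLIT,   /* even and odd lines split into two fields */
};

static void weave_source_row(enum interlace_layout layout, unsigned out_row,
                             unsigned *field, unsigned *field_row)
{
   if (layout == LAYOUT_PROGRESSIVE) {
      *field = 0;
      *field_row = out_row;
   } else {
      *field = out_row & 1;      /* odd output rows come from the odd field */
      *field_row = out_row / 2;
   }
}
```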

>>     /**
>>      * destroy this video buffer
>>      */
>>     void (*destroy)(struct pipe_video_buffer *buffer);
>>     /**
>>      * get an individual sampler view for each component
>>      */
>>     struct pipe_sampler_view **(*get_sampler_view_components)(struct pipe_video_buffer *buffer);
>>     // Note: for progressive content split in two halves, would probably need up to 6...
>>     // top Y, bottom Y, top CbCr, bottom CbCr
>>     /**
>>      * get an individual surface for each plane
>>      */
>>     struct pipe_surface **(*get_surfaces)(struct pipe_video_buffer *buffer);
>>     /**
>>      * write bits to a video buffer, possibly altering the format of this video buffer
>>      */
>>     void (*put_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
>>                      void const *const *source_data, uint32_t const *source_pitches);
>>     /**
>>      * read bits from a video buffer
>>      */
>>     void (*get_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
>>                      void *const*source_data, uint32_t const *source_pitches);
>> };
> Having put_bits and get_bits implementations is actually a bad idea, because they assume that the state tracker has a pointer where the data is stored. But some state trackers (especially VAAPI) need a map/unmap-like interface, where the driver returns a pointer and pitch to some linear aligned memory. As with tiling, that sometimes requires the buffer to be copied into a linear memory layout; that was at least the way we handled it inside gallium so far.
> Over all it is good to know that somebody is still looking into it.
No, it's a good idea if you have hardware that requires a specific layout in order to use the video buffer as a reference frame...
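The whole point of a put_bits-style copy is that the driver stays in control of the layout; a minimal sketch of what such a per-plane copy does (parameters simplified, names made up):

```c
#include <stdint.h>
#include <string.h>

/* Sketch: the driver copies the caller's linear data row by row into
 * its own (possibly tiled/aligned) pitch, so the buffer keeps the
 * layout the hardware needs to use it as a reference frame. */
static void copy_plane(uint8_t *dst, uint32_t dst_pitch,
                       const uint8_t *src, uint32_t src_pitch,
                       uint32_t width, uint32_t height)
{
   for (uint32_t y = 0; y < height; y++)
      memcpy(dst + y * dst_pitch, src + y * src_pitch, width);
}
```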
