[Mesa-dev] reworking pipe_video_decoder / pipe_video_buffer

Christian König deathsimple at vodafone.de
Mon Nov 21 08:49:00 PST 2011


On 16.11.2011 15:38, Maarten Lankhorst wrote:
> Hey,
>
> On 11/16/2011 02:47 PM, Christian König wrote:
>> On 15.11.2011 17:52, Maarten Lankhorst wrote:
>>> Deleted:
>>> - begin_frame/end_frame: Was only useful for XvMC, should be folded into flush..
>> I'm not completely happy with the current interface either, but this removes the state tracker's ability to control how many buffers are used, which in turn forces the VDPAU and XvMC state trackers to use the same buffering algorithm. That is quite bad, because for XvMC we are forced to use a buffer for each destination surface, while in the VDPAU/VAAPI case a much simpler method should be used.
> I wasn't aware of that, the patch could easily be changed for entrypoint <= bitstream.
> Only create a single decoder buffer, otherwise attach it to the surface.
> I still think it should be removed from the state tracker though, and it would
> still be possible..
I would rather pass another parameter to the decoder create function, 
like "scatter_decode" or something like that. Younes originally had a 
parameter just to describe the different possible calling conventions 
for decode, but it only covered slice vs. frame based buffering and I 
didn't understand what he was trying to do with it at the time.
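
Purely as an illustration of what I have in mind (the flag name and the 
exact parameter list here are hypothetical, not the current prototype):

struct pipe_video_decoder *
(*create_video_decoder)(struct pipe_context *pipe,
                        enum pipe_video_profile profile,
                        enum pipe_video_entrypoint entrypoint,
                        enum pipe_video_chroma_format chroma_format,
                        unsigned width, unsigned height,
                        unsigned max_references,
                        bool scatter_decode /* hypothetical flag: XvMC-style
                                               per-surface buffering */);

The driver could then pick its buffering strategy once at creation time 
instead of the state tracker managing decode buffers explicitly.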

I would also suggest separating the detection of the start of a new 
frame into its own function and making that function a public interface 
in vl_mpeg12_decoder.h, so that other drivers can use it for their 
handling as well.
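
Roughly something like this (the name and signature are just a sketch; 
the real helper would take whatever state the detection actually needs):

/* hypothetical helper in vl_mpeg12_decoder.h: returns true when the
 * incoming picture parameters/target indicate the start of a new frame,
 * so other drivers can reuse the same detection logic */
bool
vl_mpeg12_is_new_frame(const struct pipe_mpeg12_picture_desc *picture,
                       const struct pipe_video_buffer *target,
                       const struct pipe_video_buffer *prev_target);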

>>> - set_quant_matrix/set_reference_frames:
>>>       they should become part of picture_desc,
>>>       not all codecs deal with it in the same way,
>>>       and some may not have all of the calls.
>> That is true, but also intended. The idea behind it is that it is not necessary for all codecs to set all buffer types, but instead to inform the driver that a specific buffer type has changed. This gives the driver the ability to know which parameters have changed between calls to decode, and so to only perform the (probably expensive) update of those parameters on the hardware side.
> This is why I used the flush call for XvMC as a way to signal parameter changes. If you looked at the
> decode_macroblock for g3dvl, it checks if (dec->current_buffer == target->decode_buffer) in
> begin_frame, so it doesn't flush state.
>
> For VA-API, I haven't had the .. pleasure(?) of playing around with it, and it seems
> to be only stubs.
That's correct, the VA-API implementation isn't complete, and I 
wouldn't call it a pleasure to work with. Intel's VA-API specification 
seems to be a bit in flux, moving structure members around and 
sometimes even changing calling conventions from one version to 
another.

But they still have one very nice idea of setting certain parameters 
once and then never changing them, for example:

/*
 *  For application, e.g. set a new bitrate
 *    VABufferID buf_id;
 *    VAEncMiscParameterBuffer *misc_param;
 *    VAEncMiscParameterRateControl *misc_rate_ctrl;
 *
 *    vaCreateBuffer(dpy, context, VAEncMiscParameterBufferType,
 *              sizeof(VAEncMiscParameterBuffer) + sizeof(VAEncMiscParameterRateControl),
 *              1, NULL, &buf_id);
 *
 *    vaMapBuffer(dpy, buf_id, (void **)&misc_param);
 *    misc_param->type = VAEncMiscParameterTypeRateControl;
 *    misc_rate_ctrl = (VAEncMiscParameterRateControl *)misc_param->data;
 *    misc_rate_ctrl->bits_per_second = 6400000;
 *    vaUnmapBuffer(dpy, buf_id);
 *    vaRenderPicture(dpy, context, &buf_id, 1);
 */

For me that means a big simplification, because I don't need to check 
every frame whether certain parameters have changed, which would 
otherwise lead to a whole bunch of overhead like flushes or even the 
need to reset the whole GPU block and load new parameters, etc.
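
As a minimal sketch of what I mean (all names here are made up, not 
driver code): with set-once parameters the driver only needs a dirty 
flag instead of diffing the picture parameters on every decode call:

#include <stdbool.h>

struct drv_rate_control { unsigned bits_per_second; };

struct drv_decoder {
   struct drv_rate_control rate_ctrl;
   bool rate_ctrl_dirty;
};

/* stand-in for the expensive part: flush and reprogram the GPU block */
static void drv_program_rate_control(const struct drv_rate_control *rc)
{
   (void)rc;
}

/* called only when the application actually changes the parameter */
static void drv_set_rate_control(struct drv_decoder *dec,
                                 const struct drv_rate_control *rc)
{
   dec->rate_ctrl = *rc;
   dec->rate_ctrl_dirty = true;
}

/* called at the start of each decode */
static void drv_begin_decode(struct drv_decoder *dec)
{
   if (dec->rate_ctrl_dirty) {
      drv_program_rate_control(&dec->rate_ctrl);
      dec->rate_ctrl_dirty = false;
   }
}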

> If the decode_bitstream interface is changed to get all bitstream buffers at the same time,
> there wouldn't be overhead to doing it like this. For a single picture it's supposed to stay constant,
> so for vdpau the sane way would be: set picture parameters for hardware (includes EVERYTHING),
> write all bitstream buffers to a hardware bo, wait until magic is done. Afaict, there isn't even a sane
> way to only submit partial buffers, so it's just a bunch of overhead for me.
>
> nvidia doesn't support va-api, it handles the entire process from picture parameters
> to a decoded buffer internally, so it always converts the picture parameters into
> something the hardware can understand, every frame.
I'm not arguing against removing the scattered calls to 
decode_bitstream. I just don't want to lose information while passing 
the parameters from the state tracker down to the driver. But we can 
also add this information as a flag to the function later, so on 
second thought that seems to be ok.
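
So something along these lines would be fine with me (just a sketch of 
the prototype, the parameter names are illustrative only):

/* hand the driver all bitstream buffers of a frame in one call,
 * together with the complete picture description */
void (*decode_bitstream)(struct pipe_video_decoder *decoder,
                         struct pipe_video_buffer *target,
                         struct pipe_picture_desc *picture,
                         unsigned num_buffers,
                         const void *const *buffers,
                         const unsigned *sizes);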

>>> - set_picture_parameters: Can be passed to decode_bitstream/macroblock
>> Same as above, take a look at the enum VABufferType  in the VAAPI specification (http://cgit.freedesktop.org/libva/tree/va/va.h).
>>> - set_decode_target: Idem
>>> - create_decode_buffer/set_decode_buffer/destroy_decode_buffer:
>>>       Even if a decoder wants it, the state tracker has no business knowing
>>>       about it.
>> Unfortunately that isn't true; only VDPAU has no concept of giving the application a certain amount of control over the buffers. XvMC has the implicit association of internal decode buffers with destination surfaces, and VAAPI gives the application both a one-shot copy and a map/unmap interface to provide buffer content.
>>
>> When we want to avoid unnecessary copying around of buffer content then this interface has to become even more complicated, with new buffer types, create user buffer, mapping/unmapping of to-be-decoded data buffers, etc.
> There is no guarantee a user created video buffer can be used as reference frame, I honestly don't even know if it's possible.
Oh, I'm not talking about reference frames here, I'm talking about ALL 
kinds of buffers. As far as I know, using a manually uploaded video 
buffer as a reference frame isn't possible with any of the major 
hardware vendors, and I can tell you that it definitely doesn't work on 
Radeon hardware.

> Furthermore I don't see any code in state_trackers/va except some stubs, so I can't discuss an interface that's not even there to begin with.
The VA-API state tracker is only a stub, but the specification is 
fairly complete, and it was a merge requirement to support at least 
XvMC, VDPAU, VA-API and even DXVA with it.

>>> flush is changed to only flush a single pipe_video_buffer,
>>> this should reduce the state that needs to be maintained for XvMC otherwise.
>> I can't really see how that improves the situation; all it does is reduce the interface to what is needed for VDPAU, while totally ignoring the requirements of VAAPI, for example.
> See above.
>
>>> Note: internally you can still use those calls, as long as the *public* api
>>> would be reduced to this, pipe_video_buffer is specific to each driver,
>>> so you can put in a lot of state there if you choose to do so. For example,
>>> quant matrix is not used in vc-1, and h264 will need a different
>>> way for set_reference_frames, see struct VdpReferenceFrameH264. This is why
>>> I want to remove those calls, and put them into picture_desc..
>> Correct, those calls need to be extended to support more codecs, but just stuffing everything into a single structure also doesn't seem to be a solution that will work across a wider range of state trackers or hardware.
> All hardware has its own unique approach with their own api, so it would make sense if there is a separate call for decode_va too,
> with maybe some bits that tell what changed, so it only has to upload those specific parts..
Huh? Are you kidding? Gallium3D's goal is to provide a unified and 
abstracted view of today's video hardware, and by introducing a call 
specific to a single API we definitely don't work towards that goal.

>>> struct pipe_video_buffer I'm less sure about, especially wrt interlacing
>>> and alignment it needs more thought. height and width are aligned to 64
>>> on nvidia, and requires a special tiling flag. The compositor will need
>>> to become aware of interlacing, to be able to play back interlaced videos.
>> Alignment isn't so much of an issue. Take a look at vl_video_buffer_create, it is already aligning the width/height to a power of two if the hardware needs that. So it is perfectly legal for a video buffer to be larger than the requested size, this is already tested with the current color conversion code and should work fine with both XvMC and VDPAU.
>>
>>> I honestly haven't read up on interlacing yet, and haven't found a video
>>> to test it with, so that part is just speculative crap, and might change
>>> when I find out more about it.
>> Regarding deinterlacing I'm also not really sure what we should do about it, but all interfaces (XvMC/VDPAU/VAAPI/DXVA) seem to treat surfaces as progressive. So I think the output of a decode call should already be in a progressive format (or at least be usable as one for sampling).
> No, they present the surfaces as progressive, but that doesn't necessarily mean they are,
> nothing would prevent it from just seeing GetBitsYCbCr as a special (slow) case.
Yeah, exactly that was what I had in mind, but since we don't have a 
consensus about what the internal memory layout of the buffers really 
is, we don't win anything by making it a public interface.

> Considering how different the approach for vdpau and va-api is,
> wouldn't it just make sense to handle va-api separately then?
No, I don't think so; the rest of the Gallium interfaces are built 
around mapping/unmapping of buffers with the help of transfer objects. 
This interface has proven flexible enough to implement a couple of 
different APIs, so I don't really see a need to create another 
interface just for VDPAU.

>>> Testing interlaced  videos that decode correctly with nvidia vdpau would help
>>> a lot to figure out what the proper way to handle interlacing would be, so if
> someone has a bunch that play with nvidia accelerated vdpau & mplayer correctly,
>>> could you please link them? ;)
>> I will try to look around some more, but at least for MPEG2 I haven't found any interlaced videos for testing either. I will leave you a note if I find something.
>>> /**
>>>    * output for decoding / input for displaying
>>>    */
>>> struct pipe_video_buffer
>>> {
>>>      struct pipe_context *context;
>>>
>>>      enum pipe_format buffer_format;
>>>      // Note: buffer_format may change as a result of put_bits, or call to decode_bitstream
>>>      // afaict there is no guarantee a buffer filled with put_bits can be used as reference
>>>      // frame to decode_bitstream
>> Altering the video buffer format as a result of an application putting video content manually into it is something very VDPAU specific. So I think it would be better to let the state tracker check whether the new content matches what is currently in the buffer and, if the need arises, recreate the buffer with a new format.
> This is only an optimization, I did it for g3dvl because it's faster in the case where PutBitsYCbCr is used when
> mplayer is not using vdpau. Eventually all buffers will have the correct layout, and it ends up being a few
> memcpy's. For nvidia the format will have to stay NV12, so I can't change format in that case..
>
>>>      enum pipe_video_chroma_format chroma_format;
>>>      unsigned width;
>>>      unsigned height;
>>>
>>>      enum pipe_video_interlace_type layout;
>>>      // progressive
>>>      // even and odd lines are split
>>>      // interlaced, top field valid only (half height)
>>>      // interlaced, bottom field valid only
>>>      // I'm really drawing a blank what would be sane here, since interlacing has a ton of
>>>      // codec specific information, and current design doesn't handle it at all..
>> The concrete memory layout of the buffer is implementation specific, so I think things like interlacing and/or special tiling modes are something that the driver needs to handle and should not be exposed to the state tracker.
> I don't think so..
>
>   * \defgroup VdpVideoMixer VdpVideoMixer; Video Post-processing \
>   *           and Compositing object
>   *
>   * VdpVideoMixer can perform some subset of the following
>   * post-processing steps on video:
>   * - De-interlacing
>   *   - Various types, with or without inverse telecine
>   * - Noise-reduction
>   * - Sharpness adjustment
>   * - Color space conversion to RGB
>   * - Chroma format upscaling to 4:4:4
>
> Regardless of being va-api/vdpau/xvmc, those steps need to be performed, for all hardware.
> Deinterlacing shouldn't be done in hardware specific bits, and this step could also easily
> be changed to include frames that are split up too..
Wait a moment, you are mixing up two different things here:

It's one thing to provide a top/bottom/planar view for using a video 
buffer as a source for a shader; that's what get_sampler_view_planes is 
good for (we still have to add that parameter to that function).
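
For illustration, that parameter could look roughly like this (the enum 
and its values are hypothetical):

/* let the caller choose which field it wants to sample from, while the
 * buffer's real memory layout stays hidden inside the driver */
enum pipe_video_field {
   PIPE_VIDEO_FIELD_FRAME,   /* progressive / both fields woven */
   PIPE_VIDEO_FIELD_TOP,
   PIPE_VIDEO_FIELD_BOTTOM
};

struct pipe_sampler_view **
(*get_sampler_view_planes)(struct pipe_video_buffer *buffer,
                           enum pipe_video_field field);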

And it is a completely different thing to have an internal memory 
layout with separated top and bottom lines, because this layout is 
hardware specific: for example you could have all top lines first (luma 
and chroma) and then the bottom lines, or you could have luma top, then 
luma bottom, then chroma top, then chroma bottom, etc. Paired with 
different tiling modes that gives you probably a couple of hundred 
different memory layouts. It doesn't make sense to export this to a 
state tracker.

>>>      /**
>>>       * destroy this video buffer
>>>       */
>>>      void (*destroy)(struct pipe_video_buffer *buffer);
>>>
>>>      /**
>>>       * get an individual sampler view for each component
>>>       */
>>>      struct pipe_sampler_view **(*get_sampler_view_components)(struct pipe_video_buffer *buffer);
>>>      // Note: for progressive split in 2 halves, would probably need up to 6...
>>>      // top Y, bottom Y, top CbCr, bottom CbCr
>>>
>>>      /**
>>>       * get an individual surface for each plane
>>>       */
>>>      struct pipe_surface **(*get_surfaces)(struct pipe_video_buffer *buffer);
>>>
>>>      /**
>>>       * write bits to a video buffer, possibly altering the format of this video buffer
>>>       */
>>>      void (*put_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
>>>                       void const *const *source_data, uint32_t const *source_pitches);
>>>
>>>      /**
>>>       * read bits from a video buffer
>>>       */
>>>      void (*get_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
>>>                       void *const*source_data, uint32_t const *source_pitches);
>>> };
>> Having put_bits and get_bits implementations is actually a bad idea, because they assume that the state tracker has a pointer where the data is stored. But some state trackers (especially VAAPI) need a map/unmap-like interface, where the driver returns a pointer and pitch to some linearly aligned memory. Like with tiling, that sometimes requires the buffer to be copied into a linear memory layout; that was at least the way we handled it inside Gallium so far.
>>
>> Overall it is good to know that somebody is still looking into it.
> No, it's a good idea if you have hardware that requires a specific layout in order to use the video buffer as a reference frame...
No, a buffer that was filled with put_bits isn't usable as a reference 
frame, so all you have to do is keep some internal information about 
the buffer layout and use that while creating the samplers: a decode 
operation on a buffer sets the layout to whatever your decoder outputs, 
and mapping a buffer sets the layout to linear. That's at least how it 
is handled for tiling, and it seems to work fine there.
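
In other words, something like this driver-internal bookkeeping is all 
that is needed (a sketch, the names are made up; nothing of it ever 
shows up in the public interface):

enum drv_vidbuf_layout {
   DRV_VIDBUF_LINEAR,         /* after put_bits / mapping the buffer */
   DRV_VIDBUF_DECODER_NATIVE  /* whatever the decoder block writes out */
};

struct drv_video_buffer {
   struct pipe_video_buffer base;
   enum drv_vidbuf_layout layout;
};

/* a decode operation leaves the buffer in the decoder's native layout */
static void drv_decode_done(struct drv_video_buffer *buf)
{
   buf->layout = DRV_VIDBUF_DECODER_NATIVE;
}

/* CPU access forces the buffer into a linear layout first */
static void drv_map_video_buffer(struct drv_video_buffer *buf)
{
   if (buf->layout != DRV_VIDBUF_LINEAR) {
      /* copy/detile into linear memory here */
      buf->layout = DRV_VIDBUF_LINEAR;
   }
}

The sampler views are then created according to whatever layout the 
buffer currently has, just like it is done for tiling today.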

Regards,
Christian.

