[Mesa-dev] hardware xvmc video decoding with nouveau

Sat Aug 6 11:37:15 PDT 2011

2011/7/31 Christian König <deathsimple at vodafone.de>:
> Am Freitag, den 29.07.2011, 18:23 -0400 schrieb Younes Manton:
>> On Fri, Jul 29, 2011 at 9:37 AM, Maarten Lankhorst
>> <m.b.lankhorst at gmail.com> wrote:
>> > With some help from the nouveau team I managed to get video acceleration
>> > working for my nv96 card. The video buffer api works well enough for nouveau,
>> > I added flags to vl_video_buffer_create_ex so I could force a linear surface
>> > with a nouveau specific resource flag, which I only specified when hardware
>> > that potentially supported hardware decoding was found. With the video
>> > buffer API, I only needed to specify that and I could get it to work.
>> > This made it easy for me, I only had to write code to talk to the decoder.
> Adding the flag is just one way to work around the problem, as Younes
> already said, if you have a special hardware layout of the buffer it is
> probably better to write your own implementation of pipe_video_buffer.
>
> But if you only need to add the flag to be passed down to the driver and
> everything else can stay as it is, go ahead and add it.
>
>> > The api for implementing the decoder I'm less happy about. I know this is
>> > because there is no real support yet for other decoders, but I think
>> > pipe_video_decode_buffer api is wrong right now. It assumes that the
>> > state tracker knows enough about how the decoder wants to interpret the
>> > macroblocks.
> Yes indeed, Younes interface (just like xvmc) mixed the mc and idct data
> together in one structure, I changed that because shader based decoding
> (and the decoder found in the early radeon chipsets) work just like
> that, one "todo" list for idct and another for mc. What I didn't realise
> is that's this isn't the way nvidia hardware is working.
>
>> > The nouveau hardware decoder has to interpret it in it's own way, so that
>> > makes it need a different api. I think the best thing would be to pass
>> > information about the macroblock with a pointer to the data blocks,
>> > and then let the decoder buffer decide how to interpret it. Also is it the
>> > intention to only start decoding when XvMCPutSurface is called? If the
>> > reference surfaces are passed, I can start decoding in XvMCRenderSurface.
>> > I'd also like it if flush_buffer is removed, and instead the video buffers
>> > are passed to end_frame.
> Nope, I already tried to explain that to Younes, a big problem with xvmc
> is that it assumes you want to decode one slice at a time, so there is
> no really good way to abort rendering in the middle of a frame when the
> hardware doesn't do it's command submission like this.
>
> So when a user starts to seek in a video you have to wait (with sync
> surface) that a decoding process is done, before you can start decoding
> at the new position, today apps like xine even destroy/recreate the
> decoder completely because of problems with resetting the decoder
> engines in the middle of a frame, leading to quite some lag while
> seeking or switching channels.
>
> Just look at how DXVA/VDPAU does this, you get something like
> begin/put/put/put.../end to setup a command buffer and then do a flush
> to actually tell the hardware to execute this buffer asynchronously in
> the background while the next frame is setup on the cpu. There are some
> very good reasons that we abandoned the XvMC interface, and because of
> this I don't think we should design the g2dvl interfaces around that.
>
> A compromise could be that we add the target surface as an optional
> parameter to decode_macroblock and then let the hardware decide if it
> want to start decoding earlier.
>
> By the way what is the actual benefit of this?
>
>> > Some of the methods to pipe_video_buffer also appear to be g3dvl specific,
>> > so could it be split out?
> Nope, the functions are there to support the fall-back to software
> rendering for VDPAU. The reason that this is a bit problematic is that
> you don't know if a buffer will be used for hw decoding or sw decoding
> on buffer creation time. I already thought about a flag to
> create_video_buffer and keep track if a buffer is used for hw or sw
> rendering, but this code would then also needs to be replicated to the
> vaapi state tracker, and I wanted to avoid this.
>
>> As for the changes required to support HW decoding, it was discussed
>> in [1-3]. I have some patches in the works for that that I'll clean
>> up, but the short story is that pipe_video_decode_buffer shouldn't
>> exist in the state tracker.
> Yeah, that's just another ugliness introduced to better support XvMC,
> the real problem behind it is that xvmc ties the decode buffer to target
> surfaces (for sync etc..), while the other interfaces either doesn't do
> it like this (DXVA/VAAPI) or clearly distinct between render surface and
> output surface (VDPAU). If you ask me it should be up to the driver to
> handle it's buffers, but this unfortunately breaks some assumptions in
> apps about how surfaces should be used (and become reuseable).
>
> Cheers,
> Christian.
>
>

The attached patch I believe should satisfy everyone's needs here. It
removes the use of pipe_video_decode_buffer from the state tracker and
moves it to the shader decoder. This lets every decoder parse the
incoming macroblocks themselves so they can do whatever they need.

Too big to inline, see attached. Tested on XvMC/VDPAU with softpipe
and an RV620 with a couple of videos (this doesn't imply that any/all
combinations produced correct output, just that it was the same
before/after).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-g3dvl-Revert-to-more-general-decoding-hooks.patch
Type: text/x-patch
Size: 86638 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20110806/22a5983e/attachment-0001.bin>