[Mesa-dev] hardware xvmc video decoding with nouveau

Sun Aug 7 15:02:24 PDT 2011

Hey Younes,

On 08/06/2011 08:37 PM, Younes Manton wrote:
> 2011/7/31 Christian König <deathsimple at vodafone.de>:
>> Am Freitag, den 29.07.2011, 18:23 -0400 schrieb Younes Manton:
>>> On Fri, Jul 29, 2011 at 9:37 AM, Maarten Lankhorst
>>> <m.b.lankhorst at gmail.com> wrote:
>>>> With some help from the nouveau team I managed to get video acceleration
>>>> working for my nv96 card. The video buffer api works well enough for nouveau,
>>>> I added flags to vl_video_buffer_create_ex so I could force a linear surface
>>>> with a nouveau specific resource flag, which I only specified when hardware
>>>> that potentially supported hardware decoding was found. With the video
>>>> buffer API, I only needed to specify that and I could get it to work.
>>>> This made it easy for me, I only had to write code to talk to the decoder.
>> Adding the flag is just one way to work around the problem, as Younes
>> already said, if you have a special hardware layout of the buffer it is
>> probably better to write your own implementation of pipe_video_buffer.
>>
>> But if you only need to add the flag to be passed down to the driver and
>> everything else can stay as it is, go ahead and add it.
>>
>>>> The api for implementing the decoder I'm less happy about. I know this is
>>>> because there is no real support yet for other decoders, but I think
>>>> pipe_video_decode_buffer api is wrong right now. It assumes that the
>>>> state tracker knows enough about how the decoder wants to interpret the
>>>> macroblocks.
>> Yes indeed, Younes interface (just like xvmc) mixed the mc and idct data
>> together in one structure, I changed that because shader based decoding
>> (and the decoder found in the early radeon chipsets) work just like
>> that, one "todo" list for idct and another for mc. What I didn't realise
>> is that's this isn't the way nvidia hardware is working.
>>
>>>> The nouveau hardware decoder has to interpret it in it's own way, so that
>>>> makes it need a different api. I think the best thing would be to pass
>>>> information about the macroblock with a pointer to the data blocks,
>>>> and then let the decoder buffer decide how to interpret it. Also is it the
>>>> intention to only start decoding when XvMCPutSurface is called? If the
>>>> reference surfaces are passed, I can start decoding in XvMCRenderSurface.
>>>> I'd also like it if flush_buffer is removed, and instead the video buffers
>>>> are passed to end_frame.
>> Nope, I already tried to explain that to Younes, a big problem with xvmc
>> is that it assumes you want to decode one slice at a time, so there is
>> no really good way to abort rendering in the middle of a frame when the
>> hardware doesn't do it's command submission like this.
>>
>> So when a user starts to seek in a video you have to wait (with sync
>> surface) that a decoding process is done, before you can start decoding
>> at the new position, today apps like xine even destroy/recreate the
>> decoder completely because of problems with resetting the decoder
>> engines in the middle of a frame, leading to quite some lag while
>> seeking or switching channels.
>>
>> Just look at how DXVA/VDPAU does this, you get something like
>> begin/put/put/put.../end to setup a command buffer and then do a flush
>> to actually tell the hardware to execute this buffer asynchronously in
>> the background while the next frame is setup on the cpu. There are some
>> very good reasons that we abandoned the XvMC interface, and because of
>> this I don't think we should design the g2dvl interfaces around that.
>>
>> A compromise could be that we add the target surface as an optional
>> parameter to decode_macroblock and then let the hardware decide if it
>> want to start decoding earlier.
>>
>> By the way what is the actual benefit of this?
>>
>>>> Some of the methods to pipe_video_buffer also appear to be g3dvl specific,
>>>> so could it be split out?
>> Nope, the functions are there to support the fall-back to software
>> rendering for VDPAU. The reason that this is a bit problematic is that
>> you don't know if a buffer will be used for hw decoding or sw decoding
>> on buffer creation time. I already thought about a flag to
>> create_video_buffer and keep track if a buffer is used for hw or sw
>> rendering, but this code would then also needs to be replicated to the
>> vaapi state tracker, and I wanted to avoid this.
>>
>>> As for the changes required to support HW decoding, it was discussed
>>> in [1-3]. I have some patches in the works for that that I'll clean
>>> up, but the short story is that pipe_video_decode_buffer shouldn't
>>> exist in the state tracker.
>> Yeah, that's just another ugliness introduced to better support XvMC,
>> the real problem behind it is that xvmc ties the decode buffer to target
>> surfaces (for sync etc..), while the other interfaces either doesn't do
>> it like this (DXVA/VAAPI) or clearly distinct between render surface and
>> output surface (VDPAU). If you ask me it should be up to the driver to
>> handle it's buffers, but this unfortunately breaks some assumptions in
>> apps about how surfaces should be used (and become reuseable).
>>
>> Cheers,
>> Christian.
>>
>>
> The attached patch I believe should satisfy everyone's needs here. It
> removes the use of pipe_video_decode_buffer from the state tracker and
> moves it to the shader decoder. This lets every decoder parse the
> incoming macroblocks themselves so they can do whatever they need.
>
> Too big to inline, see attached. Tested on XvMC/VDPAU with softpipe
> and an RV620 with a couple of videos (this doesn't imply that any/all
> combinations produced correct output, just that it was the same
> before/after).
I've updated my git tree to include your patch. Everything in it can be pulled now. :)
Vdpau will be broken for nouveau, unless you manually fallback with XVMC_VL=1.
With your patch I had no way of knowing if the video buffer was created for xvmc
or vdpau.

~Maarten