[Mesa-dev] hardware xvmc video decoding with nouveau

Sun Jul 31 05:52:45 PDT 2011

On Friday, 29.07.2011 at 18:23 -0400, Younes Manton wrote:
> On Fri, Jul 29, 2011 at 9:37 AM, Maarten Lankhorst
> <m.b.lankhorst at gmail.com> wrote:
> > With some help from the nouveau team I managed to get video acceleration
> > working for my nv96 card. The video buffer api works well enough for nouveau,
> > I added flags to vl_video_buffer_create_ex so I could force a linear surface
> > with a nouveau specific resource flag, which I only specified when hardware
> > that potentially supported hardware decoding was found. With the video
> > buffer API, I only needed to specify that and I could get it to work.
> > This made it easy for me, I only had to write code to talk to the decoder.
Adding the flag is just one way to work around the problem. As Younes
already said, if the hardware needs a special buffer layout it is
probably better to write your own implementation of pipe_video_buffer.

But if you only need to add the flag to be passed down to the driver and
everything else can stay as it is, go ahead and add it.

> > The api for implementing the decoder I'm less happy about. I know this is
> > because there is no real support yet for other decoders, but I think
> > pipe_video_decode_buffer api is wrong right now. It assumes that the
> > state tracker knows enough about how the decoder wants to interpret the
> > macroblocks.
Yes indeed, Younes' interface (just like xvmc) mixed the mc and idct data
together in one structure. I changed that because shader based decoding
(and the decoder found in the early radeon chipsets) works with separate
lists: one "todo" list for idct and another for mc. What I didn't realise
is that this isn't the way nvidia hardware works.

> > The nouveau hardware decoder has to interpret it in its own way, so that
> > makes it need a different api. I think the best thing would be to pass
> > information about the macroblock with a pointer to the data blocks,
> > and then let the decoder buffer decide how to interpret it. Also is it the
> > intention to only start decoding when XvMCPutSurface is called? If the
> > reference surfaces are passed, I can start decoding in XvMCRenderSurface.
> > I'd also like it if flush_buffer is removed, and instead the video buffers
> > are passed to end_frame.
Nope. I already tried to explain that to Younes: a big problem with xvmc
is that it assumes you want to decode one slice at a time, so there is
no really good way to abort rendering in the middle of a frame when the
hardware doesn't do its command submission like this.

So when a user starts to seek in a video, you have to wait (with sync
surface) until the running decoding process is done before you can start
decoding at the new position. Today apps like xine even destroy and
recreate the decoder completely because of problems with resetting the
decoder engines in the middle of a frame, leading to quite some lag while
seeking or switching channels.

Just look at how DXVA/VDPAU do this: you get something like
begin/put/put/put.../end to set up a command buffer and then do a flush
to actually tell the hardware to execute this buffer asynchronously in
the background while the next frame is set up on the cpu. There are some
very good reasons that we abandoned the XvMC interface, and because of
this I don't think we should design the g3dvl interfaces around it.
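
A minimal sketch of that begin/put/.../end/flush pattern, assuming a
simple per-frame command buffer (all names are hypothetical, this is not
the DXVA or VDPAU API, and the asynchronous execution is only stood in
for by a counter):

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CMDS = 64 };

/* Hypothetical per-frame command buffer that is filled on the cpu. */
struct frame_cmds {
    int cmds[MAX_CMDS];
    size_t count;
    int recording;
};

/* Stand-in for the hardware; it only counts what was handed to it. */
struct hw_queue { size_t executed; };

static void begin_frame(struct frame_cmds *f)
{
    assert(!f->recording);
    f->count = 0;
    f->recording = 1;
}

/* Record one slice; nothing reaches the hardware yet. */
static int put_slice(struct frame_cmds *f, int slice)
{
    if (!f->recording || f->count == MAX_CMDS)
        return -1;
    f->cmds[f->count++] = slice;
    return 0;
}

static void end_frame(struct frame_cmds *f)
{
    f->recording = 0;
}

/* Hand a finished frame to the hardware; returns the number of commands. */
static size_t flush_frame(struct frame_cmds *f, struct hw_queue *hw)
{
    size_t n;

    if (f->recording)
        return 0; /* frame is not complete yet */
    n = f->count;
    hw->executed += n;
    f->count = 0;
    return n;
}

/* Seeking mid-frame: just drop the buffer, the hardware never saw it. */
static void abort_frame(struct frame_cmds *f)
{
    f->count = 0;
    f->recording = 0;
}
```

In this model a seek in the middle of a frame only has to call
abort_frame(), since nothing was submitted before the flush.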

A compromise could be that we add the target surface as an optional
parameter to decode_macroblock and then let the hardware decide if it
wants to start decoding earlier.
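
Roughly like this, as a sketch only (the signature, the struct fields
and the "eager" capability bit are assumptions made up for illustration,
not the existing decode_macroblock interface):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a target surface and a decoder. */
struct surface { int id; };

struct decoder {
    int eager;        /* hardware is able to start before end of frame */
    unsigned queued;  /* macroblocks waiting for end_frame */
    unsigned started; /* macroblocks already handed to the hardware */
};

/* target may be NULL; then the decoder has to queue the macroblocks. */
static void decode_macroblock(struct decoder *dec,
                              const struct surface *target,
                              unsigned num_mbs)
{
    if (target != NULL && dec->eager)
        dec->started += num_mbs; /* driver chose to start right away */
    else
        dec->queued += num_mbs;  /* wait until the frame is complete */
}

/* At end of frame the target is mandatory and all queued work starts. */
static void end_frame(struct decoder *dec, const struct surface *target)
{
    assert(target != NULL);
    dec->started += dec->queued;
    dec->queued = 0;
}
```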

By the way, what is the actual benefit of starting the decode earlier?

> > Some of the methods to pipe_video_buffer also appear to be g3dvl specific,
> > so could it be split out?
Nope, the functions are there to support the fall-back to software
rendering for VDPAU. The reason that this is a bit problematic is that
you don't know at buffer creation time whether a buffer will be used for
hw decoding or sw decoding. I already thought about adding a flag to
create_video_buffer and keeping track of whether a buffer is used for hw
or sw rendering, but this code would then also need to be replicated in
the vaapi state tracker, and I wanted to avoid this.
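
The tracking idea would look something like the following sketch (the
enum, the struct and both functions are hypothetical and only show the
bookkeeping each state tracker would have to carry):

```c
#include <assert.h>

/* Hypothetical usage states for a video buffer. */
enum buffer_usage { USAGE_UNKNOWN, USAGE_HW_DECODE, USAGE_SW_DECODE };

struct video_buffer { enum buffer_usage usage; };

/* Creation takes a hint; USAGE_UNKNOWN means "decide on first use". */
static struct video_buffer example_create_buffer(enum buffer_usage hint)
{
    struct video_buffer buf = { hint };
    return buf;
}

/* Claim the buffer for hw or sw decoding; fail if it was used the other
 * way before. Returns 0 on success and -1 on a mismatch. */
static int claim_buffer(struct video_buffer *buf, enum buffer_usage wanted)
{
    assert(wanted != USAGE_UNKNOWN);
    if (buf->usage == USAGE_UNKNOWN)
        buf->usage = wanted; /* first use decides */
    return buf->usage == wanted ? 0 : -1;
}
```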

> As for the changes required to support HW decoding, it was discussed
> in [1-3]. I have some patches in the works for that that I'll clean
> up, but the short story is that pipe_video_decode_buffer shouldn't
> exist in the state tracker.
Yeah, that's just another ugliness introduced to better support XvMC.
The real problem behind it is that xvmc ties the decode buffer to target
surfaces (for sync etc.), while the other interfaces either don't do it
like this (DXVA/VAAPI) or clearly distinguish between render surface and
output surface (VDPAU). If you ask me it should be up to the driver to
handle its buffers, but this unfortunately breaks some assumptions in
apps about how surfaces should be used (and become reusable).

Cheers,
Christian.