[Mesa-dev] hardware xvmc video decoding with nouveau

Tue Aug 16 05:10:36 PDT 2011

Am Dienstag, den 16.08.2011, 01:15 -0400 schrieb Younes Manton:
> Anyway, here are some specific comments:
> 
> +   for (; num_macroblocks > 0; --num_macroblocks) {
> +      mb->base.codec = PIPE_VIDEO_CODEC_MPEG12;
> +      mb->macroblock_address = xvmc_mb->x +
> context->width_in_macroblocks * xvmc_mb->y;
> +      mb->macroblock_type = xvmc_mb->macroblock_type;
> 
> Why macroblock_address? A decoder that prefers x/y will have to do a
> div/mod to get it back, so it's better to just pass x/y along and let
> each decoder do what it wants with it.
I also need x/y for shaders, but the macroblock_address is defined in
the spec and so a (new/better) bitstream decoder could just pass the
value found in the bitstream into the field, instead of needing to
calculate the x/y value.

If you say that the Nvidia HW needs this also as x/y I'm also quite
happy with passing x/y directly in the structure.

> +   void (*begin_frame)(struct pipe_video_decoder *decoder);
> +   void (*end_frame)(struct pipe_video_decoder *decoder);
> 
> The Nvidia HW decoder doesn't need these, but I think
> begin_frame/end_frame are not as useful as they can be because you
> don't always know which video buffers they will apply to. When you
> call this in RecursiveEndFrame() it can result in 0, 1, 2, or 3
> end_frame() calls, but which video buffers does it apply to? The 0 and
> 3 case are obvious, but 1 or 2 is ambigious if the current video
> buffer has 1 or 2 reference frames.
Take a look at "SetDecoderStatus", it should restores the correct
decoder state before calling begin/decode/end, so the video buffer,
reference frames and picture info should be available when the call into
the driver happens.

I thought about making the decode buffer, target and reference frame
parameters to begin/decode/end instead of state like stuff, to make the
fact that you need to set a correct environment before calling any of
this function obviously. But then it would also make sense to pass
picture info and quant matrix as parameters, and in case of quant matrix
this doesn't make much sense, because it isn't needed for MC/IDCT only
stages. 

I think the question is a bit where to draw the line between state like
information and information passed in parameters.

> Also, based on the above, I think there is a bug in
> vl_mpeg12_end_frame(struct pipe_video_decoder *decoder). It will unmap
> and destroy the tex_transfers attached to the current buf, but what
> about the reference frames? It has the same problem, it doesn't know
> about the reference frames when end_frame is called so it operates on
> the same buf multiple times in RecursiveEndFrame().
While developing the patch I actually had the problem that end_frame was
called multiple times for the same frame or called for a wrong buffer.
But I think I have ruled out all corner cases with restoring the state
in "SetDecoderStatus", at least mplayer doesn't crash any more.

> +      vldecoder->cur_buffer %= vldecoder->num_buffers;
> +
> +      vldecoder->decoder->set_decode_buffer(vldecoder->decoder,
> vldecoder->buffers[vldecoder->cur_buffer]);
> +      vldecoder->decoder->set_decode_target(vldecoder->decoder,
> vlsurf->video_buffer);
> +
> +      return vlVdpDecoderRenderMpeg12(vldecoder->decoder,
> (VdpPictureInfoMPEG1Or2 *)picture_info,
> +                                      bitstream_buffer_count,
> bitstream_buffers);
> 
> Not sure I understand why this is necessary. Why do you need to use a
> different decode buffer for each frame? In the XvMC case you want to
> keep data for each target video buffer seperate, because the XvMC API
> allows you to submit macroblocks to any surface at any time. Why is
> this necessary for VDPAU if it requires the user to submit the entire
> frame at once?
The short answer is: I doesn't need one for every frame. What I need is
plain old round robin (tripple) buffering.

The kernel makes pretty sure that a buffer isn't accessed from user mode
when the hardware is accessing it. So what happens when you try to map a
buffer again shortly after you passed it to the hardware is that the
kernel is blocking your user mode process. In the end instead of a
constant flow of commands and everybody (CPU/GPU) busy all the time, you
get peaks of work on both of them.

I added a capability to specify how many buffers should be used round
robin, but the same as with begin/end/flush applies here: If you don't
need it don't use it. OpenGL (or more specific the pipe_context
interface) offers at least three different ways of handling this:

1. Let the process block until the buffer is unused, that usually
happens when you don't take any otherwise care.

2. Specify the DONT_BLOCK flag top the map function, if the buffer isn't
ready it isn't mapped and the map function returns a NULL pointer. With
this the state tracker can react accordingly (allocate a new buffer, say
"I'm busy" to the application etc...).

3. Call the flush function and use the returned fence to figure out if
all scheduled operations are already done and the buffer can be reused.

I don't think we need a so complex interface for video decoding, but all
the current interfaces seems to have one way or another to ask the
driver if an operation is completed and a buffer/surface can be reused,
so at least a combination of flush/fence seems necessary here.

> Those are the issues that I noticed at first. Martin may have some
> comments specific to his work with VPE. What do you think?
I also started to reimplement the bitstream parser this morning, Maarten
also stated that he is interesting to do this, but I don't know how far
he got. It doesn't make much sense if we work on the same at the same
time, so what you are currently doing?

Christian.