[Mesa-dev] hardware xvmc video decoding with nouveau

Tue Aug 16 12:26:54 PDT 2011

On 08/16/2011 02:10 PM, Christian König wrote:
> Am Dienstag, den 16.08.2011, 01:15 -0400 schrieb Younes Manton:
>> Anyway, here are some specific comments:
>>
>> +   for (; num_macroblocks > 0; --num_macroblocks) {
>> +      mb->base.codec = PIPE_VIDEO_CODEC_MPEG12;
>> +      mb->macroblock_address = xvmc_mb->x +
>> context->width_in_macroblocks * xvmc_mb->y;
>> +      mb->macroblock_type = xvmc_mb->macroblock_type;
>>
>> Why macroblock_address? A decoder that prefers x/y will have to do a
>> div/mod to get it back, so it's better to just pass x/y along and let
>> each decoder do what it wants with it.
> I also need x/y for shaders, but the macroblock_address is defined in
> the spec and so a (new/better) bitstream decoder could just pass the
> value found in the bitstream into the field, instead of needing to
> calculate the x/y value.
>
> If you say that the Nvidia HW needs this also as x/y I'm also quite
> happy with passing x/y directly in the structure.
afaict, it needs x/y
>> +   void (*begin_frame)(struct pipe_video_decoder *decoder);
>> +   void (*end_frame)(struct pipe_video_decoder *decoder);
>>
>> The Nvidia HW decoder doesn't need these, but I think
>> begin_frame/end_frame are not as useful as they can be because you
>> don't always know which video buffers they will apply to. When you
>> call this in RecursiveEndFrame() it can result in 0, 1, 2, or 3
>> end_frame() calls, but which video buffers does it apply to? The 0 and
>> 3 case are obvious, but 1 or 2 is ambigious if the current video
>> buffer has 1 or 2 reference frames.
> Take a look at "SetDecoderStatus", it should restores the correct
> decoder state before calling begin/decode/end, so the video buffer,
> reference frames and picture info should be available when the call into
> the driver happens.
>
> I thought about making the decode buffer, target and reference frame
> parameters to begin/decode/end instead of state like stuff, to make the
> fact that you need to set a correct environment before calling any of
> this function obviously. But then it would also make sense to pass
> picture info and quant matrix as parameters, and in case of quant matrix
> this doesn't make much sense, because it isn't needed for MC/IDCT only
> stages. 
>
> I think the question is a bit where to draw the line between state like
> information and information passed in parameters.
>
>> Also, based on the above, I think there is a bug in
>> vl_mpeg12_end_frame(struct pipe_video_decoder *decoder). It will unmap
>> and destroy the tex_transfers attached to the current buf, but what
>> about the reference frames? It has the same problem, it doesn't know
>> about the reference frames when end_frame is called so it operates on
>> the same buf multiple times in RecursiveEndFrame().
> While developing the patch I actually had the problem that end_frame was
> called multiple times for the same frame or called for a wrong buffer.
> But I think I have ruled out all corner cases with restoring the state
> in "SetDecoderStatus", at least mplayer doesn't crash any more.
>
>> +      vldecoder->cur_buffer %= vldecoder->num_buffers;
>> +
>> +      vldecoder->decoder->set_decode_buffer(vldecoder->decoder,
>> vldecoder->buffers[vldecoder->cur_buffer]);
>> +      vldecoder->decoder->set_decode_target(vldecoder->decoder,
>> vlsurf->video_buffer);
>> +
>> +      return vlVdpDecoderRenderMpeg12(vldecoder->decoder,
>> (VdpPictureInfoMPEG1Or2 *)picture_info,
>> +                                      bitstream_buffer_count,
>> bitstream_buffers);
>>
>> Not sure I understand why this is necessary. Why do you need to use a
>> different decode buffer for each frame? In the XvMC case you want to
>> keep data for each target video buffer seperate, because the XvMC API
>> allows you to submit macroblocks to any surface at any time. Why is
>> this necessary for VDPAU if it requires the user to submit the entire
>> frame at once?
> The short answer is: I doesn't need one for every frame. What I need is
> plain old round robin (tripple) buffering.
>
> The kernel makes pretty sure that a buffer isn't accessed from user mode
> when the hardware is accessing it. So what happens when you try to map a
> buffer again shortly after you passed it to the hardware is that the
> kernel is blocking your user mode process. In the end instead of a
> constant flow of commands and everybody (CPU/GPU) busy all the time, you
> get peaks of work on both of them.
>
> I added a capability to specify how many buffers should be used round
> robin, but the same as with begin/end/flush applies here: If you don't
> need it don't use it. OpenGL (or more specific the pipe_context
> interface) offers at least three different ways of handling this:
>
> 1. Let the process block until the buffer is unused, that usually
> happens when you don't take any otherwise care.
>
> 2. Specify the DONT_BLOCK flag top the map function, if the buffer isn't
> ready it isn't mapped and the map function returns a NULL pointer. With
> this the state tracker can react accordingly (allocate a new buffer, say
> "I'm busy" to the application etc...).
>
> 3. Call the flush function and use the returned fence to figure out if
> all scheduled operations are already done and the buffer can be reused.
>
> I don't think we need a so complex interface for video decoding, but all
> the current interfaces seems to have one way or another to ask the
> driver if an operation is completed and a buffer/surface can be reused,
> so at least a combination of flush/fence seems necessary here.
>
>> Those are the issues that I noticed at first. Martin may have some
>> comments specific to his work with VPE. What do you think?
> I also started to reimplement the bitstream parser this morning, Maarten
> also stated that he is interesting to do this, but I don't know how far
> he got. It doesn't make much sense if we work on the same at the same
> time, so what you are currently doing?
>
I have a crappy one now, but it's only compile tested. If you want
to verify that it's correct, please!! :) The file wasn't based on
any (L)GPL implementation, but on my reading of the spec and the
vlc tables from mwk's stubby parser, it's available at
http://repo.or.cz/w/mesa/nouveau-pmpeg.git/
as src/gallium/drivers/nouveau/nouveau_pmpeg_bitstream.[ch]

The header contains the tables. The mpeg specification is dense
reading, so if you can verify its correctness, please! :)
vlc b15 is still missing, and I'm wondering why 1 is
substracted from f_code in vdpau/decode.c, it doesn't really
make sense to do it there..

At the moment I don't care about speed, I want it correct first,
then I can optimize it.

~Maarten