[Mesa-dev] reworking pipe_video_decoder / pipe_video_buffer

Maarten Lankhorst m.b.lankhorst at gmail.com
Tue Nov 22 16:06:19 PST 2011


On 11/22/2011 10:00 PM, Younes Manton wrote:
> 2011/11/21 Christian König <deathsimple at vodafone.de>:
>> On 16.11.2011 15:38, Maarten Lankhorst wrote:
>>> If the decode_bitstream interface is changed to get all bitstream buffers
>>> at the same time,
>>> there wouldn't be overhead to doing it like this. For a single picture
>>> it's supposed to stay constant,
>>> so for vdpau the sane way would be: set picture parameters for hardware
>>> (includes EVERYTHING),
>>> write all bitstream buffers to a hardware bo, wait until magic is done.
>>> Afaict, there isn't even a sane
>>> way to only submit partial buffers, so it's just a bunch of overhead for
>>> me.
>>>
>>> nvidia doesn't support va-api, it handles the entire process from picture
>>> parameters
>>> to a decoded buffer internally so it always convert the picture parameters
>>> into
>>> something the hardware can understand, every frame.
>> I'm not arguing against removing the scattered calls to decode_bitstream. I
>> just don't want to lose information while passing the parameters from the
>> state tracker down to the driver. But we can also add this information as a
>> flag to the function later, so on a second thought that seems to be ok.
>>
> I don't have a comment on the rest, but on this part let me point out
> that it's valid for a VDPAU client to pass you, for a bitstream of
> size N bytes:
>
> * 1 buffer of size N
> * N buffers of size 1
> * any combination in between
>
> The only thing you're assured of, as far as I can tell, is that you
> have a complete picture across all the buffers. So, having the state
> tracker pass one buffer at a time to the driver can get ugly for
> everyone if a client decides to chop up the picture bitstream in an
> arbitrary manner.
This is actually what happens when you use mplayer: it passes the start code
and the slice data of each slice as separate buffers, presumably because it
is doing its own parsing. So you end up with:
{ { 0, 0, 1 }, { 0x25, ... }, { 0, 0, 1 }, { 0x25, ... } }

There's no way to make sense of that with just a single buffer at a time.
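To make that concrete, a reworked decode_bitstream could look something like
this, taking every bitstream buffer belonging to one picture in a single call
(just a sketch in the style of the existing pipe_video_decoder hooks, not
committed code):

   void (*decode_bitstream)(struct pipe_video_decoder *decoder,
                            struct pipe_video_buffer *target,
                            struct pipe_picture_desc *picture,
                            unsigned num_buffers,
                            const void *const *buffers,
                            const unsigned *sizes);

The driver can then concatenate the fragments into one hardware bo (or walk
them in place) regardless of how the client chose to chop up the picture.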

Also, I suppose the put_bits patch is irrelevant now; with a dumb
fragment shader the even/odd lines can be merged. I had some help from
Christoph Bumiller. The amount of support code you need just to run a
simple frag shader is depressing. :-)

It still doesn't mean putbits can be used on a reference frame, though.
I'm tracking everything that can be used as a reference frame internally.
The state for B-frames (and their equivalents in other codecs) gets dropped
immediately, but for the others I keep track of the last max_references
buffers and currently throw an assertion failure if you try to use a
reference frame that no longer counts as such; the least recently used
reference frame gets dropped. The lookup scales linearly, but there's a
hard limit of 2 reference frames for non-h264 codecs and 16 for h264, so
anything fancier would have been overengineering: even then I never need
more than 16 ref buffers plus a buffer for the current target.
Raw image data wouldn't have been usable as a reference regardless:
nvidia runs some kind of post-processing step over the target image,
but keeps the original image around to decode other frames with.
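For reference, the tracking roughly amounts to something like the following
(just a sketch with made-up names, not the actual code):

   #include <assert.h>

   struct pipe_video_buffer; /* opaque here */

   #define MAX_REFERENCES 16 /* hard upper bound: 16 for h264, 2 otherwise */

   struct ref_tracker {
      struct pipe_video_buffer *bufs[MAX_REFERENCES];
      unsigned last_use[MAX_REFERENCES]; /* decode counter at last use */
      unsigned num_refs;       /* current number of tracked references */
      unsigned max_references; /* 2 for non-h264 codecs, up to 16 for h264 */
      unsigned tick;           /* monotonically increasing decode counter */
   };

   /* Mark a buffer as usable for reference; when the codec's limit is
    * reached, the least recently used reference gets dropped. */
   static void ref_tracker_add(struct ref_tracker *t,
                               struct pipe_video_buffer *buf)
   {
      unsigned i, slot = 0;

      for (i = 0; i < t->num_refs; ++i) {
         if (t->bufs[i] == buf) {
            t->last_use[i] = ++t->tick; /* already tracked, just refresh */
            return;
         }
      }

      if (t->num_refs < t->max_references) {
         slot = t->num_refs++;
      } else {
         /* linear scan for the least recently used entry */
         for (i = 1; i < t->num_refs; ++i)
            if (t->last_use[i] < t->last_use[slot])
               slot = i;
      }
      t->bufs[slot] = buf;
      t->last_use[slot] = ++t->tick;
   }

   /* Using a buffer that is no longer tracked as a reference is a bug. */
   static void ref_tracker_check(struct ref_tracker *t,
                                 struct pipe_video_buffer *buf)
   {
      unsigned i;

      for (i = 0; i < t->num_refs; ++i)
         if (t->bufs[i] == buf)
            return;
      assert(!"buffer no longer counts as a reference frame");
   }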

Cheers,
Maarten

