[Mesa-dev] 10bit HEVC decoding for RadeonSI

Mark Thompson sw at jkqxz.net
Fri Jan 27 12:51:22 UTC 2017


On 26/01/17 16:59, Christian König wrote:
> On 26.01.2017 at 13:14, Mark Thompson wrote:
>> On 26/01/17 11:00, Christian König wrote:
>>> Hi Peter,
>>>
>>> On 25.01.2017 at 19:45, Peter Frühberger wrote:
>>>>
>>>>      Peter, Rainer any idea what I'm missing here? Do you guys use some
>>>>      modified ffmpeg for Kodi or how does that work for you?
>>>>
>>>>
>>>> Do you set the format correctly, e.g. https://github.com/FernetMenta/kodi-agile/blob/master/xbmc/cores/VideoPlayer/DVDCodecs/Video/VAAPI.cpp#L2697, to create the surfaces?
>>> Well, the problem here is that the VA-API interface is not consistent, and I'm not sure how to implement it correctly.
>>>
>>> See your code for example:
>>>> VASurfaceAttrib attribs[1], *attrib;
>>>>
>>>> attrib = attribs;
>>>> attrib->flags = VA_SURFACE_ATTRIB_SETTABLE;
>>>> attrib->type = VASurfaceAttribPixelFormat;
>>>> attrib->value.type = VAGenericValueTypeInteger;
>>>> attrib->value.value.i = VA_FOURCC_NV12;
>>>>
>>> First, Kodi specifies that NV12 should be used, which implies that this is an 8-bit surface.
>>>
>>>> // create surfaces
>>>> VASurfaceID surfaces[32];
>>>> unsigned int format = VA_RT_FORMAT_YUV420;
>>>>
>>>> if (m_config.profile == VAProfileHEVCMain10)
>>>>     format = VA_RT_FORMAT_YUV420_10BPP;
>>> But then Kodi requests a 10-bit surface. Now what is the correct thing to do here?
>>>
>>> I can either create an NV12 surface, which would be 8-bit but would result in either an error message or only 8-bit dithering during decode.
>>>
>>> Or I can promote the surface to 10-bit, which would result in a P010 or rather P016 format.
>>>
>>> Or, and that is actually what I think would be best, the VA-API driver should throw an error indicating that the application requested something impossible.
>> I prefer the last.  IMO that code is just wrong - you can't specify an 8-bit format for a 10-bit surface.  (I'm not really sure what it's trying to do; I would have expected it to barf with the Intel driver as well, which doesn't have any dithering support, so Main10 video must be decoded to P010 surfaces.)
>>
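
For completeness, what I would expect the consistent version of that allocation to look like is roughly the untested sketch below, with the fourcc and the render-target format agreeing on the bit depth.  (m_display, width and height stand in for whatever Kodi already has to hand, and VA_FOURCC_P010 needs a reasonably recent libva.)

    VASurfaceAttrib attrib;
    VASurfaceID surfaces[32];
    unsigned int rt_format;

    attrib.type = VASurfaceAttribPixelFormat;
    attrib.flags = VA_SURFACE_ATTRIB_SETTABLE;
    attrib.value.type = VAGenericValueTypeInteger;

    if (m_config.profile == VAProfileHEVCMain10) {
        // Main10 decodes to a 10-bit surface: P010 fourcc + 10BPP RT format.
        attrib.value.value.i = VA_FOURCC_P010;
        rt_format = VA_RT_FORMAT_YUV420_10BPP;
    } else {
        // 8-bit profiles: NV12 fourcc + plain 4:2:0 RT format.
        attrib.value.value.i = VA_FOURCC_NV12;
        rt_format = VA_RT_FORMAT_YUV420;
    }

    vaCreateSurfaces(m_display, rt_format, width, height,
                     surfaces, 32, &attrib, 1);
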
>>>> Afterwards we just do DRM / EGL interop, via:
>>>> https://github.com/FernetMenta/kodi-agile/blob/master/xbmc/cores/VideoPlayer/DVDCodecs/Video/VAAPI.cpp#L1374
>>> I'm not sure if that will ever work correctly. The problem is that VA-API leaks to the application what the data layout in the surface is. As soon as we turn on tiling, that will only work with rather crude hacks.
>>>
>>> I will try to get it working, but probably need help from you guys as well.
>> I don't think tiling should be relevant, but do correct me if this has more issues than it does on Intel.
> 
> The problem here is I need to know what will be done with the surface from the very beginning. E.g. if you want to scan it out directly to hardware, you need a different memory layout than when you just want to sample from it.
> 
> The same applies to sampling from it in OpenGL, and in this case also to how you want to sample from it, etc.

For use in other places (like scanout or OpenGL), is this a correctness issue (things just won't work with the wrong layout) or a performance one (things will be slower or use more resources)?

(For that matter, is there a list somewhere of the set of formats/layouts and what they are used for?)

>> To my mind, the PixelFormat attribute (fourcc) is only specifying the appearance of the format from the point of view of the user.  That is, what you will get if you call vaDeriveImage() and then map the result.
> 
> And exactly that is complete nonsense. You CAN'T assume that the decoding result is immediately accessible by the CPU.
> 
> So the only proper way of handling this is going with the VDPAU design. You create the surface without specifying any format, decode into it with the decoder, and then the application tells the driver in what format it wants to access it.
> 
> The driver then copies the data to CPU accessible memory and does the conversion to the format desired by the application on the fly.
> 
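
For anyone who hasn't used VDPAU: the model being described is roughly the sketch below, where only the chroma type is fixed at creation time and the pixel format is chosen at read-back.  (This assumes the vdp_* function pointers have already been fetched via VdpGetProcAddress; y_plane/uv_plane and the pitches are placeholders.)

    VdpVideoSurface surface;

    // Creation fixes only the chroma type, not the pixel layout.
    vdp_video_surface_create(device, VDP_CHROMA_TYPE_420,
                             width, height, &surface);

    // ... decode into the surface ...

    // Only at read-back does the application choose a format; the driver
    // copies into the supplied CPU buffers and converts on the fly.
    void     *planes[2]  = { y_plane, uv_plane };
    uint32_t  pitches[2] = { y_pitch, uv_pitch };
    vdp_video_surface_get_bits_y_cb_cr(surface, VDP_YCBCR_FORMAT_NV12,
                                       planes, pitches);
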
>>    Tiling then doesn't cause any problems because it is a property of the DRM object and mapping can automagically take it into account.
> 
> No it can't: tiled surfaces are not meant to be CPU accessible. So the whole idea of mapping a surface doesn't make much sense to me.

If they aren't CPU accessible, doesn't this mean that the layout of the surfaces isn't observable by the user and therefore doesn't matter to the API?  Since the user can't access the surface directly, it can be whatever is most suitable for the hardware and the user can't tell.  The API certainly admits the possibility that vaDeriveImage() just can't expose surfaces to the CPU directly, or that there are extra implicit copies so that it all appears consistent from the point of view of the user.
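
To make that concrete, the application-side pattern I have in mind is roughly the sketch below: try vaDeriveImage(), and if the driver refuses, fall back to an explicit copy through vaCreateImage()/vaGetImage().  (dpy, surface, width and height are placeholders.)

    VAImage image;
    void *data;

    if (vaDeriveImage(dpy, surface, &image) != VA_STATUS_SUCCESS) {
        // The driver can't (or won't) expose the surface directly, e.g.
        // because it is tiled or not CPU visible: fall back to a copy.
        VAImageFormat format = { 0 };
        format.fourcc = VA_FOURCC_NV12;
        vaCreateImage(dpy, &format, width, height, &image);
        vaGetImage(dpy, surface, 0, 0, width, height, image.image_id);
    }

    vaMapBuffer(dpy, image.buf, &data);
    // ... read the frame through 'data' using image.offsets/image.pitches ...
    vaUnmapBuffer(dpy, image.buf);
    vaDestroyImage(dpy, image.image_id);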

I think my use of the word "mapping" wasn't helping there: I was using it to refer both to mapping into the CPU address space (which need not be supported) and to mapping into other APIs (OpenGL, OpenCL, whatever) which will use the surface on the GPU (which is far more important).  My real question on the tiling issue was: is tiling/layout/whatever a property of the DRM object, such that other APIs interacting with it can do the right thing without the user needing to know about it?  If not, then the VAAPI buffer-sharing construction (vaCreateSurfaces() and vaAcquireBufferHandle() with VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME) doesn't contain enough information to ever work, and we should be trying to fix that in the API.  If so, then these issues seem resolvable, albeit possibly with extra copies in places where we haven't been able to work out far enough in advance what was needed (or where the user really does want to do multiple things which require different formats: maybe drawing something on a surface (OSD/subtitles, say) and then sending it to an encoder or scanout).
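
Concretely, the buffer-sharing construction I mean is roughly the sketch below; import_into_egl_or_drm() is a made-up stand-in for whatever EGL/DRM import the application then does with the dma-buf fd.  The point is that image.pitches/image.offsets are the only layout information crossing the API boundary - nothing says anything about tiling.

    VAImage image;
    VABufferInfo buf_info = { 0 };

    vaDeriveImage(dpy, surface, &image);

    buf_info.mem_type = VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME;
    vaAcquireBufferHandle(dpy, image.buf, &buf_info);

    // buf_info.handle is now a dma-buf fd; only pitches and offsets
    // describe the layout, so tiling information never crosses here.
    import_into_egl_or_drm((int)buf_info.handle,
                           image.pitches, image.offsets);

    vaReleaseBufferHandle(dpy, image.buf);
    vaDestroyImage(dpy, image.image_id);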

Thanks,

- Mark


