[Mesa-dev] 10bit HEVC decoding for RadeonSI

Mark Thompson sw at jkqxz.net
Fri Jan 27 19:44:04 UTC 2017


On 27/01/17 14:27, Christian König wrote:
> Am 27.01.2017 um 13:51 schrieb Mark Thompson:
>> On 26/01/17 16:59, Christian König wrote:
>>> Am 26.01.2017 um 13:14 schrieb Mark Thompson:
>>>> [SNIP]
>>> The problem here is that I need to know what will be done with the surface from the very beginning. E.g. if you want to scan it out directly to hardware you need a different memory layout than when you just want to sample from it.
>>>
>>> The same applies to sampling from it in OpenGL, and in that case also how you want to sample from it, etc. etc. ...
>> For use in other places (like scanout or OpenGL), is this a correctness issue (things just won't work with the wrong layout) or a performance one (things will be slower or use more resources)?
> 
> It is a correctness issue, because only a subset of the formats are directly usable with certain sampling modes or scanout, for example.
> 
>> (For that matter, is there a list somewhere of the set of formats/layouts and what they are used for?)
> 
> Well, taking a look at all the use cases and options you can program into the decoder, you easily end up with 100+ format combinations.
> 
> For example, the decoder can split a frame into the top and bottom fields during decode, apply different tiling modes, and lay out the YUV data either packed or in separate planes.

Maybe I should have phrased that as "how many might sane people actually use?" :)  But yes, I can see that this is going to be more than just two or three such that you do need to treat it generically.

>>>> To my mind, the PixelFormat attribute (fourcc) is only specifying the appearance of the format from the point of view of the user.  That is, what you will get if you call vaDeriveImage() and then map the result.
>>> And exactly that is complete nonsense. You CAN'T assume that the decoding result is immediately accessible by the CPU.
>>>
>>> So the only proper way of handling this is to follow the VDPAU design. You create the surface without specifying any format, decode into it with the decoder, and then the application tells the driver in what format it wants to access it.
>>>
>>> The driver then copies the data to CPU accessible memory and does the conversion to the format desired by the application on the fly.
>>>
>>>>     Tiling then doesn't cause any problems because it is a property of the DRM object and mapping can automagically take it into account.
>>> No it can't; tiled surfaces are not meant to be CPU accessible. So the whole idea of mapping a surface doesn't make much sense to me.
>> If they aren't CPU accessible, doesn't this mean that the layout of the surfaces isn't observable by the user and therefore doesn't matter to the API?
> 
> Exactly, but the problem is that VA-API suggests otherwise.
> 
> E.g. we have had it multiple times that customers coded a specialized application, tested it on Intel hardware, and then wanted to have it working on AMD in exactly the same way.
> 
> That unfortunately didn't work, because our internal memory layout is just completely different.
> 
>> Since the user can't access the surface directly, it can be whatever is most suitable for the hardware and the user can't tell.
> 
> That won't work. An example we ran into was that a customer wanted to black out the first and last line of an image for BOB deinterlacing.
> 
> To do so he mapped the surface and just used memset() on the appropriate addresses. On Intel I was told this works because the mapping seems to be bidirectional and all changes done to it are reflected in the original video surface/image.

Tbh I think both of these examples are user error, if unfortunately understandable in the circumstances.  The API is clear that the direct mapping via vaDeriveImage() doesn't necessarily work:

<https://cgit.freedesktop.org/libva/tree/va/va.h#n2719>
"""
                               When the operation is not possible this interface
 * will return VA_STATUS_ERROR_OPERATION_FAILED. Clients should then fall back
 * to using vaCreateImage + vaPutImage to accomplish the same task in an
 * indirect manner.
"""

and therefore that you need code which looks like <https://git.libav.org/?p=libav.git;a=blob;f=libavutil/hwcontext_vaapi.c;h=b2e212c1fe518f310576ea14125266fbd5e7ce48;hb=HEAD#l720> to do any sort of CPU mapping reliably.  The indirect path does indeed work on AMD now (and actually is often what you want on Intel as well if you aren't anticipating the weird properties of the direct mapping).
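Roughly, the shape of that fallback (a minimal sketch rather than the actual libav code; the wrapper name is mine, error handling is trimmed, and the surface is assumed to already be synced with vaSyncSurface()) is:

/* Try the direct mapping first, fall back to an indirect copy.
 * The write-back path (vaPutImage()) is omitted here. */
#include <va/va.h>

static VAStatus map_surface_for_reading(VADisplay dpy, VASurfaceID surface,
                                        VAImageFormat *fmt, /* desired format, e.g. NV12 */
                                        int width, int height,
                                        VAImage *image, void **data)
{
    VAStatus vas = vaDeriveImage(dpy, surface, image);
    if (vas != VA_STATUS_SUCCESS) {
        /* Direct mapping not possible: copy into a separate image instead.
         * (A real implementation remembers which path it took, so the
         * unmap/write-back side can use vaPutImage() when needed.) */
        vas = vaCreateImage(dpy, fmt, width, height, image);
        if (vas != VA_STATUS_SUCCESS)
            return vas;
        vas = vaGetImage(dpy, surface, 0, 0, width, height, image->image_id);
        if (vas != VA_STATUS_SUCCESS) {
            vaDestroyImage(dpy, image->image_id);
            return vas;
        }
    }
    /* In both cases the pixel data is reached through the image's buffer. */
    vas = vaMapBuffer(dpy, image->buf, data);
    if (vas != VA_STATUS_SUCCESS)
        vaDestroyImage(dpy, image->image_id);
    return vas;
}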

I admit that this isn't necessarily a useful thing to tell your customers when their programs don't work.  Nor is it helpful that the libva examples don't follow it: <https://cgit.freedesktop.org/libva/tree/test/loadsurface.h#n314>.

>> The API certainly admits the possibility that vaDeriveImage() just can't expose surfaces to the CPU directly, or that there are extra implicit copies so that it all appears consistent from the point of view of the user.
> 
> Yeah, but it doesn't enforce that. E.g. it isn't properly defined whether the derived image is a copy, and since on Intel it just works, people tend to use it.

True.  On Intel it never copies for vaMapBuffer() or vaAcquireBufferHandle(), so writes by the CPU or other GPU operations are reflected immediately (and that goes as far as letting you scribble on decoder reference images while they are being used and breaking everything, if you want).  None of that is codified anywhere, though.
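(For the record, the sort of direct in-place write being relied on there looks roughly like the sketch below; the helper is hypothetical, and whether the write reaches the real surface memory or only a shadow copy is exactly what isn't specified.)

/* Hypothetical sketch: black out the first luma row of a decoded surface
 * by writing through the derived-image mapping. */
#include <stdint.h>
#include <string.h>
#include <va/va.h>

static void black_first_luma_row(VADisplay dpy, VASurfaceID surface)
{
    VAImage image;
    void *data;

    if (vaDeriveImage(dpy, surface, &image) != VA_STATUS_SUCCESS)
        return;                     /* direct mapping not available at all */
    if (vaMapBuffer(dpy, image.buf, &data) == VA_STATUS_SUCCESS) {
        /* offsets[0]/pitches[0] describe the luma plane of e.g. NV12. */
        memset((uint8_t *)data + image.offsets[0], 0, image.pitches[0]);
        vaUnmapBuffer(dpy, image.buf);
    }
    vaDestroyImage(dpy, image.image_id);
}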

>> I think my use of the word "mapping" wasn't helping there: I was using it to refer both to mapping to the CPU address space (which need not be supported) and to other APIs (OpenGL, OpenCL, whatever) which will use it on the GPU (which is far more important).  My real question on the tiling issue was: is tiling/layout/whatever a property of the DRM object, such that other APIs interacting with it can do the right thing without the user needing to know about it?
> 
> No, they can't. For example, when we import the UV plane of an NV12 surface into OpenGL, then OpenGL needs to know that this originally comes from a decoded image. Otherwise it won't use the correct tiling parameters.
> 
> Currently we make an educated guess in the GL driver, based on the offset, as to what is about to be imported, but that isn't really good engineering.
> 
> To be fair, I've run into exactly the same problem with VDPAU as well; that's the reason we currently have tiling disabled for video surfaces there too.

Right, the reason Intel mostly manages this is that they carry the information via private DRM calls (drm_intel_bo_get_tiling()...) and then add extra copies in some receivers to overcome problem cases (and you can still find holes in the abstraction where it doesn't work).  Also there are many fewer cases, which helps a lot.
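To make the gap concrete: a (hypothetical, pre-modifier) EGL_EXT_image_dma_buf_import of that UV plane can only pass fd, offset and pitch, so there is simply nowhere to say "this is tiled like a decoder output" -- something like:

/* Hypothetical sketch: importing the UV plane of an NV12 dma-buf into EGL.
 * The attribute list carries only fd, offset and pitch -- there is no field
 * describing tiling/layout, which is exactly what the GL driver ends up
 * guessing (or the surface has to be linear). */
#define EGL_EGLEXT_PROTOTYPES 1  /* or load eglCreateImageKHR via eglGetProcAddress() */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>

static EGLImageKHR import_uv_plane(EGLDisplay egl_dpy, int dmabuf_fd,
                                   int width, int height,
                                   EGLint uv_offset, EGLint uv_pitch)
{
    const EGLint attribs[] = {
        EGL_WIDTH,                     width  / 2,
        EGL_HEIGHT,                    height / 2,
        EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_GR88,  /* two 8-bit channels */
        EGL_DMA_BUF_PLANE0_FD_EXT,     dmabuf_fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, uv_offset,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  uv_pitch,
        EGL_NONE
    };
    return eglCreateImageKHR(egl_dpy, EGL_NO_CONTEXT,
                             EGL_LINUX_DMA_BUF_EXT, NULL, attribs);
}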

>> If not, then the VAAPI buffer-sharing construction (vaCreateSurfaces() and vaAcquireBufferHandle() with VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME) doesn't contain enough information to ever work, and we should be trying to fix that in the API.
> 
> Those problems won't be resolved with just changes to VA-API. Essentially we need to follow NVIDIA's proposal for the "Unix Device Memory Allocation" project.
> 
> Hopefully this should in the end result in a library which, based on the use case, allows drivers to negotiate the surface format.

Yay, bring on the glorious new API that covers everyone's use-cases!  <https://xkcd.com/927/>

More seriously, a consistent way to get this right everywhere (insofar as it is even possible) would indeed be great if it actually achieves that.  (As much as Intel mostly works with VAAPI, it does still have weird holes and copy cases which shouldn't be there.)
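For concreteness, everything the existing export path can hand over today is roughly the following (a hedged sketch, assuming vaAcquireBufferHandle() on the derived image's buffer; the wrapper and its name are mine) -- which is why there is simply no room in it for any layout negotiation:

/* Export a surface as a dma-buf.  What comes back is just a handle plus the
 * VAImage pitches/offsets -- nothing that could describe tiling or other
 * layout details. */
#include <string.h>
#include <va/va.h>
#include <va/va_drmcommon.h>

static VAStatus export_surface(VADisplay dpy, VASurfaceID surface,
                               VAImage *image, VABufferInfo *info)
{
    VAStatus vas = vaDeriveImage(dpy, surface, image);
    if (vas != VA_STATUS_SUCCESS)
        return vas;

    memset(info, 0, sizeof(*info));
    info->mem_type = VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME;
    vas = vaAcquireBufferHandle(dpy, image->buf, info);
    /* On success, info->handle is a dma-buf fd and info->mem_size its size;
     * layout information is limited to image->pitches[]/offsets[].
     * Release with vaReleaseBufferHandle() + vaDestroyImage() when done. */
    if (vas != VA_STATUS_SUCCESS)
        vaDestroyImage(dpy, image->image_id);
    return vas;
}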

Thanks,

- Mark

