[Mesa-dev] 10bit HEVC decoding for RadeonSI

Christian König deathsimple at vodafone.de
Fri Jan 27 14:27:55 UTC 2017


On 27.01.2017 at 13:51, Mark Thompson wrote:
> On 26/01/17 16:59, Christian König wrote:
>> On 26.01.2017 at 13:14, Mark Thompson wrote:
>>> [SNIP]
>> The problem here is that I need to know what will be done with the surface from the very beginning. E.g. if you want to scan it out directly to hardware you need a different memory layout than when you just want to sample from it.
>>
>> The same applies to sampling from it in OpenGL, and in this case also to how you want to sample from it, etc. etc...
> For use in other places (like scanout or OpenGL), is this a correctness issue (things just won't work with the wrong layout) or a performance one (things will be slower or use more resources)?

It is a correctness issue, because only a subset of the formats is 
directly usable with certain sampling modes or for scanout, for example.
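For what it's worth, the closest thing VA-API offers here today is the 
usage hint at surface creation time, roughly like this (a sketch, 
assuming a libva new enough to have VASurfaceAttribUsageHint; whether a 
driver can actually act on the hint is another question):

/* Sketch: tell the driver at creation time what the surface will be
 * used for, so it can pick a layout that is correct for that use case. */
VASurfaceAttrib attrib = {
    .type          = VASurfaceAttribUsageHint,
    .flags         = VA_SURFACE_ATTRIB_SETTABLE,
    .value.type    = VAGenericValueTypeInteger,
    .value.value.i = VA_SURFACE_ATTRIB_USAGE_HINT_DECODER |
                     VA_SURFACE_ATTRIB_USAGE_HINT_DISPLAY,
};
VASurfaceID surface;
vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, width, height,
                 &surface, 1, &attrib, 1);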

> (For that matter, is there a list somewhere of the set of formats/layouts and what they are used for?)

Well, taking a look at all the use cases and options you can program into 
the decoder, you easily end up with 100+ format combinations.

For example, the decoder can split a frame into the top and bottom field 
during decode, apply different tiling modes and lay out the YUV data 
either packed or in separate planes.
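Just to illustrate how that multiplies (made-up names, not the real 
decoder programming, which is more involved):

/* Hypothetical enums, only to show the dimensions that multiply: */
enum field_layout { FL_FRAME, FL_TOP_BOTTOM_SPLIT };       /* x2 */
enum tiling_mode  { TM_LINEAR, TM_1D_TILED, TM_2D_TILED }; /* x3 */
enum plane_layout { PL_PACKED, PL_SEPARATE_PLANES };       /* x2 */
enum bit_depth    { BD_8, BD_10 };                         /* x2 */
/* 2 * 3 * 2 * 2 = 24 combinations already, before chroma subsampling
 * and the bank/pipe/swizzle variants of the tiling modes are counted. */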

>>> To my mind, the PixelFormat attribute (fourcc) is only specifying the appearance of the format from the point of view of the user.  That is, what you will get if you call vaDeriveImage() and then map the result.
>> And exactly that is completely nonsense. You CAN'T assume that the decoding result is immediately accessible by the CPU.
>>
>> So the only proper way of handling this is going the VDPAU design. You create the surface without specifying any format, decode into it with the decoder and then the application tells the driver what format it wants to access it.
>>
>> The driver then copies the data to CPU accessible memory and does the conversion to the format desired by the application on the fly.
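For reference, the VDPAU read-back flow looks roughly like this; a 
sketch with the function pointers assumed to be already resolved via 
VdpGetProcAddress, error handling omitted:

/* The surface is created with only a chroma type, no pixel format: */
VdpVideoSurface surface;
vdp_video_surface_create(device, VDP_CHROMA_TYPE_420, width, height,
                         &surface);

/* ... decode into the surface ... */

/* Only at read-back time does the application pick a format; the
 * driver copies and converts on the fly.  YV12 plane order is Y, V, U: */
void     *planes[3]  = { y_data, v_data, u_data };
uint32_t  pitches[3] = { width, width / 2, width / 2 };
vdp_video_surface_get_bits_y_cb_cr(surface, VDP_YCBCR_FORMAT_YV12,
                                   planes, pitches);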
>>
>>>     Tiling then doesn't cause any problems because it is a property of the DRM object and mapping can automagically take it into account.
>> No it can't, tiled surfaces are not meant to be CPU accessible. So the whole idea of mapping a surface doesn't make much sense to me.
> If they aren't CPU accessible, doesn't this mean that the layout of the surfaces isn't observable by the user and therefore doesn't matter to the API?

Exactly, but the problem is that VA-API suggests otherwise.

E.g. we have had it multiple times that customers coded a specialized 
application, tested it on Intel hardware and then wanted to have it 
working on AMD in exactly the same way.

That unfortunately didn't work, because our internal memory layout is 
just completely different.

> Since the user can't access the surface directly, it can be whatever is most suitable for the hardware and the user can't tell.

That won't work. An example we ran into was that a customer wanted to 
black out the first and last line of an image for BOB deinterlacing.

To do so he mapped the surface and just used memset() on the appropriate 
addresses. On Intel I was told this works because the mapping seems to 
be bidirectional and all changes done to it are reflected in the 
original video surface/image.
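The code looked roughly like this (reconstructed from memory, error 
handling omitted; note that nothing in the spec says the derived image 
aliases the surface memory):

VAImage image;
void *data;

vaDeriveImage(dpy, surface, &image);
vaMapBuffer(dpy, image.buf, &data);

/* Black out the first and last luma line.  0x00 is used for simplicity;
 * strictly speaking video black is 16 for limited range content. */
uint8_t *luma = (uint8_t *)data + image.offsets[0];
memset(luma, 0x00, image.pitches[0]);
memset(luma + (size_t)(image.height - 1) * image.pitches[0],
       0x00, image.pitches[0]);

vaUnmapBuffer(dpy, image.buf);
vaDestroyImage(dpy, image.image_id);

On Intel the writes apparently end up in the surface itself; on radeonsi 
the derived image can't alias the tiled surface, so the same pattern 
doesn't work.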

> The API certainly admits the possibility that vaDeriveImage() just can't expose surfaces to the CPU directly, or that there are extra implicit copies so that it all appears consistent from the point of view of the user.

Yeah, but it doesn't enforce that. E.g. it isn't properly defined that 
the derived image is a copy, and since it looks like it just works on 
Intel, people tend to use it that way.

> I think my use of the word "mapping" wasn't helping there: I was using it to refer both to mapping to the CPU address space (which need not be supported) and to other APIs (OpenGL, OpenCL, whatever) which will use it on the GPU (which is far more important).  My real question on the tiling issue was: is tiling/layout/whatever a property of the DRM object, such that other APIs interacting with it can do the right thing without the user needing to know about it?

No, they can't. For example, when we import the UV plane of an NV12 
surface into OpenGL, then OpenGL needs to know that this originally comes 
from a decoded image. Otherwise it won't use the correct tiling parameters.

Currently we make an educated guess in the GL driver, based on the 
offset, about what is being imported, but that isn't really good engineering.
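To make that concrete, a dma-buf import of the UV plane via 
EGL_EXT_image_dma_buf_import looks roughly like this from the 
application side; the only layout information the GL driver gets is a 
fourcc, an offset and a pitch:

/* Sketch: importing the UV plane of an NV12 surface as a GR88 image.
 * Nothing here says "this is plane 1 of a decoder-written, tiled NV12
 * surface", which is why the driver has to guess from the offset. */
EGLint attribs[] = {
    EGL_WIDTH,                     width / 2,
    EGL_HEIGHT,                    height / 2,
    EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_GR88,
    EGL_DMA_BUF_PLANE0_FD_EXT,     dmabuf_fd,
    EGL_DMA_BUF_PLANE0_OFFSET_EXT, uv_offset,
    EGL_DMA_BUF_PLANE0_PITCH_EXT,  pitch,
    EGL_NONE
};
EGLImageKHR img = eglCreateImageKHR(egl_display, EGL_NO_CONTEXT,
                                    EGL_LINUX_DMA_BUF_EXT, NULL, attribs);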

To be fair, I've run into exactly the same problem with VDPAU as well; 
that's the reason we currently have tiling disabled for video surfaces 
there, too.

> If not, then the VAAPI buffer-sharing construction (vaCreateSurfaces() and vaAcquireBufferHandle() with VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME) doesn't contain enough information to ever work and we should be trying to fix it in the API.

Those problems won't be resolved by changes to VA-API alone. Essentially 
we need to follow NVIDIA's proposal for the "Unix Device Memory 
Allocation project".

Hopefully this will in the end result in a library which, based on the 
use case, allows drivers to negotiate the surface format.
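Something along these lines, with purely hypothetical names, just to 
illustrate the negotiation idea:

/* Hypothetical allocator interface: every driver describes the layouts
 * it can produce or consume for a given use case, and the allocator
 * picks one from the intersection. */
struct layout_caps decode_caps, scanout_caps, common;

decoder_dev->get_caps(USE_DECODE_TARGET, &decode_caps);
display_dev->get_caps(USE_SCANOUT, &scanout_caps);

if (!layout_caps_intersect(&decode_caps, &scanout_caps, &common)) {
    /* No common layout: allocate two buffers and blit between them. */
}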

Regards,
Christian.

> If so, then these issues seem resolvable, albeit possibly with extra copies in places where we haven't been able to work out far enough in advance what was needed (or where the user really does want to do multiple things which require different formats: maybe drawing something on a surface (OSD/subtitles, say) and then sending it to an encoder or scanout).
>
> Thanks,
>
> - Mark
>


