[Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

Fri Jul 28 14:44:39 UTC 2017

Hi Daniel,

On 28.07.2017 12:46, Daniel Stone wrote:
> On 28 July 2017 at 10:24, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 28.07.2017 09:44, Daniel Stone wrote:
>>> No, I don't think it is. Tiled layouts still have a stride: if you
>>> look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an
>>> auxiliary compression/fast-clear buffer), iMX/etnaviv
>>> tiled/supertiled, or VC4 T-tiled modifiers and how they're handled
>>> both for DRIImage and KMS interchange, they all specify a stride which
>>> is conceptually the same as linear, if you imagine linear to be 1x1
>>> tiled.
>>>
>>> Most tiling users accept any integer units of tiles (buffer width
>>> aligned to tile width), but not always. The NV12MT format used by
>>> Samsung media decoders (also shipped in some Qualcomm SoCs) is
>>> particularly special, IIRC requiring height to be aligned to a
>>> multiple of two tiles.
>>
>> Fair enough, but I think you need to distinguish between the chosen stride
>> and stride *requirements*. I do think it makes sense to consider the stride
>> requirement as part of the format/layout description, but more below.
> 
> Right. Stride is a property of one buffer, stride requirements are a
> property of the users of that buffer (GPU, display control, media
> encode, etc). The requirements also depend on use, e.g. trying to do
> rotation through your scanout engine can change those requirements.

Right.

>>> It definitely seems attractive to kill two birds with one stone, but
>>> I'd really much rather not conflate format description/advertisement,
>>> and allocation restriction, into one enum. I'm still on the side of
>>> saying that this is a problem modifiers do not solve, deferring to the
>>> allocator we need anyway in order to determine things like placement
>>> and global optimality (e.g. rotated scanout placing further
>>> restrictions on allocation).
>>
>> Okay, the original issue here is that the allocator *cannot* determine the
>> alignment requirement in the use case that prompted this sub-thread.
>>
>> The use case is PRIME off-loading, where the rendering GPU supports linear
>> layouts with a 64 byte stride, while the display GPU requires a 256 byte
>> stride.
>>
>> The allocator *cannot* solve this issue, because the allocation happens on
>> the rendering GPU. We need to communicate somehow what the display GPU's
>> stride requirements are.
>>
>> How do you propose to do that?
> 
> The allocator[0] in itself can't magically reach across processes to
> determine desired usage and resolve dependencies. But the entire
> design behind it was to be able to solve cross-device usage: between
> GPU and scanout, between both of those and media encode/decode, etc.
> Obviously it can't do that without help, so winsys will need to gain
> protocol in order to express those in terms the allocator will
> understand.
> 
> The idea was to split information into positive capabilities and
> negative constraints. Modifier queries fall into the same boat as
> format queries: you're expressing an additional capability ('I can
> speak tiled'). Stride alignment, for me, falls into a negative
> constraint ('linear allocations must have stride aligned to 256
> bytes'). Similarly, placement constraints (VRAM possibly only
> accessible to SLI-type paired GPU vs. GTT vs. pure system RAM, etc)
> are in the same boat AFAICT. So this helps solve one side of the
> equation, but not the other.

I've been thinking about this some more, and I can see now that the 
changed modifier scheme that I originally proposed does not fit well 
into places where modifiers are used to express buffer properties (e.g. 
DRI3PixmapFromBuffers, DRI3BuffersFromPixmap).

But I see no proposal on how to fix the issue so far. You cannot fully 
separate capabilities from constraints. As is, we (AMD) cannot properly 
implement the proposed DRI3 v1.1: what would we return in 
DRI3GetSupportedModifiers?

The natural option is to return (at least) DRM_FORMAT_MOD_LINEAR, but 
that would be a lie, because we *don't* speak arbitrary linear formats.
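
To make the mismatch concrete, here is a minimal sketch in C. The helper 
names (align_up, min_stride, stride_ok) are made up for illustration and 
are not part of any existing API; the 64 and 256 byte figures are the 
ones from the PRIME example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round value up to a multiple of align; align must be a power of two. */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Smallest stride, in bytes, that satisfies one device's alignment. */
static uint32_t min_stride(uint32_t width, uint32_t bytes_per_pixel,
                           uint32_t stride_align)
{
    return align_up(width * bytes_per_pixel, stride_align);
}

/* Whether an already chosen stride is acceptable to a device. */
static bool stride_ok(uint32_t stride, uint32_t stride_align)
{
    return stride % stride_align == 0;
}
```

For a 1000 pixel wide, 4 bytes per pixel buffer, min_stride(1000, 4, 64) 
is 4032, and stride_ok(4032, 256) is false. A renderer that only knows 
"linear is supported" has no reason to pick 4096 instead.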

I don't think this is difficult to fix in terms of protocol, although 
there's plenty of opportunity for bike-shedding :)

I see roughly two options:

1. Make the constraints per-modifier, and add a "constraints: 
ListOfCard32" (or ListOfCard64) field to the DRI3GetSupportedModifiers reply. We 
can then reserve some bits for global constraints (e.g. placement) and 
some bits on a per-modifier basis (e.g. stride alignment for linear). 
You could build constraints like DRM_CONSTRAINT_PLACEMENT_SYSTEM | 
DRM_CONSTRAINT_LINEAR_STRIDE_256B.

2. Make the constraints global, and add a DRI3GetConstraints request 
with the same signature as DRI3GetSupportedModifiers. We'd need vendor 
namespaces for the constraint defines, to support constraints that are 
specific to vendor-specific modifiers. You could have entries like 
DRM_CONSTRAINT_PLACEMENT(DRM_CONSTRAINT_PLACEMENT_SYSTEM) and, as a 
separate list entry, DRM_CONSTRAINT_LINEAR_STRIDE(256).
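
A rough sketch of how the two encodings might look in C follows. The 
DRM_CONSTRAINT_* names are the ones used above, but none of them exist 
today: every bit position, the type/payload split for option 2, and the 
two opt*_linear_stride_align helpers are assumptions made purely for 
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option 1: one bitmask per modifier. Bit positions are hypothetical. */
#define DRM_CONSTRAINT_PLACEMENT_SYSTEM    (1u << 0)   /* global bit */
#define DRM_CONSTRAINT_LINEAR_STRIDE_256B  (1u << 16)  /* linear-only bit */

/* Option 2: flat list of 64-bit entries; hypothetical layout with the
 * constraint type in the top 8 bits and the payload in the low 56. */
#define DRM_CONSTRAINT_TYPE_PLACEMENT      1ULL
#define DRM_CONSTRAINT_TYPE_LINEAR_STRIDE  2ULL
#define DRM_CONSTRAINT_PAYLOAD_MASK        0x00ffffffffffffffULL
#define DRM_CONSTRAINT(type, val) \
    (((uint64_t)(type) << 56) | ((uint64_t)(val) & DRM_CONSTRAINT_PAYLOAD_MASK))
#define DRM_CONSTRAINT_PLACEMENT(p) \
    DRM_CONSTRAINT(DRM_CONSTRAINT_TYPE_PLACEMENT, p)
#define DRM_CONSTRAINT_LINEAR_STRIDE(a) \
    DRM_CONSTRAINT(DRM_CONSTRAINT_TYPE_LINEAR_STRIDE, a)

/* Option 1: stride alignment implied by a mask (1 means unconstrained). */
static uint32_t opt1_linear_stride_align(uint32_t constraints)
{
    return (constraints & DRM_CONSTRAINT_LINEAR_STRIDE_256B) ? 256 : 1;
}

/* Option 2: scan the list for stride entries and keep the largest one.
 * Taking the maximum is only correct for power-of-two alignments. */
static uint32_t opt2_linear_stride_align(const uint64_t *list, size_t count)
{
    uint32_t align = 1;

    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if ((list[i] >> 56) == DRM_CONSTRAINT_TYPE_LINEAR_STRIDE) {
            uint32_t a = (uint32_t)(list[i] & DRM_CONSTRAINT_PAYLOAD_MASK);
            if (a > align)
                align = a;
        }
    }
    return align;
}
```

With option 1 every new alignment value costs a bit, whereas option 2 
carries the value in the entry itself at the price of a type namespace.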

Cheers,
Nicolai