[RFC] drm/amdgpu: Add macros and documentation for format modifiers.

Tue Sep 4 17:48:09 UTC 2018

Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> <bas at basnieuwenhuizen.nl> wrote:
>> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>>> <bas at basnieuwenhuizen.nl> wrote:
>>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl> wrote:
>>>>>>>>>>> +/* The chip this is compatible with.
>>>>>>>>>>> + *
>>>>>>>>>>> + * If compression is disabled, use
>>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>>>>>>>>>>> + *
>>>>>>>>>>> + * With compression enabled please use the exact chip.
>>>>>>>>>>> + *
>>>>>>>>>>> + * TODO: Do some generations share DCC format?
>>>>>>>>>>> + */
>>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
>>>>>>>>>> details? When we had the entire discussion with nvidia folks they
>>>>>>>>>> eventually agreed that they don't need the massive pile with every
>>>>>>>>>> possible combination. Do you really plan to share all these different
>>>>>>>>>> things?
>>>>>>>>>>
>>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
>>>>>>>>>> using explicit modifier flags for everything.
>>>>>>>>> Right. The conclusion, after people went through and started sorting
>>>>>>>>> out the kinds of formats for which they would _actually_ export real
>>>>>>>>> colour buffers for, that most vendors definitely have fewer than
>>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>>>>> possible formats to represent, very likely fewer than
>>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>>>>>>>>> bits of the 56 available.
>>>>>>>> The problem here is that at least for some parameters we actually don't know
>>>>>>>> which formats are actually used.
>>>>>>>>
>>>>>>>> The following are not real world examples, but just to explain the general
>>>>>>>> problem.
>>>>>>>>
>>>>>>>> The memory configuration for example can be not ASIC specific, but rather
>>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
>>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
>>>>>>>> channel, but it is still perfectly possible.
>>>>>>>>
>>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
>>>>>>>> could be bad and we need to know which to actually use.
>>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
>>>>>>> All the information you only need locally to a given amdgpu instance
>>>>>>> don't really need to be encoded in modifiers.
>>>>>>>
>>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
>>>>>>> good starters I guess) would be really good here.
>>>>>> I extracted the information on which bits are relevant mostly from the
>>>>>> AddrFromCoord functions in addrlib in mesa:
>>>>>>
>>>>>> for macro-tiles:
>>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>>>>>>
>>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
>>>>>>
>>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>>>> So this is the decoding thing. How many of these actually exist, even when
>>>>> taking all the other information into account?
>>>>>
>>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>>>>> + height + width, how much of all these bits do you actually still freely
>>>>> pick?
>>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>>>> depth, display-rotated) + whether to use compression, everything else
>>>> is fixed given those option, the properties of the chip and the
>>>> format.
>>>>
>>>>
>>>>> It might be that all the things you need to know from the memory config
>>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>>>>> kinda the angle that we looked at this for everyone else.
>>>>>
>>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
>>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
>>>>> becomes part of the implied stuff in the modifier.
>>>> The problem is some GPUs are compatible for say 8-bpp images, but not
>>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
>>>> current configuration for all GFX6-GFX8 GPU:
>>>>
>>>> format: (bank width, bank height, macro tile aspect, num banks) for
>>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>>>>
>>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
>>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
>>>> ADDR_SURF_P8_32x32_8x16
>>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
>>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
>>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
>>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
>>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
>>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
>>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>>>>
>>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
>>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
>>>> compatible for 8-bpp.
>>>>
>>>> So we can't just assume that if the first plane properties match that
>>>> they do so for the second plane, because we don't know what GPU it is
>>>> coming from.
>>>>
>>>> And we can make a canonical table from this but then if we change the
>>>> above tables in the kernel that runs into compatibility issues.
>>>
>>> I think that's roughly what nvidia ended up doing (not sure they
>>> published anything, haven't seen any patches on dri-devel). Two parts:
>>> - Bunch of flags/bits for the major mode, like {display, non-display,
>>> DCC} and so on.
>>> - For each of the major modes a list of enumerated modes actually used
>>> in reality. For DCC this would be the generation, for more plain
>>> formats it would be the above table. Table entries would be per-bpp
>>> (in case that matters, or whatever matters for amd gpus).
>>>
>>> Imo would be totally ok to put the relevant lookup tables into
>>> drm_fourcc.h directly. Or any other suitable place.
>> Right, the issue would be that the tables are configured int he kernel
>>
>> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
>>
>> and having a shared hardcoded table means that if the kernel tables
>> change we break userspace. That would be a new commitment on the side
>> of the kernel driver.
>>
>> (If I can get an okay for that commitment from Alex/Christian I'd be
>> totally happy to switch it over)
> Yeah it'd make that table uapi

That won't work. Those lists are not necessary stable.

What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index 
plus a 8bit version number.

Shouldn't be more than 32 bits in total.

Regards,
Christian.

>   - well the compressed from with
> duplicates removed, if you really want to make it tiny. But you can
> just add new entries if the heuristics for a given platform changes,
> as long as you keep the old entries working (if you shipped them ever
> ofc, otherwise totally ok to change as you see fit).
> -Daniel
>
>>>  From a quick look about 16 bits total. Gives you 40bits of second
>>> chances and "oops totally sorry".
>> I purposefully left 8 bits as a version field for those oopses.
>>
>>> -Daniel
>>>
>>>>> -Daniel
>>>>>
>>>>>>> -Daniel
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> If you do use 256 bits in order to represent
>>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>>>>>>>>> with some kind of eBPF script.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Daniel
>>>>>>>>> _______________________________________________
>>>>>>>>> dri-devel mailing list
>>>>>>>>> dri-devel at lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>> --
>>>>>>> Daniel Vetter
>>>>>>> Software Engineer, Intel Corporation
>>>>>>> http://blog.ffwll.ch
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel at lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch
>>>
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
>