[RFC] drm/amdgpu: Add macros and documentation for format modifiers.

Tue Sep 4 18:27:43 UTC 2018

On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
<bas at basnieuwenhuizen.nl> wrote:
> On Tue, Sep 4, 2018 at 7:48 PM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>>
>> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
>> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
>> > <bas at basnieuwenhuizen.nl> wrote:
>> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>> >>> <bas at basnieuwenhuizen.nl> wrote:
>> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> >>>>>>>>> Hi,
>> >>>>>>>>>
>> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl> wrote:
>> >>>>>>>>>>> +/* The chip this is compatible with.
>> >>>>>>>>>>> + *
>> >>>>>>>>>>> + * If compression is disabled, use
>> >>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> >>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> >>>>>>>>>>> + *
>> >>>>>>>>>>> + * With compression enabled please use the exact chip.
>> >>>>>>>>>>> + *
>> >>>>>>>>>>> + * TODO: Do some generations share DCC format?
>> >>>>>>>>>>> + */
>> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
>> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they
>> >>>>>>>>>> eventually agreed that they don't need the massive pile with every
>> >>>>>>>>>> possible combination. Do you really plan to share all these different
>> >>>>>>>>>> things?
>> >>>>>>>>>>
>> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
>> >>>>>>>>>> using explicit modifier flags for everything.
>> >>>>>>>>> Right. The conclusion, after people went through and started sorting
>> >>>>>>>>> out the kinds of formats for which they would _actually_ export real
>> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than
>> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >>>>>>>>> possible formats to represent, very likely fewer than
>> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>> >>>>>>>>> bits of the 56 available.
>> >>>>>>>> The problem here is that at least for some parameters we actually don't know
>> >>>>>>>> which formats are actually used.
>> >>>>>>>>
>> >>>>>>>> The following are not real world examples, but just to explain the general
>> >>>>>>>> problem.
>> >>>>>>>>
>> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather
>> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
>> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
>> >>>>>>>> channel, but it is still perfectly possible.
>> >>>>>>>>
>> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
>> >>>>>>>> could be bad and we need to know which to actually use.
>> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
>> >>>>>>> All the information you only need locally to a given amdgpu instance
>> >>>>>>> don't really need to be encoded in modifiers.
>> >>>>>>>
>> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
>> >>>>>>> good starters I guess) would be really good here.
>> >>>>>> I extracted the information on which bits are relevant mostly from the
>> >>>>>> AddrFromCoord functions in addrlib in mesa:
>> >>>>>>
>> >>>>>> for macro-tiles:
>> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >>>>>>
>> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
>> >>>>>>
>> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>> >>>>> So this is the decoding thing. How many of these actually exist, even when
>> >>>>> taking all the other information into account?
>> >>>>>
>> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> >>>>> + height + width, how much of all these bits do you actually still freely
>> >>>>> pick?
>> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>> >>>> depth, display-rotated) + whether to use compression, everything else
>> >>>> is fixed given those option, the properties of the chip and the
>> >>>> format.
>> >>>>
>> >>>>
>> >>>>> It might be that all the things you need to know from the memory config
>> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> >>>>> kinda the angle that we looked at this for everyone else.
>> >>>>>
>> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
>> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
>> >>>>> becomes part of the implied stuff in the modifier.
>> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not
>> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
>> >>>> current configuration for all GFX6-GFX8 GPU:
>> >>>>
>> >>>> format: (bank width, bank height, macro tile aspect, num banks) for
>> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>> >>>>
>> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
>> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
>> >>>> ADDR_SURF_P8_32x32_8x16
>> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
>> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
>> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
>> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
>> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
>> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
>> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>> >>>>
>> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
>> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
>> >>>> compatible for 8-bpp.
>> >>>>
>> >>>> So we can't just assume that if the first plane properties match that
>> >>>> they do so for the second plane, because we don't know what GPU it is
>> >>>> coming from.
>> >>>>
>> >>>> And we can make a canonical table from this but then if we change the
>> >>>> above tables in the kernel that runs into compatibility issues.
>> >>>
>> >>> I think that's roughly what nvidia ended up doing (not sure they
>> >>> published anything, haven't seen any patches on dri-devel). Two parts:
>> >>> - Bunch of flags/bits for the major mode, like {display, non-display,
>> >>> DCC} and so on.
>> >>> - For each of the major modes a list of enumerated modes actually used
>> >>> in reality. For DCC this would be the generation, for more plain
>> >>> formats it would be the above table. Table entries would be per-bpp
>> >>> (in case that matters, or whatever matters for amd gpus).
>> >>>
>> >>> Imo would be totally ok to put the relevant lookup tables into
>> >>> drm_fourcc.h directly. Or any other suitable place.
>> >> Right, the issue would be that the tables are configured int he kernel
>> >>
>> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
>> >>
>> >> and having a shared hardcoded table means that if the kernel tables
>> >> change we break userspace. That would be a new commitment on the side
>> >> of the kernel driver.
>> >>
>> >> (If I can get an okay for that commitment from Alex/Christian I'd be
>> >> totally happy to switch it over)
>> > Yeah it'd make that table uapi
>>
>> That won't work. Those lists are not necessary stable.
>>
>> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
>> plus a 8bit version number.
>
> But if we use PCI-ID, then we cannot really share between different
> models of GPU ...
>
> So we have seen 3 options on design level:
> 1) Use PCI-ID/chip.
>   Advantage:
>     -simple
>     - Few bits
>   Disadvantage:
>     - No sharing between different GPU models
> 2) Use a fixed lookup/canonicalization table.
>   Advantage:
>     -Few bits
>     - Sharing between models
>   Disadvantage:
>     - Cannot change the kernel tiling tables.

Why is changing the kernel table impossible? Just because it's uapi
doesn't mean we can never extend it. Just add a new entry at the
bottom if you need more combinations ...

You don't even need to require the updated kernel amdgpu.ko, just the
updated drm_fourcc.h header (if it's one of these things you can
configure through cs packets entirely from userspace).

Or why does this not work?
-Daniel

> 3) Encode tiling mode properties in modifier.
>   Advantage:
>     - Can share between models.
>     - Can change the kernel tiling tables.
>   Disadvantage:
>     - Use a lot of bits.
>
> Overall the number of exposed modifiers should be pretty much the
> same, so encoding size that we have to deal with all possible bit
> patterns.
>
> I've implemented option 3 in the patch here because I think it is the
> best tradeoff. As far as I can see it fits, and especially if we do no
> multisample support and remove the tile split bytes, and remove the
> 3rd set of plane macro-tile options because no format has 3 different
> bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
> for future extensions and a version number. It is not great but pretty
> doable?
>
>>
>> Shouldn't be more than 32 bits in total.
>>
>> Regards,
>> Christian.
>>
>> >   - well the compressed from with
>> > duplicates removed, if you really want to make it tiny. But you can
>> > just add new entries if the heuristics for a given platform changes,
>> > as long as you keep the old entries working (if you shipped them ever
>> > ofc, otherwise totally ok to change as you see fit).
>> > -Daniel
>> >
>> >>>  From a quick look about 16 bits total. Gives you 40bits of second
>> >>> chances and "oops totally sorry".
>> >> I purposefully left 8 bits as a version field for those oopses.
>> >>
>> >>> -Daniel
>> >>>
>> >>>>> -Daniel
>> >>>>>
>> >>>>>>> -Daniel
>> >>>>>>>
>> >>>>>>>> Regards,
>> >>>>>>>> Christian.
>> >>>>>>>>
>> >>>>>>>>> If you do use 256 bits in order to represent
>> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>> >>>>>>>>> with some kind of eBPF script.
>> >>>>>>>>>
>> >>>>>>>>> Cheers,
>> >>>>>>>>> Daniel
>> >>>>>>>>> _______________________________________________
>> >>>>>>>>> dri-devel mailing list
>> >>>>>>>>> dri-devel at lists.freedesktop.org
>> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >>>>>>> --
>> >>>>>>> Daniel Vetter
>> >>>>>>> Software Engineer, Intel Corporation
>> >>>>>>> http://blog.ffwll.ch
>> >>>>>>> _______________________________________________
>> >>>>>>> dri-devel mailing list
>> >>>>>>> dri-devel at lists.freedesktop.org
>> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >>>>> --
>> >>>>> Daniel Vetter
>> >>>>> Software Engineer, Intel Corporation
>> >>>>> http://blog.ffwll.ch
>> >>>
>> >>>
>> >>> --
>> >>> Daniel Vetter
>> >>> Software Engineer, Intel Corporation
>> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>> >
>> >
>>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch