[RFC] drm/amdgpu: Add macros and documentation for format modifiers.

Tue Sep 4 13:33:57 UTC 2018

On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>
> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel at ffwll.ch> wrote:
> > >
> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > > > Hi,
> > > > >
> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl> wrote:
> > > > > > > +/* The chip this is compatible with.
> > > > > > > + *
> > > > > > > + * If compression is disabled, use
> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > > > > + *
> > > > > > > + * With compression enabled please use the exact chip.
> > > > > > > + *
> > > > > > > + * TODO: Do some generations share DCC format?
> > > > > > > + */
> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling
> > > > > > details? When we had the entire discussion with nvidia folks they
> > > > > > eventually agreed that they don't need the massive pile with every
> > > > > > possible combination. Do you really plan to share all these different
> > > > > > things?
> > > > > >
> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > > > > buffer size and buffer format (because that's how the hw works), not
> > > > > > using explicit modifier flags for everything.
> > > > > Right. The conclusion, after people went through and started sorting
> > > > > out the kinds of formats for which they would _actually_ export real
> > > > > colour buffers, was that most vendors definitely have fewer than
> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > > possible formats to represent, very likely fewer than
> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > > > bits of the 56 available.
> > > >
> > > > The problem here is that at least for some parameters we actually don't know
> > > > which formats are actually used.
> > > >
> > > > The following are not real world examples, but just to explain the general
> > > > problem.
> > > >
> > > > The memory configuration for example can be not ASIC specific, but rather
> > > > determined by whoever took the ASIC and glued it together with VRAM on a
> > > > board. It is not likely that somebody puts all the VRAM chips on one
> > > > channel, but it is still perfectly possible.
> > > >
> > > > The same is true for things like harvesting, e.g. of 16 channels half of
> > > > them could be bad and we need to know which ones to actually use.
> > >
> > > For my understanding: This leaks outside the chip when sharing buffers?
> > > Information you only need locally within a given amdgpu instance
> > > doesn't really need to be encoded in modifiers.
> > >
> > > Pointers to code where this is all decided (kernel and radeonsi would be
> > > good starters I guess) would be really good here.
> >
> > I extracted the information on which bits are relevant mostly from the
> > AddrFromCoord functions in addrlib in mesa:
> >
> > for macro-tiles:
> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >
> > for micro-tiles (or the micro-tiles in macro-tiles):
> >
> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>
> So this is the decoding thing. How many of these actually exist, even when
> taking all the other information into account?
>
> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> + height + width, how much of all these bits do you actually still freely
> pick?

Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
thick variants of macro-tile), MICRO_TILE_MODE (display, non-display,
depth, display-rotated) and whether to use compression. Everything else
is fixed given those options, the properties of the chip and the
format.
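
To make that concrete, here is a rough sketch of how those three free
choices plus the chip field could be packed into one 64-bit modifier.
Only AMDGPU_MODIFIER_CHIP_GEN_SHIFT/MASK come from the patch. Everything
prefixed EX_/ex_ (field positions, widths, enum values, helper names) is
made up for illustration and is not what the patch defines. The vendor
byte value is assumed to mirror DRM_FORMAT_MOD_VENDOR_AMD.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch under discussion. */
#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT  40
#define AMDGPU_MODIFIER_CHIP_GEN_MASK   0xff

/* Everything prefixed EX_/ex_ is hypothetical, for illustration only. */
#define EX_VENDOR_SHIFT           56   /* top 8 bits of a modifier name the vendor */
#define EX_VENDOR_AMD             0x02 /* assumed to mirror DRM_FORMAT_MOD_VENDOR_AMD */
#define EX_ARRAY_MODE_SHIFT       0
#define EX_ARRAY_MODE_MASK        0xf
#define EX_MICRO_TILE_MODE_SHIFT  4
#define EX_MICRO_TILE_MODE_MASK   0x7
#define EX_DCC_SHIFT              7
#define EX_DCC_MASK               0x1

/* The chip field must stay inside the 56 bits below the vendor byte. */
static_assert(AMDGPU_MODIFIER_CHIP_GEN_SHIFT + 8 <= EX_VENDOR_SHIFT,
	      "chip gen field overlaps the vendor byte");

enum ex_array_mode {
	EX_ARRAY_LINEAR,
	EX_ARRAY_MICRO_TILED,
	EX_ARRAY_MACRO_TILED,
	EX_ARRAY_SPARSE,
	EX_ARRAY_MACRO_TILED_THICK,
};

enum ex_micro_tile_mode {
	EX_MICRO_DISPLAY,
	EX_MICRO_NON_DISPLAY,
	EX_MICRO_DEPTH,
	EX_MICRO_DISPLAY_ROTATED,
};

/* Place a value into its field. */
static inline uint64_t ex_put(uint64_t value, unsigned shift, uint64_t mask)
{
	return (value & mask) << shift;
}

/* Read a field back out of a modifier. */
static inline uint64_t ex_get(uint64_t modifier, unsigned shift, uint64_t mask)
{
	return (modifier >> shift) & mask;
}

/* Combine the vendor byte, the chip field and the three free choices. */
static inline uint64_t ex_pack_modifier(unsigned chip_gen,
					enum ex_array_mode array_mode,
					enum ex_micro_tile_mode micro_tile_mode,
					int dcc)
{
	return ((uint64_t)EX_VENDOR_AMD << EX_VENDOR_SHIFT) |
	       ex_put(chip_gen, AMDGPU_MODIFIER_CHIP_GEN_SHIFT,
		      AMDGPU_MODIFIER_CHIP_GEN_MASK) |
	       ex_put(array_mode, EX_ARRAY_MODE_SHIFT, EX_ARRAY_MODE_MASK) |
	       ex_put(micro_tile_mode, EX_MICRO_TILE_MODE_SHIFT,
		      EX_MICRO_TILE_MODE_MASK) |
	       ex_put(dcc ? 1 : 0, EX_DCC_SHIFT, EX_DCC_MASK);
}
```

With a layout like this the free choices themselves take only a handful
of bits. The rest of the space would go to the chip-dependent
parameters.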

>
> It might be that all the things you need to know from the memory config
> don't encode smaller than the macro/micro/whatever else stuff. But that's
> kinda the angle that we looked at this for everyone else.
>
> E.g. for multi-plane stuff, if everyone picks the same config for the
> 2nd/3rd plane, then you don't actually need to encode that. It just
> becomes part of the implied stuff in the modifier.

The problem is that some GPUs are compatible for, say, 8-bpp images, but
not for 32-bpp surfaces. E.g. let's look at the following table showing
the current configuration for all GFX6-GFX8 GPUs:

Format: (bank width, bank height, macro tile aspect, num banks) for
8-bpp, 16-bpp and 32-bpp single-sample, followed by the PIPE_CONFIG.

chip             8-bpp         16-bpp        32-bpp        PIPE_CONFIG
verde:           (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
oland:           (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
hainan:          (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) ADDR_SURF_P8_32x32_8x16
bonaire:         (1, 4, 4, 16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
hawaii:          (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
CIK APUs:        (1, 4, 4, 8)  (1, 2, 4, 8)  (1, 2, 2, 8)  ADDR_SURF_P2
topaz:           (4, 4, 2, 8)  (4, 4, 2, 8)  (2, 4, 2, 8)  ADDR_SURF_P2
fiji:            (1, 4, 2, 8)  (1, 4, 2, 8)  (1, 4, 2, 8)  ADDR_SURF_P16_32x32_16x16
tonga:           (1, 4, 4, 16) (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
polaris11/12:    (1, 4, 4, 16) (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
polaris10:       (1, 4, 4, 16) (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
stoney/carrizo:  (1, 4, 4, 8)  (1, 2, 4, 8)  (1, 1, 2, 8)  ADDR_SURF_P2

We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
and 16-bpp, but not for 32-bpp. Likewise, bonaire and polaris11/12 are
only compatible for 8-bpp.
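
As a sketch of that comparison: the snippet below copies four rows from
the table and treats two chips as compatible at a given bpp when the
four tile parameters and the PIPE_CONFIG are identical. That equality
rule is a simplification I am assuming for the example, and all the
ex_/EX_ names are made up. None of this is existing kernel or addrlib
code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical types and names, for illustration only. */
enum ex_bpp { EX_BPP_8, EX_BPP_16, EX_BPP_32, EX_BPP_COUNT };

struct ex_tile_params {
	int bank_width;
	int bank_height;
	int macro_tile_aspect;
	int num_banks;
};

struct ex_chip {
	const char *name;
	struct ex_tile_params params[EX_BPP_COUNT];
	const char *pipe_config;
};

/* Rows copied from the table above. */
static const struct ex_chip ex_bonaire = {
	"bonaire", { {1, 4, 4, 16}, {1, 2, 4, 16}, {1, 1, 2, 16} },
	"ADDR_SURF_P4_16x16"
};
static const struct ex_chip ex_polaris11 = {
	"polaris11/12", { {1, 4, 4, 16}, {1, 4, 4, 16}, {1, 4, 4, 16} },
	"ADDR_SURF_P4_16x16"
};
static const struct ex_chip ex_cik_apu = {
	"CIK APUs", { {1, 4, 4, 8}, {1, 2, 4, 8}, {1, 2, 2, 8} },
	"ADDR_SURF_P2"
};
static const struct ex_chip ex_stoney = {
	"stoney/carrizo", { {1, 4, 4, 8}, {1, 2, 4, 8}, {1, 1, 2, 8} },
	"ADDR_SURF_P2"
};

/* Assumed rule: same PIPE_CONFIG and same four parameters at this bpp. */
static bool ex_layout_matches(const struct ex_chip *a,
			      const struct ex_chip *b, enum ex_bpp bpp)
{
	const struct ex_tile_params *pa = &a->params[bpp];
	const struct ex_tile_params *pb = &b->params[bpp];

	return strcmp(a->pipe_config, b->pipe_config) == 0 &&
	       pa->bank_width == pb->bank_width &&
	       pa->bank_height == pb->bank_height &&
	       pa->macro_tile_aspect == pb->macro_tile_aspect &&
	       pa->num_banks == pb->num_banks;
}
```

The result depends on the bpp argument, so a match for one plane says
nothing about a plane with a different bpp.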

So we can't just assume that because the first plane's properties match,
the second plane's do too, since we don't know which GPU the buffer is
coming from.

We could make a canonical table from this, but then any change to the
above tables in the kernel would run into compatibility issues.

> -Daniel
>
> >
> > > -Daniel
> > >
> > > >
> > > > Regards,
> > > > Christian.
> > > >
> > > > >
> > > > > If you do use 256 bits in order to represent
> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > > modifiers per format, userspace would start hitting OOM pretty quickly
> > > > > as it attempted to enumerate and negotiate acceptable modifiers.
> > > > > Either that or we need to replace the fixed 64-bit modifier tokens
> > > > > with some kind of eBPF script.
> > > > >
> > > > > Cheers,
> > > > > Daniel
> > > > > _______________________________________________
> > > > > dri-devel mailing list
> > > > > dri-devel at lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch