[PATCH v3] drm/fourcc: document modifier uniqueness requirements

Wed Jun 3 09:48:20 UTC 2020

Hi Alex,

On Mon, 1 Jun 2020 at 15:25, Alex Deucher <alexdeucher at gmail.com> wrote:
> On Fri, May 29, 2020 at 11:03 AM Daniel Stone <daniel at fooishbar.org> wrote:
> > What Weston _does_ know, however, is that the display controller can work
> > with modifier set A, and the GPU can work with modifier set B, and if
> > the client can pick something from modifier set A, then there is a
> > much greater probability that Weston can leave the GPU alone so it can
> > be entirely used by the client. It also knows that if the surface
> > can't be directly scanned out for whatever reason, then there's no
> > point in the client optimising for direct scanout, and it can tell the
> > client to select based on optimality purely for the GPU.
>
> Just so I understand this correctly, the main reason for this is to
> deal with display hardware and render hardware from different vendors
> which may or may not support any common formats other than linear.

It handles pretty much everything other than a single-context,
single-GPU, single-device tunnel.

When sharing between subsystems and device categories, it lets us talk
about capabilities in a more global way. For example, GBM lets you
talk about 'scanout' and 'texture' and 'render', but what about media
codecs? We could add the concept of decode/encode to something like
GBM, and all the protocols like Wayland/X11 as well, then hope it
actually works, but ...

When sharing between heterogeneous vendors, it lets us talk about
capabilities in a neutral way. For example, if you look at most modern
Arm SoCs, your GPU, display controller, and media codec will very
likely all be from three totally different vendors. A GPU like
Mali-T8xx can be shipped in tens of different vendor SoCs in several
different revisions each. Just saying 'scanout' is totally meaningless
for the Panfrost driver. Putting awareness for every different KMS
platform and every different codec down into the Mesa driver is a
synchronisation nightmare, and all those drivers would also need
specific awareness about the Mesa driver. So modifiers allow us to
explicitly describe that we want a particular revision of Arm
Framebuffer Compression, and all the components can understand that
without having to be specifically aware of 15 different KMS drivers.
But even if you have the same vendor ...

When sharing between multiple devices of the same class from the same
vendor, it lets us surface and transit that information in a generic
way, without AMD having to figure out ways to tunnel back-channel
information between different instances of drivers potentially
targeting different revisions. The alternatives seem to be deeply
pessimal hacks, and we think we can do better. And when we get
pessimal ...

In every case, modifiers are about surfacing and sharing information.
One of the reasons Collabora have been putting so much time and energy
into this work is exactly _because_ solving those problems on a
case-by-case basis was a pretty lucrative source of revenue for us.
Debugging these kinds of issues before has usually involved specific
driver knowledge, hacking into the driver to insert your own tracing,
etc.

If you (as someone who's trying to use a device optimally) are
fortunate enough that you can get the attention of a vendor and have
them solve the problem for you, then that's lucky for everyone apart
from the AMD engineers who have to go solve it. If you're not, and you
can't figure it out yourself, then you have to go pay a consultancy.
On the face of it, that's good for us, except that we don't want to be
doing that kind of repetitive boring work. But it's bad for the
ecosystem that this knowledge is hidden away and that you have to pay
specialists to extract it. So we're really keen to surface as much
mechanism and information as possible, to give people the tools to
either solve their own problems or at least make well-informed
reports, burn down a toxic source of revenue, waste less engineering
time extracting hidden information, and empower users as much as
possible.

> It
> provides a way to tunnel device capabilities between the different
> drivers.  In the case of a device with display and rendering on the
> same device or multiple devices from the same vendor, it's not really
> that useful.

Oh no, it's still super useful. There are a ton of corner cases where
'if you're on same-vendor same-gen same-silicon hardware' falls
apart - in addition to the world just not being very much
same-vendor/same-gen/same-silicon anymore. For some concrete examples:

On NVIDIA Tegra hardware, planes within the display controller have
heterogeneous capability. Some can decompress and detile, others
can't.

On Rockchip hardware, AFBC (DCC equivalent) is available for scanout
on any plane, and can be produced by the GPU. Great! Except that 'any
plane' isn't 'every plane' - there's a global decompression unit.

On Intel hardware, they appear to have forked the media codec IP,
shipping two different versions of the codec, one as 'low-power' and
one as 'normal', obviously with varying capability.

Even handwaving those away as vendor errors - that performance on
those gens will always be pessimal and they should do better next time
- I don't think same-vendor/same-gen/same-silicon is a good design
point anymore. Between heterogeneous cut-and-paste SoCs, multi-GPU and
eGPU usecases, virtualisation and tunneling, etc, the usecases are
starting to demand that we do better. Vulkan's memory-allocation
design also really pushes against the model that memory allocations
themselves are blessed with side-channel descriptor tags.

'Those aren't my usecases and we've made Vulkan work so we don't need
it' is an entirely reasonable position, but then you're just
exchanging the problem of describing your tiling & compression layouts
in a 56-bit enum to make modifiers work, for the problem of
maintaining a surprisingly wide chunk of the display stack. For all
the reasons above, over the past few years, the entire rest of the
ecosystem has settled on using modifiers to describe and negotiate
buffer exchange across context/process/protocol/subsystem/device
boundaries. All the effort of making this work in KMS, GBM, EGL,
Vulkan, Wayland, X11, V4L2, VA-API, GStreamer, etc, is going there.
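
As a rough illustration of the negotiation described earlier in this
mail (display controller supports set A, GPU supports set B, pick from
the overlap when direct scanout is possible), here is a minimal sketch.
The function names and the policy are made up for illustration and are
not Weston's code; the two constants are meant to mirror
DRM_FORMAT_MOD_LINEAR and DRM_FORMAT_MOD_INVALID, but check
drm_fourcc.h rather than trusting me.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for DRM_FORMAT_MOD_LINEAR / DRM_FORMAT_MOD_INVALID. */
#define MOD_LINEAR  UINT64_C(0)
#define MOD_INVALID UINT64_C(0x00ffffffffffffff)

/*
 * Return the first modifier in the producer's preference-ordered list that
 * the consumer also accepts, or MOD_INVALID if the sets do not intersect.
 */
static uint64_t pick_common_modifier(const uint64_t *producer, size_t n_producer,
                                     const uint64_t *consumer, size_t n_consumer)
{
    for (size_t i = 0; i < n_producer; i++)
        for (size_t j = 0; j < n_consumer; j++)
            if (producer[i] == consumer[j])
                return producer[i];
    return MOD_INVALID;
}

/*
 * Hypothetical policy: if the surface could be scanned out directly, prefer
 * something both the GPU and the display controller understand. Otherwise
 * (or if there is no overlap) optimise purely for the GPU.
 */
static uint64_t choose_modifier(const uint64_t *gpu, size_t n_gpu,
                                const uint64_t *display, size_t n_display,
                                int scanout_possible)
{
    assert(n_gpu > 0);

    if (scanout_possible) {
        uint64_t m = pick_common_modifier(gpu, n_gpu, display, n_display);
        if (m != MOD_INVALID)
            return m;
    }
    return gpu[0];
}
```

The point is only that both sides compare opaque 64-bit tokens; neither
needs to know anything about the other driver.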

Realistically, the non-modifier path is probably going to bitrot, and
people are certainly resistant to putting more smarts into it, because
it just adds complexity to a now-single-vendor path - even NVIDIA are
pushing this forward, and their display path is much more of an
encapsulated magic tunnel than AMD's. In that sense, it's pretty much
accumulating technical debt; the longer you avoid dealing with the
display stack by implementing modifiers, the more work you have to put
into maintaining the display stack by fixing the non-modifier path.

> It doesn't seem to provide much over the current EGL
> hints (SCANOUT, SECURE, etc.).

Well yeah, if those single bits of information are enough to perfectly
encapsulate everything you need to know, then sure. But it hasn't been
for others, which is why we've all migrated away from them.

> I still don't understand how it solves
> the DCC problem though.  Compression and encryption seem kind of like
> meta modifiers.  There is an underlying high level layout, linear,
> tiled, etc. but it could also be compressed and/or encrypted.  Is the
> idea that those are separate modifiers?  E.g.,
> 0: linear
> 1: linear | encrypted
> 2: linear | compressed
> 3: linear | encrypted | compressed
> 4: tiled1
> 5: tiled1 | encrypted
> 6: tiled1 | compressed
> 7: tiled1 | encrypted | compressed
> etc.
> Or that the modifiers only expose the high level layout, and it's then
> up to the driver(s) to enable compression, etc. if both sides have a
> compatible layout?

Do you remember the old wfb from xserver? Think of modifiers as pretty
much that. The format (e.g. A8R8G8B8) describes what you will read
when you load a particular pixel/texel, and what will get stored when
you write. The modifier describes how to get there: that includes both
tiling (since you need to know the particular tiling layout in order
to know the byte location to access), and compression (since you need
to know the particular compression mechanism in order to access the
pixel, e.g. for RLE-type compression you need to access the first
pixel of the tile if the 'all pixels are identical' bit is set).
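
To make the format/modifier split concrete, here is a small sketch of
how the byte location of the same 32bpp pixel differs between a linear
layout and a tiled one. The 4x4 row-major tiling and the 'solid tile'
flag are invented purely for illustration and do not correspond to any
real hardware layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_PIXEL 4u /* a 32bpp format such as A8R8G8B8 */
#define TILE_W 4u          /* hypothetical tile width in pixels */
#define TILE_H 4u          /* hypothetical tile height in pixels */

/* Linear: rows are stored one after another, stride_bytes apart. */
static size_t linear_offset(uint32_t x, uint32_t y, size_t stride_bytes)
{
    return (size_t)y * stride_bytes + (size_t)x * BYTES_PER_PIXEL;
}

/*
 * Hypothetical tiling: 4x4 tiles stored row-major across the surface, with
 * pixels stored row-major inside each tile. width_px must be tile-aligned.
 */
static size_t tiled_offset(uint32_t x, uint32_t y, uint32_t width_px)
{
    assert(width_px % TILE_W == 0);

    size_t tiles_per_row = width_px / TILE_W;
    size_t tile_index = (size_t)(y / TILE_H) * tiles_per_row + x / TILE_W;
    size_t in_tile = (size_t)(y % TILE_H) * TILE_W + x % TILE_W;

    return (tile_index * TILE_W * TILE_H + in_tile) * BYTES_PER_PIXEL;
}

/*
 * Hypothetical RLE-style compression: if the tile's 'all pixels are
 * identical' flag is set, every pixel is read from the tile's first pixel.
 */
static size_t solid_aware_offset(uint32_t x, uint32_t y, uint32_t width_px,
                                 int tile_is_solid)
{
    if (tile_is_solid) {
        x -= x % TILE_W;
        y -= y % TILE_H;
    }
    return tiled_offset(x, y, width_px);
}
```

For an 8-pixel-wide surface, pixel (5, 1) lives at byte 52 in the
linear layout, byte 84 in this tiling, and is read from byte 64 if its
tile is flagged solid. Same format, three different access paths, which
is exactly the information the modifier has to carry.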

The idea is that these tokens fully describe the mechanisms in use,
without the drivers needing to do magic heuristics. For instance, if
your modifier is just 'tiled', then that's not a full description. A
full description would tell you about supertiling structures, tile
sizes and ordering, etc. The definitions already in
include/uapi/drm/drm_fourcc.h are a bit of a mixed bag - we've
definitely learnt more as we've gone on - but the NVIDIA definitions
are pretty exemplary for something deeply parameterised along a lot
of variable axes.

Basically, if you have to have sets of heuristics which you keep in
sync in order to translate from modifier -> hardware layout params,
then your modifiers aren't expressive enough. From a very quick look
at DC, that would be your tile-split, tile-mode, array-mode, and
swizzle-mode parameters, plus whatever from dc_tiling_mode isn't
completely static and deterministic. 'DCCRate' always appears to be
hardcoded to 1 (and 'DCCRateChroma' never set), but that might be one
to parameterise as well.

With that expression, you don't have to determine the tiling layout
from dimensions/usage/etc, because the modifier _is_ the tiling
layout, ditto compression.
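
A sketch of what 'the modifier _is_ the layout' could look like: the
hardware parameters are packed into the 56 vendor-specific bits below
an 8-bit vendor ID, and can be recovered exactly with no heuristics.
Everything here is hypothetical: the vendor ID, the field names (loosely
borrowed from the DC parameters mentioned above) and especially the bit
widths are made up, and this is not a proposal for the actual AMD
encoding.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_VENDOR_ID UINT64_C(0x7f)               /* made-up vendor ID */
#define HYP_VALUE_MASK UINT64_C(0x00ffffffffffffff) /* low 56 bits */

/* Hypothetical layout parameters; the bit widths are illustrative only. */
struct hyp_layout {
    unsigned tile_mode;    /* 4 bits */
    unsigned swizzle_mode; /* 5 bits */
    unsigned tile_split;   /* 3 bits */
    unsigned dcc;          /* 1 bit: compression on/off */
};

/* Pack the parameters into a 64-bit token: vendor in the top 8 bits. */
static uint64_t hyp_mod_encode(const struct hyp_layout *l)
{
    assert(l->tile_mode < 16 && l->swizzle_mode < 32 &&
           l->tile_split < 8 && l->dcc < 2);

    uint64_t val = (uint64_t)l->tile_mode |
                   (uint64_t)l->swizzle_mode << 4 |
                   (uint64_t)l->tile_split << 9 |
                   (uint64_t)l->dcc << 12;

    return (HYP_VENDOR_ID << 56) | (val & HYP_VALUE_MASK);
}

/* Recover every parameter from the token alone. */
static struct hyp_layout hyp_mod_decode(uint64_t mod)
{
    struct hyp_layout l;

    assert((mod >> 56) == HYP_VENDOR_ID);

    l.tile_mode = (unsigned)(mod & 0xf);
    l.swizzle_mode = (unsigned)((mod >> 4) & 0x1f);
    l.tile_split = (unsigned)((mod >> 9) & 0x7);
    l.dcc = (unsigned)((mod >> 12) & 0x1);
    return l;
}
```

If decode needs anything beyond the token itself (dimensions, usage
flags, chip revision looked up elsewhere), that is the sign the
encoding is missing a parameter.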

Encryption I'm minded to consider as something different. Modifiers
don't cover buffer placement at all. That includes whether or not the
memory is physically contiguous, whether it's in
hidden-VRAM/BAR/sysmem, which device it lives on, etc. As far as I can
tell from TMZ, encryption is essentially a side effect of placement?
The memory is encrypted, the encryption is an immutable property of
the allocation, and if the device is configured to access encrypted
memory (by being 'secure'), then the encryption is transparent, no?

That being said, there is a reasonable argument to consume a single
bit in modifiers for TMZ on/off (assuming TMZ is not parameterised),
which would make its availability and use much more transparent.

Cheers,
Daniel