[PATCH v3] drm/fourcc: document modifier uniqueness requirements
Daniel Stone
daniel at fooishbar.org
Fri May 29 15:01:48 UTC 2020
On Fri, 29 May 2020 at 15:36, Alex Deucher <alexdeucher at gmail.com> wrote:
> On Fri, May 29, 2020 at 10:32 AM Daniel Stone <daniel at fooishbar.org> wrote:
> > On Fri, 29 May 2020 at 15:29, Alex Deucher <alexdeucher at gmail.com> wrote:
> > > Maybe I'm over thinking this. I just don't want to get into a
> > > situation where we go through a lot of effort to add modifier support
> > > and then performance ends up being worse than it is today in a lot of
> > > cases.
> >
> > I'm genuinely curious: what do you imagine could cause a worse result?
>
> As an example, in some cases, it's actually better to use linear for
> system memory because it better aligns with pcie access patterns than
> some tiling formats (which are better aligned for the memory
> controller topology on the dGPU). That said, I haven't been in the
> loop as much with the tiling formats on newer GPUs, so that may not be
> as much of an issue anymore.
Yeah, that makes a lot of sense. On the other hand, placement isn't
explicitly encoded for either modifiers or non-modifiers, so I'm not
sure how it would really regress.
In case it was missed somewhere, there is no generic code anywhere that
selects modifiers based on optimality. The flow is:
- every producer/consumer advertises a list of modifier + format
pairs, declaring what they _can_ support
- for every use where a buffer needs to be allocated, the generic
code intersects these lists of modifiers to determine the set of
modifiers mutually acceptable to all consumers
- the buffer allocator is always handed a _list_ of modifiers, and
makes its own decision based on ??
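As a rough illustration of the flow above, here is a minimal Python sketch of the intersection step. Everything in it is an assumption made for illustration: the function name, the string labels standing in for real fourcc format and modifier codes, and the example lists. It is not code from Weston, Mesa, or the kernel. The point is only that the generic code computes a set of mutually acceptable pairs and never ranks them.

```python
def intersect_modifiers(*advertised):
    """Return the (format, modifier) pairs that every participant advertised.

    Each argument is one producer's or consumer's list of pairs.
    The order of the first list is kept; no notion of "best" is applied.
    """
    if not advertised:
        return []
    common = set(advertised[0])
    for pairs in advertised[1:]:
        common &= set(pairs)
    return [pair for pair in advertised[0] if pair in common]


# Hypothetical advertised lists; the labels are placeholders, not real codes.
scanout = [("XRGB8888", "LINEAR"), ("XRGB8888", "TILED_A")]
gpu_import = [("XRGB8888", "TILED_A"), ("XRGB8888", "TILED_B"),
              ("XRGB8888", "LINEAR")]

acceptable = intersect_modifiers(scanout, gpu_import)
# The allocator would be handed `acceptable` as a whole list and would
# pick from it using whatever driver-specific logic it likes.
```

Note that nothing here inspects what a modifier means; the pairs are treated as opaque tokens, which is the property the flow relies on.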
For a concrete end-to-end example:
- KMS declares which modifiers are supported for scanout
- EGL declares which modifiers are supported for EGLImage import
- Weston determines that one of its clients could be directly
scanned out rather than composited
- Weston intersects the KMS + EGL sets of modifiers to come up with
the modifier set that allows direct scanout (i.e. bypassing composition)
- Weston sends this intersected list to the client via the Wayland
protocol (mentioned in previous MR)
- the client is using EGL, so Mesa receives this list of modifiers,
and passes this on to amdgpu
- amdgpu uses magic inscrutable heuristics to determine the most
optimal modifier to use, and allocates a buffer based on that
Weston (or GNOME Shell, or Chromium, or whatever) will never be in a
position as a generic client to know that on Raven2 it should use a
particular supertiled layout with no DCC if width > 2048. So we
designed the entire framework to explicitly avoid generic code trying
to reason about the performance properties of specific modifiers.
What Weston _does_ know, however, is that the display controller can work
with modifier set A, and the GPU can work with modifier set B, and if
the client can pick something from modifier set A, then there is a
much greater probability that Weston can leave the GPU alone so it can
be entirely used by the client. It also knows that if the surface
can't be directly scanned out for whatever reason, then there's no
point in the client optimising for direct scanout, and it can tell the
client to select based on optimality purely for the GPU.
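The compositor-side decision described above could be sketched roughly as follows. This is a hypothetical Python illustration, not Weston's actual logic: the function name, the parameters, and the fallback to the GPU-only list when the two sets share nothing are all assumptions. Modifiers are again opaque placeholder labels.

```python
def modifiers_to_advertise(kms_scanout, egl_import, scanout_candidate):
    """Pick which modifier list the compositor sends to a client.

    kms_scanout:       modifiers the display controller can scan out
    egl_import:        modifiers the GPU can import for composition
    scanout_candidate: True if the surface could bypass composition
    """
    if scanout_candidate:
        scanout_ok = set(kms_scanout)
        both = [m for m in egl_import if m in scanout_ok]
        if both:
            # Anything the client picks from this list can be scanned
            # out directly, and still composited if that stops working.
            return both
    # No point optimising for scanout: let the driver choose purely
    # from what works for the GPU. (Fallback policy is an assumption.)
    return list(egl_import)


# Hypothetical example sets.
kms = ["LINEAR", "TILED_A"]
egl = ["TILED_A", "TILED_B", "LINEAR"]

for_scanout = modifiers_to_advertise(kms, egl, scanout_candidate=True)
for_gpu_only = modifiers_to_advertise(kms, egl, scanout_candidate=False)
```

In both cases the compositor only narrows or widens the list; which entry actually gets used is still left to the driver.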
So that's the thinking behind the interface: that the driver still has
exactly as much control and ability to use magic heuristics as it
always has, but that system components can supplement the driver's
heuristics with their own knowledge, to increase the chance that the
driver's heuristics arrive at a configuration that a) will definitely
work, and b) has a much greater chance of working optimally.
Does that help at all?
Cheers,
Daniel