[RFC PATCH 0/3] Introducing I915_FORMAT_MOD_4_TILED_XE2_CCS Modifier for Xe2

Tue May 14 16:51:09 UTC 2024

On Tue, 2024-05-14 at 12:25 +0300, Joonas Lahtinen wrote:
> Quoting Kenneth Graunke (2024-05-11 03:58:34)
> > On Tuesday, May 7, 2024 3:56:57 PM PDT Matt Roper wrote:
> > > On Mon, May 06, 2024 at 09:52:35PM +0300, Juha-Pekka Heikkila
> > > wrote:
> > > > These patches introduce I915_FORMAT_MOD_4_TILED_XE2_CCS
> > > > modifier, which,
> > > > from the kernel's perspective, behaves similarly to 
> > `I915_FORMAT_MOD_4_TILED`.
> > > > This new modifier is primarily intended for user space to
> > > > effectively 
> > monitor
> > > > compression status, especially when dealing with a mix of
> > > > compressed and
> > > > uncompressed buffers.
> > > > 
> > > > The addition of this modifier facilitates user space in
> > > > managing 
> > compression
> > > > status, particularly when utilizing both compressed and
> > > > uncompressed 
> > buffers
> > > > concurrently. To leverage compression for these buffers, user
> > > > space
> > > > applications must configure the appropriate Page Attribute
> > > > Table (PAT) 
> > index.
> > > > Display engine will treat all Tile4 as if it were compressed
> > > > under all
> > > > circumstances on Xe2 architecture.
> > > 
> > > I may have missed some discussion about this, but I thought the
> > > previous
> > > consensus was that we didn't want/need new modifiers for
> > > compression on
> > > Xe2?  If a userspace client (or the display hardware) receives a
> > > buffer
> > > of unknown origin and unknown compression status, it's always
> > > fine to
> > > select a compressed PAT when binding the buffer to read since
> > > even for
> > > uncompressed buffers the CCS metadata will accurately reflect the
> > > compression status.  Unlike Xe1, where generating content without
> > > compression enabled would leave random garbage in the FlatCCS
> > > area, Xe2
> > > will set the corresponding FlatCCS to '0x0' for each block,
> > > indicating
> > > uncompressed data.
> > > 
> > > Can you explain more what the benefit of handling these modifiers
> > > explicitly is?
> > > 
> > > 
> > > Matt
> > 
> > Thanks, Matt!  I'm a bit late in getting up to speed with the Xe2
> > compression 
> > changes; this is really good information.
> > 
> > As I understand it...all blocks on the GPU behave in the way you
> > mentioned, 
> > where generating uncompressed data via the GPU will set FlatCCS =
> > 0, so you 
> > can assume a compressed PAT entry and everything works.
> > 
> > One snag is...I've heard that CPU access doesn't work that way. 
> > So, if you 
> > mmap a buffer on the CPU, and write data with the CPU, then I think
> > we're back 
> > to the "FlatCCS contains uninitialized garbage" case, where it's
> > unsafe to 
> > assume a compressed PAT.  And... we don't really know when sharing
> > buffers 
> > whether the other side is going to want to do CPU access.
> 
> I think the previous discussion has specifically happened in the
> context of
> dma-buf, so not only CPU but other GPUs/accelerators/decoders/devices
> in the
> system are also relevant.
> 
> It boils down to the fact that when exporting a dma-buf, one can't
> know it will
> be consumed only by the same GPU (or other device for that matter)
> unless there
> is an explicit negotiation between exporter and importers.
> 
> > It would be really nice to assume compression by default, though,
> > which got me 
> > thinking: if we mmap a buffer via DRM_XE_GEM_MMAP_OFFSET, could
> > xe.ko disable 
> > compression for us?  So, resolve any outstanding CCS data, and then
> > switch any 
> > PAT entries to uncompressed.  Mapping would block until that
> > resolve is done.  
> > It could leave compression off forever (once you CPU map a buffer,
> > it's never 
> > compressed again).  Or it could turn CCS back on when map count
> > reaches 0 (but 
> > frankly I'm not sure that's terribly important, and sounds more
> > complex).
> 
> This would only really work for a single device but the dma-buf is
> specifically targeting more generic sharing. There's no built-in
> mechanism
> to limit the sharing to subset of devices without explicit
> negotiation
> between the exporter and importers.
> 
> So I think the "by default" mode needs to be interoperable, and the
> explicit negotiation can then use less compatible formats given those
> FD
> are never passed to importers that were not part of the negotiation.
> 
> > As I understand it, at least on discrete GPUs, the kernel already
> > has to do 
> > something similar for eviction, when migrating BOs to system memory
> > (which 
> > doesn't support compression).  So this would be doing basically the
> > same 
> > "resolve and disable CCS" step the kernel can presumably already
> > do, but now 
> > on mmap as well.
> 
> So far the eviction strategy has been to copy both the backing store
> and
> compression bits in raw form. With Xe2 it would indeed be possible to
> do
> an implicit resolve IFF the buffer has not been shared to someone who
> doesn't
> understand compression and might have left garbage in the CCS bits.
> 
> When evicting in raw form, kernel doesn't have to know if the CCS
> bits
> are garbage or not on any given moment.
> 
> Regards, Joonas

Just a follow-up comment (TBC), IF we're going for the everything-is-
compressed approach, I think there are some considerations to be made:

dma-buf exports to foreign devices need to resolve at map_attachment
time. Foreign devices are all devices that can't interpret the
compressed content.
dma-buf mmaps need to resolve. IIRC Could be implemented in the
DMA_BUF_IOCTL_SYNC callbacks that wrap cpu access.
dma-buf imports from foreign devices need to never use compressed PAT
for writing. Should KMD enforce this? Implicitly? Explicitly? I don't
see how UMD could know whether the imported dma-buf is from a foreign
device.

For mmaps of buffer objects of local device a resolve is needed. Having
KMD do that on a pagefault-basis is definitely possible, but will most
likely be terribly inefficient. Better to leave that to UMD?

/Thomas

> 
> > 
> > What do you think?  Viable?  Crazy?  Have I missed something?
> > 
> > --Ken