[RFC PATCH 13/29] drm/xe/mmap: Add mmap support for PCI memory barrier

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Tue Nov 19 11:57:25 UTC 2024


Adding Michal from the compute userspace team for sharing references to
the code.

Quoting Christian König (2024-11-19 12:00:44)
> Am 19.11.24 um 00:37 schrieb Matthew Brost:
> > From: Tejas Upadhyay <tejas.upadhyay at intel.com>
> >
> > In order to avoid requiring userspace to use MI_MEM_FENCE,
> > we are adding a mechanism for userspace to generate a
> > PCI memory barrier with low overhead (avoiding an IOCTL call
> > as well as a write to VRAM, both of which add some overhead).
> >
> > This is implemented by memory-mapping, as uncached, a page
> > that is backed by MMIO on the dGPU, thus allowing userspace
> > to do a memory write to the page without invoking an IOCTL.
> > We are selecting the MMIO so that it is not accessible from
> > the PCI bus so that the MMIO writes themselves are ignored,
> > but the PCI memory barrier will still take action as the MMIO
> > filtering will happen after the memory barrier effect.
> >
> > When we detect a specially defined offset in mmap(), we map the
> > 4K page that contains the last page of the doorbell MMIO range
> > to userspace for this purpose.
> 
> Well that is quite a hack, but don't you still need a memory barrier 
> instruction? E.g. m_fence?

I guess you are referring to the userspace usage directions? Yeah,
userspace definitely has to make sure that the write has actually
propagated to the PCI bus before it can assume the serialization happens
on the GPU. I think the userspace folks should be able to explain how
exactly they orchestrate that. Michal, can you or somebody else share the
respective lines of code in the userspace driver?

At this time, userspace only enables this on x86, but it could also
support other, more exotic platforms via libpciaccess.

> And why don't you expose the real doorbell instead of the last (unused?) 
> page of the MMIO region?

Doorbells are a complete red herring here. 

The chosen page just happens to be a full 4K MMIO page where any writes
coming over the PCI bus get dropped (and reads return zero) by the GPU.
Such a dummy (from the CPU's point of view) 4K MMIO page allows doing a
CPU write that generates a PCI bus transaction, where the transaction
itself is essentially a NOP. But as the transaction falls into the MMIO
address range, it triggers a serialization of the incoming traffic on
the GPU side before being ignored.

Regards, Joonas

