[RFC PATCH 13/29] drm/xe/mmap: Add mmap support for PCI memory barrier

Wed Dec 18 12:59:28 UTC 2024

> -----Original Message-----
> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Mrozek,
> Michal
> Sent: Tuesday, November 19, 2024 6:12 PM
> To: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>; Christian König
> <christian.koenig at amd.com>; Brost, Matthew <matthew.brost at intel.com>;
> dri-devel at lists.freedesktop.org; intel-xe at lists.freedesktop.org
> Cc: Graunke, Kenneth W <kenneth.w.graunke at intel.com>; Landwerlin, Lionel
> G <lionel.g.landwerlin at intel.com>; Souza, Jose <jose.souza at intel.com>;
> simona.vetter at ffwll.ch; thomas.hellstrom at linux.intel.com;
> boris.brezillon at collabora.com; airlied at gmail.com;
> mihail.atanassov at arm.com; steven.price at arm.com;
> shashank.sharma at amd.com
> Subject: RE: [RFC PATCH 13/29] drm/xe/mmap: Add mmap support for PCI
> memory barrier
> 
> "Adding Michal from the compute userspace team for sharing references to
> the code.
> 
> Quoting Christian König (2024-11-19 12:00:44)
> > Am 19.11.24 um 00:37 schrieb Matthew Brost:
> > > From: Tejas Upadhyay <tejas.upadhyay at intel.com>
> > >
> > > > To avoid requiring userspace to use MI_MEM_FENCE, we are
> > > > adding a mechanism for userspace to generate a PCI memory barrier
> > > > with low overhead (avoiding an IOCTL call, as well as a write to
> > > > VRAM, both of which add overhead).
> > >
> > > > This is implemented by memory-mapping a page as uncached that is
> > > > backed by MMIO on the dGPU, thus allowing userspace to write to
> > > > the page without invoking an IOCTL.
> > > > We select the MMIO range so that it is not accessible from the PCI
> > > > bus, so the MMIO writes themselves are ignored, but the PCI
> > > > memory barrier still takes effect because the MMIO filtering
> > > > happens after the memory barrier effect.
> > >
> > > > When we detect a specially defined offset in mmap(), we map the 4K
> > > > page that contains the last page of the doorbell MMIO range to
> > > > userspace for this purpose.
> >
> > > Well, that is quite a hack, but don't you still need a memory barrier
> > > instruction? E.g. mfence?
> 
> I guess you are referring to the userspace usage directions? Yeah, userspace
> definitely has to make sure that the write has actually propagated to the PCI
> bus before it can assume the serialization has happened on the GPU. I think the
> userspace folks should be able to explain how exactly they orchestrate that.
> Michal, can you or somebody else share the respective lines of code in the
> userspace driver?
> 
> At this time, the userspace only enables this on X86, but could also support
> other more exotic platforms via libpciaccess.
> 
> > And why don't you expose the real doorbell instead of the last
> > (unused?) page of the MMIO region?
> 
> Doorbells are a complete red herring here.
> 
> The chosen page just happens to be a full 4K MMIO page where any writes
> coming over the PCI bus get dropped (and reads return zero) by the GPU. Such
> a dummy (from the CPU's point of view) 4K MMIO page allows doing a CPU write
> that generates a PCI bus transaction, where the transaction itself is
> essentially a NOP. But as the transaction falls into the MMIO address range,
> it will trigger a serialization of the incoming traffic on the GPU side before
> being ignored.
> 
> Regards, Joonas
> "
> 
> Here is the appropriate path:
> https://github.com/intel/compute-runtime/blob/f589408848128434e410b6b4c2a9107ff78a74e9/shared/source/direct_submission/direct_submission_hw.inl#L437
> 
> The flow is as follows:
> 1. Do updates to memory shared between CPU and GPU using a WC memory
>    mapping.
> 2. Emit an sfence instruction to make sure there is no reordering on the
>    CPU side.
> 3. Emit a pciBarrier write (this patch); this ensures that all earlier
>    transactions are properly ordered from the GPU side.
> 
> So PCI memory barrier is submitted after sfence instruction and that makes
> sure that all earlier transactions are properly ordered.
> 
> Michal
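
For readers following along, the three-step flow Michal describes could be sketched roughly as below. This is only an illustration, not the compute-runtime code: the helper name publish_to_gpu() and its parameters are hypothetical, the barrier page pointer is assumed to come from the mmap() this patch adds (the exact offset is not shown here), and the non-x86 fallback fence is an assumption for portability only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <xmmintrin.h>               /* _mm_sfence() */
#define cpu_store_fence() _mm_sfence()
#else
/* Assumption: a full fence as a stand-in on non-x86 builds. */
#define cpu_store_fence() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/*
 * Hypothetical helper illustrating the ordering only.
 *
 * wc_dst           - CPU pointer into a write-combined mapping shared with
 *                    the GPU (assumed to be set up elsewhere)
 * pci_barrier_page - CPU pointer into the uncached 4K MMIO page mapped via
 *                    the special mmap() offset (assumed to be set up elsewhere)
 */
static inline void publish_to_gpu(void *wc_dst, const void *src, size_t len,
                                  volatile uint32_t *pci_barrier_page)
{
    /* 1. Update memory shared between CPU and GPU through the WC mapping. */
    memcpy(wc_dst, src, len);

    /* 2. Store fence: flush WC buffers and keep the CPU from reordering
     *    the barrier write ahead of the data writes. */
    cpu_store_fence();

    /* 3. Dummy write to the barrier page. The GPU drops the write itself,
     *    but the transaction serializes earlier incoming traffic. */
    *pci_barrier_page = 0;
}
```

The value written in step 3 does not matter, since the GPU ignores the write; only the fact that a PCI transaction hits the MMIO range is relevant.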

https://patchwork.freedesktop.org/patch/629628/ is a separate, already reviewed submission intended to be merged standalone. It will be merged if there are no objections.

Thanks,
Tejas