[PATCH] drm/xe/mmap: Add mmap support for PCI memory barrier
Matthew Auld
matthew.auld at intel.com
Wed Oct 9 13:56:50 UTC 2024
Hi,
On 09/10/2024 11:10, Tejas Upadhyay wrote:
> In order to avoid having userspace use MI_MEM_FENCE,
> we are adding a mechanism for userspace to generate a
> PCI memory barrier with low overhead (avoiding an IOCTL call).
I guess it doesn't work if we just do a VRAM write from the host side?
>
> This is implemented by memory-mapping a page as uncached
> that is backed by MMIO on the dGPU, thus allowing userspace
> to write to the page without invoking an IOCTL.
> We select MMIO that is not accessible from the PCI bus, so
> the MMIO writes themselves are ignored, but the PCI memory
> barrier still takes effect, as the MMIO filtering happens
> after the memory barrier.
>
> When we detect the specially defined offset in mmap(), we map
> the last page of the doorbell MMIO range as a 4K page to
> userspace for this purpose.
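So, if I follow, the expected userspace usage is something like the
following (a rough sketch, assuming 4K pages and the special offset
defined further down in the patch; fd is an open xe DRM fd):

  #include <stdint.h>
  #include <sys/mman.h>

  #define XE_PCI_BARRIER_MMAP_OFFSET (0x50ull << 12)

  static void pci_barrier(int fd)
  {
          /* Write-only, shared mapping: the patch rejects VM_READ,
           * VM_EXEC and CoW mappings. */
          volatile uint32_t *page = mmap(NULL, 4096, PROT_WRITE,
                                         MAP_SHARED, fd,
                                         XE_PCI_BARRIER_MMAP_OFFSET);

          if (page == MAP_FAILED)
                  return;

          /* The write data is discarded by the MMIO filtering, but the
           * posted write still acts as a PCI memory barrier. */
          *page = 0;

          munmap((void *)page, 4096);
  }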
>
> Note: Test coverage for this is added by the IGT patch at
> https://patchwork.freedesktop.org/patch/618931/.
>
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
Needs some UMD Cc. Was this requested by some UMD?
> ---
> drivers/gpu/drm/xe/xe_device.c | 68 +++++++++++++++++++++++++++++++++-
> 1 file changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index cd241a8e1838..c97a4d1f0a98 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -239,12 +239,78 @@ static long xe_drm_compat_ioctl(struct file *file, unsigned int cmd, unsigned lo
> #define xe_drm_compat_ioctl NULL
> #endif
>
> +static void barrier_open(struct vm_area_struct *vma)
> +{
> + drm_dev_get(vma->vm_private_data);
> +}
> +
> +static void barrier_close(struct vm_area_struct *vma)
> +{
> + drm_dev_put(vma->vm_private_data);
> +}
> +
> +static const struct vm_operations_struct vm_ops_barrier = {
> + .open = barrier_open,
> + .close = barrier_close,
> +};
> +
> +static int xe_pci_barrier_mmap(struct file *filp,
> + struct vm_area_struct *vma)
> +{
> + struct drm_file *priv = filp->private_data;
> + struct drm_device *dev = priv->minor->dev;
> + unsigned long pfn;
> + pgprot_t prot;
> +
> + if (vma->vm_end - vma->vm_start > PAGE_SIZE)
> + return -EINVAL;
> +
> + if (is_cow_mapping(vma->vm_flags))
> + return -EINVAL;
> +
> + if (vma->vm_flags & (VM_READ | VM_EXEC))
> + return -EINVAL;
> +
> + vm_flags_clear(vma, VM_MAYREAD | VM_MAYEXEC);
> + vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO);
> +
> + prot = vm_get_page_prot(vma->vm_flags);
> +#define LAST_DB_PAGE_OFFSET 0x7ff001
> + pfn = PHYS_PFN(pci_resource_start(to_pci_dev(dev->dev), 0) +
> + LAST_DB_PAGE_OFFSET);
> + if (vmf_insert_pfn_prot(vma, vma->vm_start, pfn,
> + pgprot_noncached(prot)) != VM_FAULT_NOPAGE)
> + return -EFAULT;
> +
> + vma->vm_ops = &vm_ops_barrier;
> + vma->vm_private_data = dev;
> + drm_dev_get(vma->vm_private_data);
What is going on here with PCI power states? Say the device goes into
d3hot or d3cold and you then touch this mmio, what happens? Do you get
PCI errors in dmesg? Does rpm need to get involved here? Might be good
to have a testcase for this.
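If rpm does need to get involved, one (possibly too heavy) option would
be to pin a runtime PM reference for the lifetime of the mapping in the
open/close hooks above (and correspondingly at initial mmap time), say
(a sketch, using the driver's existing xe_pm_runtime_get()/
xe_pm_runtime_put() helpers):

  static void barrier_open(struct vm_area_struct *vma)
  {
          struct drm_device *dev = vma->vm_private_data;

          drm_dev_get(dev);
          /* Keep the device out of D3 while the MMIO page is mapped. */
          xe_pm_runtime_get(to_xe_device(dev));
  }

  static void barrier_close(struct vm_area_struct *vma)
  {
          struct drm_device *dev = vma->vm_private_data;

          xe_pm_runtime_put(to_xe_device(dev));
          drm_dev_put(dev);
  }

Although blocking runtime suspend for as long as userspace keeps the
mapping around rather defeats the point of rpm, hence the question.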
> + return 0;
> +}
> +
> +static int xe_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> + struct drm_file *priv = filp->private_data;
> + struct drm_device *dev = priv->minor->dev;
> +
> + if (drm_dev_is_unplugged(dev))
> + return -ENODEV;
With hotunplug I think there is also a special dummy page that gets
faulted in if the fault handler detects that we are unplugged, to avoid
sigbus killing stuff (see xe_gem_fault()). Also I guess all mmaps are
nuked on unplug so everything gets refaulted. But this doesn't look to
use a fault handler and instead just pre-populates the mapping here, so
I'm wondering what would happen in that case?
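i.e. instead of pre-populating in mmap, something like the following,
loosely modelled on xe_gem_fault() (just a sketch; the dummy page is
leaked here, its cleanup is hand-waved):

  static vm_fault_t barrier_fault(struct vm_fault *vmf)
  {
          struct vm_area_struct *vma = vmf->vma;
          struct drm_device *dev = vma->vm_private_data;
          vm_fault_t ret;
          int idx;

          if (drm_dev_enter(dev, &idx)) {
                  unsigned long pfn;

                  /* Device still there: map the real MMIO page. */
                  pfn = PHYS_PFN(pci_resource_start(to_pci_dev(dev->dev), 0) +
                                 LAST_DB_PAGE_OFFSET);
                  ret = vmf_insert_pfn_prot(vma, vma->vm_start, pfn,
                                            pgprot_noncached(vm_get_page_prot(vma->vm_flags)));
                  drm_dev_exit(idx);
          } else {
                  /* Unplugged: fault in a zeroed dummy page instead of
                   * killing userspace with SIGBUS. */
                  struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);

                  if (!page)
                          return VM_FAULT_OOM;
                  ret = vmf_insert_pfn(vma, vma->vm_start,
                                       page_to_pfn(page));
          }

          return ret;
  }

with .fault = barrier_fault wired into vm_ops_barrier, and the
vmf_insert_pfn_prot() call dropped from xe_pci_barrier_mmap() above.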
> +
> +#define XE_PCI_BARRIER_MMAP_OFFSET (0x50 << PAGE_SHIFT)
This needs to go in the uapi header, I think?
> + switch (vma->vm_pgoff) {
I think pgoff is the fake offset used to look up the bo, so for this to
work I think you would need to reserve this offset somehow (if we go
with this approach); otherwise a user can easily end up mmapping a
normal bo and getting back a fake offset that happens to be this exact
special barrier offset, and then things go boom.
Another idea could be to have mmap_offset called with a special flag to
indicate this BARRIER thing, and have it return the magic reserved
offset. That way we don't need to hardcode XE_PCI_BARRIER_MMAP_OFFSET in
the uapi, and be forever stuck with it.
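Something along these lines maybe (purely a hypothetical sketch; the
flag name is made up, but drm_xe_gem_mmap_offset already has a flags
field which currently must be zero):

  /* uapi header: new flag for DRM_IOCTL_XE_GEM_MMAP_OFFSET */
  #define DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER (1 << 0)

  /* in the mmap_offset ioctl handler: */
  if (args->flags & DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER) {
          if (args->handle || args->extensions)
                  return -EINVAL;

          /* Hand back the reserved, driver-internal offset, so
           * userspace never needs to hardcode it. */
          args->offset = XE_PCI_BARRIER_MMAP_OFFSET;
          return 0;
  }

Then the offset macro itself can stay in a driver-internal header.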
> + case XE_PCI_BARRIER_MMAP_OFFSET >> PAGE_SHIFT:
> + return xe_pci_barrier_mmap(filp, vma);
> + }
> +
> + return drm_gem_mmap(filp, vma);
> +}
> +
> static const struct file_operations xe_driver_fops = {
> .owner = THIS_MODULE,
> .open = drm_open,
> .release = drm_release_noglobal,
> .unlocked_ioctl = xe_drm_ioctl,
> - .mmap = drm_gem_mmap,
> + .mmap = xe_mmap,
> .poll = drm_poll,
> .read = drm_read,
> .compat_ioctl = xe_drm_compat_ioctl,