[PATCH] drm/xe/mmap: Add mmap support for PCI memory barrier

Thu Oct 17 09:46:36 UTC 2024

(+ Thomas and Lucas)

Quoting Joonas Lahtinen (2024-10-17 11:26:13)
> Quoting Matthew Auld (2024-10-16 19:23:30)
> > On 16/10/2024 17:06, Upadhyay, Tejas wrote:
> > >> On 16/10/2024 16:13, Upadhyay, Tejas wrote:
> 
> <SNIP>
> 
> > >> IMO we should at least consider not hard coding the offset in the uapi, but
> > >> maybe just return it from mmap_offset when passing a special flag, with no
> > >> hard guarentee that the offset can't change if kmd so chooses?
> > >> What do you think?
> > >>
> > >> Also if we want to keep the magic 0x50 instead of allocating an actual offset it
> > >> might be good to add a build check against DRM_FILE_PAGE_OFFSET_START
> > >> somewhere?
> > > 
> > > Build check looks good to me still let me go through both option tomorrow and get
> > > back with V2 on this. I will get this in separate patch.
> 
> We seem to be looking at the quick'n dirty downstream i915 uAPI being ported
> directly.
> 
> What was originally planned for proper upstreaming of the uAPI was to have a
> query interface that would return both the mmap offset and per-client page offset
> within that page (cacheline per DRM client). This interface would be most future-proof
> for HW changes.

I was friendly reminded that I forgot about the fact that just a single
cacheline per DRM client causes contention for any multi-threaded userspace.
And as this feature in itself is mostly an advanced optimization, it
makes sense to not apply that restriction.

What downstream i915 ended up doing is allowing userspace to randomize
within the page, so that different threads from same process do not land
on same cachelines.

Adding cacheline range per DRM client feels a bit complex just to future-proof the
uAPI at this point given with query we should be able to add _V2 variant rather
cheaply if the need to pass more details in the future.

So I guess we could go either go with the more simple option of implicit randomization
within the page at this point, or a more complex query API where we return [start, end]
cacheline range (which would be full page on current known platforms).

Thomas or Lucas, any preference?

Regards, Joonas

> And if that query returns a failure, userspace would have to revert back to alternative
> mechanism (either kernel doesn't support this feature, or it got disabled). That
> fallback code should exist and be validated and reviewed in the userspace code
> before merging.
> 
> Regards, Joonas