[Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

Daniel Vetter daniel.vetter at ffwll.ch
Tue Aug 3 15:09:15 UTC 2021


On Wed, Jul 28, 2021 at 4:22 PM Matthew Auld
<matthew.william.auld at gmail.com> wrote:
>
> On Mon, 26 Jul 2021 at 17:10, Tvrtko Ursulin
> <tvrtko.ursulin at linux.intel.com> wrote:
> >
> >
> > On 26/07/2021 16:14, Jason Ekstrand wrote:
> > > On Mon, Jul 26, 2021 at 3:31 AM Maarten Lankhorst
> > > <maarten.lankhorst at linux.intel.com> wrote:
> > >>
> > >> Op 23-07-2021 om 13:34 schreef Matthew Auld:
> > >>> From: Chris Wilson <chris at chris-wilson.co.uk>
> > >>>
> > >>> Jason Ekstrand requested a more efficient method than userptr+set-domain
> > >>> to determine if the userptr object was backed by a complete set of pages
> > >>> upon creation. To be more efficient than simply populating the userptr
> > >>> using get_user_pages() (as done by the call to set-domain or execbuf),
> > >>> we can walk the tree of vm_area_struct and check for gaps or vma not
> > >>> backed by struct page (VM_PFNMAP). The question is how to handle
> > >>> VM_MIXEDMAP which may be either struct page or pfn backed...
> > >>>
> > >>> With discrete we are going to drop support for set_domain(), so offering
> > >>> a way to probe the pages, without having to resort to dummy batches has
> > >>> been requested.
> > >>>
> > >>> v2:
> > >>> - add new query param for the PROBE flag, so userspace can easily
> > >>>    check if the kernel supports it(Jason).
> > >>> - use mmap_read_{lock, unlock}.
> > >>> - add some kernel-doc.
> > >>> v3:
> > >>> - In the docs also mention that PROBE doesn't guarantee that the pages
> > >>>    will remain valid by the time they are actually used(Tvrtko).
> > >>> - Add a small comment for the hole finding logic(Jason).
> > >>> - Move the param next to all the other params which just return true.
> > >>>
> > >>> Testcase: igt/gem_userptr_blits/probe
> > >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > >>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> > >>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > >>> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> > >>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> > >>> Cc: Jordan Justen <jordan.l.justen at intel.com>
> > >>> Cc: Kenneth Graunke <kenneth at whitecape.org>
> > >>> Cc: Jason Ekstrand <jason at jlekstrand.net>
> > >>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > >>> Cc: Ramalingam C <ramalingam.c at intel.com>
> > >>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > >>> Acked-by: Kenneth Graunke <kenneth at whitecape.org>
> > >>> Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> > >>> ---
> > >>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 ++++++++++++++++++++-
> > >>>   drivers/gpu/drm/i915/i915_getparam.c        |  1 +
> > >>>   include/uapi/drm/i915_drm.h                 | 20 ++++++++++
> > >>>   3 files changed, 61 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > >>> index 56edfeff8c02..468a7a617fbf 100644
> > >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > >>> @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
> > >>>
> > >>>   #endif
> > >>>
> > >>> +static int
> > >>> +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > >>> +{
> > >>> +     const unsigned long end = addr + len;
> > >>> +     struct vm_area_struct *vma;
> > >>> +     int ret = -EFAULT;
> > >>> +
> > >>> +     mmap_read_lock(mm);
> > >>> +     for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > >>> +             /* Check for holes, note that we also update the addr below */
> > >>> +             if (vma->vm_start > addr)
> > >>> +                     break;
> > >>> +
> > >>> +             if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > >>> +                     break;
> > >>> +
> > >>> +             if (vma->vm_end >= end) {
> > >>> +                     ret = 0;
> > >>> +                     break;
> > >>> +             }
> > >>> +
> > >>> +             addr = vma->vm_end;
> > >>> +     }
> > >>> +     mmap_read_unlock(mm);
> > >>> +
> > >>> +     return ret;
> > >>> +}
> > >>> +
> > >>>   /*
> > >>>    * Creates a new mm object that wraps some normal memory from the process
> > >>>    * context - user memory.
> > >>> @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > >>>        }
> > >>>
> > >>>        if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > >>> -                         I915_USERPTR_UNSYNCHRONIZED))
> > >>> +                         I915_USERPTR_UNSYNCHRONIZED |
> > >>> +                         I915_USERPTR_PROBE))
> > >>>                return -EINVAL;
> > >>>
> > >>>        if (i915_gem_object_size_2big(args->user_size))
> > >>> @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > >>>                        return -ENODEV;
> > >>>        }
> > >>>
> > >>> +     if (args->flags & I915_USERPTR_PROBE) {
> > >>> +             /*
> > >>> +              * Check that the range pointed to represents real struct
> > >>> +              * pages and not iomappings (at this moment in time!)
> > >>> +              */
> > >>> +             ret = probe_range(current->mm, args->user_ptr, args->user_size);
> > >>> +             if (ret)
> > >>> +                     return ret;
> > >>> +     }
> > >>> +
> > >>>   #ifdef CONFIG_MMU_NOTIFIER
> > >>>        obj = i915_gem_object_alloc();
> > >>>        if (obj == NULL)
> > >>> diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
> > >>> index 24e18219eb50..bbb7cac43eb4 100644
> > >>> --- a/drivers/gpu/drm/i915/i915_getparam.c
> > >>> +++ b/drivers/gpu/drm/i915/i915_getparam.c
> > >>> @@ -134,6 +134,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
> > >>>        case I915_PARAM_HAS_EXEC_FENCE_ARRAY:
> > >>>        case I915_PARAM_HAS_EXEC_SUBMIT_FENCE:
> > >>>        case I915_PARAM_HAS_EXEC_TIMELINE_FENCES:
> > >>> +     case I915_PARAM_HAS_USERPTR_PROBE:
> > >>>                /* For the time being all of these are always true;
> > >>>                 * if some supported hardware does not have one of these
> > >>>                 * features this value needs to be provided from
> > >>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > >>> index 975087553ea0..0d290535a6e5 100644
> > >>> --- a/include/uapi/drm/i915_drm.h
> > >>> +++ b/include/uapi/drm/i915_drm.h
> > >>> @@ -674,6 +674,9 @@ typedef struct drm_i915_irq_wait {
> > >>>    */
> > >>>   #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 55
> > >>>
> > >>> +/* Query if the kernel supports the I915_USERPTR_PROBE flag. */
> > >>> +#define I915_PARAM_HAS_USERPTR_PROBE 56
> > >>> +
> > >>>   /* Must be kept compact -- no holes and well documented */
> > >>>
> > >>>   typedef struct drm_i915_getparam {
> > >>> @@ -2222,12 +2225,29 @@ struct drm_i915_gem_userptr {
> > >>>         * through the GTT. If the HW can't support readonly access, an error is
> > >>>         * returned.
> > >>>         *
> > >>> +      * I915_USERPTR_PROBE:
> > >>> +      *
> > >>> +      * Probe the provided @user_ptr range and validate that the @user_ptr is
> > >>> +      * indeed pointing to normal memory and that the range is also valid.
> > >>> +      * For example if some garbage address is given to the kernel, then this
> > >>> +      * should complain.
> > >>> +      *
> > >>> +      * Returns -EFAULT if the probe failed.
> > >>> +      *
> > >>> +      * Note that this doesn't populate the backing pages, and also doesn't
> > >>> +      * guarantee that the object will remain valid when the object is
> > >>> +      * eventually used.
> > >>> +      *
> > >>> +      * The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE
> > >>> +      * returns a non-zero value.
> > >>> +      *
> > >>>         * I915_USERPTR_UNSYNCHRONIZED:
> > >>>         *
> > >>>         * NOT USED. Setting this flag will result in an error.
> > >>>         */
> > >>>        __u32 flags;
> > >>>   #define I915_USERPTR_READ_ONLY 0x1
> > >>> +#define I915_USERPTR_PROBE 0x2
> > >>>   #define I915_USERPTR_UNSYNCHRONIZED 0x80000000
> > >>>        /**
> > >>>         * @handle: Returned handle for the object.
> > >>
> > >> Could we use _VALIDATE instead of probe? Or at least pin the pages as well, so we don't have to do it later?
> > >
> > > I only care that the name matches what it does.  _VALIDATE sounds like
> > > it does a full validation of everything such that, if the import
> > > succeeds, execbuf will as well.  If we pin the pages at the same time,
> > > maybe that's true?  _PROBE, on the other hand, sounds a lot more like
> >
> > No it is not possible to guarantee backing store remains valid until
> > execbuf.
> >
> > > a one-time best-effort check which may race with other stuff and
> > > doesn't guarantee future success.  That's in line with what the
> > > current patch does.
> > >
> > >> We already have i915_gem_object_userptr_validate, no need to dupe it.
> > >
> > > I have no opinion on this.
> >
> > I was actually suggesting the same as Maarten here - that we should add
> > a "populate" flag. But opinion was that was not desired - please look
> > for the older threads to see the reasoning there.
>
> So how should we proceed here? Maarten?

I honestly don't care, and I think the probe flag here is perfectly
fine. Reasons for that:
- we don't have an immediate allocation flag for buffer creation
either. So if there's a need we need a flag for this across the board,
not just userptr, and a clear userspace ask
- it's a fundamentally racy test anyway, userspace can munmap or map
something else and then it will fail. So we really don't gain anything
by pinning pages because by the time we go into execbuf they might be
invalidated already - checking the vmas for VM_SPECIAL is perfectly
good enough.
- we can always change the implementation later on too.

Hence why I think PROBE is the semantics we want/need here. Can we get
some acks/reviews here or is this really a significant enough bikeshed
that we need to hold up dg1 pciids for them?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list