[Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
Chris Wilson
chris at chris-wilson.co.uk
Mon Jan 27 19:09:04 CET 2014
On Mon, Jan 27, 2014 at 09:56:12AM -0800, Volkin, Bradley D wrote:
> > +static void
> > +i915_mmu_notifier_del(struct i915_mmu_notifier *mmu,
> > + struct i915_mmu_object *mn)
> > +{
> > + bool destroy;
> > +
> > + spin_lock(&mmu->lock);
> > + interval_tree_remove(&mn->it, &mmu->objects);
> > + destroy = --mmu->count == 0;
> > + __i915_mmu_notifier_update_serial(mmu);
> > + spin_unlock(&mmu->lock);
> > +
> > + if (destroy) /* protected against _add() by struct_mutex */
> > + __i915_mmu_notifier_destroy(mmu);
>
> I see that we should hold struct_mutex when this function is called,
> but I don't see that we try to get the mutex anywhere on the _add() path.
> Have I missed something?
No, it's fixed in a later patch. I assumed I had the lock taken in the
outermost ioctl routine.
> > +free_mn:
> > + kfree(mn);
> > +destroy_mmu:
> > + if (mmu->count == 0)
> > + __i915_mmu_notifier_destroy(mmu);
>
> Other accesses to mmu->count are protected by mmu->lock. Again, I may have missed
> something but don't immediately see why that's not required.
mmu->count is protected by struct_mutex. See above.
> > + if (pinned < num_pages) {
> > + if (pinned < 0) {
> > + ret = pinned;
> > + pinned = 0;
> > + } else {
> > + /* Spawn a worker so that we can acquire the
> > + * user pages without holding our mutex.
> > + */
> > + ret = -EAGAIN;
> > + if (obj->userptr.work == NULL) {
> > + struct get_pages_work *work;
> > +
> > + work = kmalloc(sizeof(*work), GFP_KERNEL);
> > + if (work != NULL) {
> > + obj->userptr.work = &work->work;
> > +
> > + work->obj = obj;
> > + drm_gem_object_reference(&obj->base);
> > +
> > + work->task = current;
> > + get_task_struct(work->task);
> > +
> > + INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
> > + schedule_work(&work->work);
>
> Any reason to use the system wq instead of the driver wq here?
> It doesn't look like it's the usual "takes modeset locks" justification.
Performance. Using the driver workqueue would serialise the clients, but
using the system workqueue we can do the pagefaulting in parallel.
>
> > + } else
> > + ret = -ENOMEM;
> > + } else {
> > + if (IS_ERR(obj->userptr.work)) {
>
> } else if (...) { ?
No, I think it is clearer as is.
> > + ret = PTR_ERR(obj->userptr.work);
> > + obj->userptr.work = NULL;
> > + }
> > + }
> > + }
> > + } else {
> > + struct sg_table *st;
> > +
> > + st = kmalloc(sizeof(*st), GFP_KERNEL);
> > + if (st == NULL || sg_alloc_table(st, num_pages, GFP_KERNEL)) {
> > + kfree(st);
> > + ret = -ENOMEM;
> > + } else {
> > + struct scatterlist *sg;
> > + int n;
> > +
> > + for_each_sg(st->sgl, sg, num_pages, n)
> > + sg_set_page(sg, pvec[n], PAGE_SIZE, 0);
> > +
> > + obj->pages = st;
> > + obj->userptr.work = NULL;
> > +
> > + pinned = 0;
> > + ret = 0;
> > + }
>
> This block is almost identical to code in the worker. Would it be worth extracting
> the common parts into a helper?
Almost, but subtly and importantly different. Only the loop was the same,
and at that point the saving didn't seem significant. It is now even less
similar.
> > + }
> > +
> > + release_pages(pvec, pinned, 0);
> > + drm_free_large(pvec);
> > + return ret;
> > +}
> > +
> > +static void
> > +i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
> > +{
> > + struct scatterlist *sg;
> > + int i;
> > +
> > + if (obj->madv != I915_MADV_WILLNEED)
> > + obj->dirty = 0;
>
> This is subtly different than similar code in the standard put_pages() in that
> it sets dirty=0 for both DONTNEED and PURGED instead of just DONTNEED (w/ BUG_ON(PURGED)).
> I don't think we will ever actually truncate userptr objects, so is there any reason for
> this to be different?
No, there's no reason for the difference. Semantically it is the same, of
course.
> > +/**
> > + * Creates a new mm object that wraps some normal memory from the process
> > + * context - user memory.
> > + *
> > + * We impose several restrictions upon the memory being mapped
> > + * into the GPU.
> > + * 1. It must be page aligned (both start/end addresses, i.e ptr and size).
> > + * 2. We only allow a bo as large as we could in theory map into the GTT,
> > + * that is we limit the size to the total size of the GTT.
> > + * 3. The bo is marked as being snoopable. The backing pages are left
> > + * accessible directly by the CPU, but reads by the GPU may incur the cost
> > + * of a snoop (unless you have an LLC architecture).
>
> No overlapping ranges
Yes, that is an important addition.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre