[Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
Chris Wilson
chris at chris-wilson.co.uk
Mon Jan 27 19:09:04 CET 2014
On Mon, Jan 27, 2014 at 09:56:12AM -0800, Volkin, Bradley D wrote:
> > +static void
> > +i915_mmu_notifier_del(struct i915_mmu_notifier *mmu,
> > + struct i915_mmu_object *mn)
> > +{
> > + bool destroy;
> > +
> > + spin_lock(&mmu->lock);
> > + interval_tree_remove(&mn->it, &mmu->objects);
> > + destroy = --mmu->count == 0;
> > + __i915_mmu_notifier_update_serial(mmu);
> > + spin_unlock(&mmu->lock);
> > +
> > + if (destroy) /* protected against _add() by struct_mutex */
> > + __i915_mmu_notifier_destroy(mmu);
>
> I see that we should hold struct_mutex when this function is called,
> but I don't see that we try to get the mutex anywhere on the _add() path.
> Have I missed something?
No, it's fixed in a later patch. I assumed I had the lock taken in the
outermost ioctl routine.
> > +free_mn:
> > + kfree(mn);
> > +destroy_mmu:
> > + if (mmu->count == 0)
> > + __i915_mmu_notifier_destroy(mmu);
>
> Other accesses to mmu->count are protected by mmu->lock. Again, I may have missed
> something but don't immediately see why that's not required.
mmu->count is protected by struct_mutex. See above.
> > + if (pinned < num_pages) {
> > + if (pinned < 0) {
> > + ret = pinned;
> > + pinned = 0;
> > + } else {
> > + /* Spawn a worker so that we can acquire the
> > + * user pages without holding our mutex.
> > + */
> > + ret = -EAGAIN;
> > + if (obj->userptr.work == NULL) {
> > + struct get_pages_work *work;
> > +
> > + work = kmalloc(sizeof(*work), GFP_KERNEL);
> > + if (work != NULL) {
> > + obj->userptr.work = &work->work;
> > +
> > + work->obj = obj;
> > + drm_gem_object_reference(&obj->base);
> > +
> > + work->task = current;
> > + get_task_struct(work->task);
> > +
> > + INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
> > + schedule_work(&work->work);
>
> Any reason to use the system wq instead of the driver wq here?
> It doesn't look like it's the usual "takes modeset locks" justification.
Performance. Using the driver workqueue would serialise the clients, but
using the system workqueue we can do the pagefaulting in parallel.
>
> > + } else
> > + ret = -ENOMEM;
> > + } else {
> > + if (IS_ERR(obj->userptr.work)) {
>
> } else if (...) { ?
No, I think it is clearer as is.
> > + ret = PTR_ERR(obj->userptr.work);
> > + obj->userptr.work = NULL;
> > + }
> > + }
> > + }
> > + } else {
> > + struct sg_table *st;
> > +
> > + st = kmalloc(sizeof(*st), GFP_KERNEL);
> > + if (st == NULL || sg_alloc_table(st, num_pages, GFP_KERNEL)) {
> > + kfree(st);
> > + ret = -ENOMEM;
> > + } else {
> > + struct scatterlist *sg;
> > + int n;
> > +
> > + for_each_sg(st->sgl, sg, num_pages, n)
> > + sg_set_page(sg, pvec[n], PAGE_SIZE, 0);
> > +
> > + obj->pages = st;
> > + obj->userptr.work = NULL;
> > +
> > + pinned = 0;
> > + ret = 0;
> > + }
>
> This block is almost identical to code in the worker. Would it be worth extracting
> the common parts into a helper?
Almost, but subtly and importantly different. Only the loop was the same,
and at that point the saving didn't seem significant. It is now even less
similar.
> > + }
> > +
> > + release_pages(pvec, pinned, 0);
> > + drm_free_large(pvec);
> > + return ret;
> > +}
> > +
> > +static void
> > +i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
> > +{
> > + struct scatterlist *sg;
> > + int i;
> > +
> > + if (obj->madv != I915_MADV_WILLNEED)
> > + obj->dirty = 0;
>
> This is subtly different than similar code in the standard put_pages() in that
> it sets dirty=0 for both DONTNEED and PURGED instead of just DONTNEED (w/ BUG_ON(PURGED)).
> I don't think we will ever actually truncate userptr objects, so is there any reason for
> this to be different?
No, there's no reason for the difference. Semantically it is the same, of
course.
> > +/**
> > + * Creates a new mm object that wraps some normal memory from the process
> > + * context - user memory.
> > + *
> > + * We impose several restrictions upon the memory being mapped
> > + * into the GPU.
> > + * 1. It must be page aligned (both start/end addresses, i.e ptr and size).
> > + * 2. We only allow a bo as large as we could in theory map into the GTT,
> > + * that is we limit the size to the total size of the GTT.
> > + * 3. The bo is marked as being snoopable. The backing pages are left
> > + * accessible directly by the CPU, but reads by the GPU may incur the cost
> > + * of a snoop (unless you have an LLC architecture).
>
> No overlapping ranges
Yes, that is an important addition.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre