[Intel-gfx] [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()

Tue Oct 18 08:22:56 UTC 2016

On 18/10/2016 09:17, Chris Wilson wrote:
> On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
>> On 17/10/2016 15:10, Chris Wilson wrote:
>>> When handling execbuf relocations, we play a delicate dance with
>>> pagefault. We first try to access the user pages underneath our
>>> struct_mutex. However, if those pages were inside a GEM object, we may
>>> trigger a pagefault and deadlock as i915_gem_fault() tries to
>>> recursively acquire struct_mutex. Instead, we choose to disable
>>> pagefaulting around the copy_from_user whilst inside the struct_mutex
>>> and handle the EFAULT by falling back to a copy outside the
>>> struct_mutex.
>>>
>>> We however presumed that disabling pagefaults would be expensive. It is
>>> just an operation on the local current task. Cheap enough that we can
>>> restrict the disable/enable to the critical section around the copy, and
>>> so avoid having to handle the atomic sections within the relocation
>>> handling itself.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
>>>   1 file changed, 28 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index 1d02e74ce62d..22342ad0e07f 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -551,20 +551,6 @@ repeat:
>>>   	return 0;
>>>   }
>>> -static bool object_is_idle(struct drm_i915_gem_object *obj)
>>> -{
>>> -	unsigned long active = i915_gem_object_get_active(obj);
>>> -	int idx;
>>> -
>>> -	for_each_active(active, idx) {
>>> -		if (!i915_gem_active_is_idle(&obj->last_read[idx],
>>> -					     &obj->base.dev->struct_mutex))
>>> -			return false;
>>> -	}
>>> -
>>> -	return true;
>>> -}
>>> -
>>>   static int
>>>   i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>   				   struct eb_vmas *eb,
>>> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>   		return -EINVAL;
>>>   	}
>>> -	/* We can't wait for rendering with pagefaults disabled */
>>> -	if (pagefault_disabled() && !object_is_idle(obj))
>>> -		return -EFAULT;
>>> -
>>>   	ret = relocate_entry(obj, reloc, cache, target_offset);
>>>   	if (ret)
>>>   		return ret;
>>> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>   	remain = entry->relocation_count;
>>>   	while (remain) {
>>>   		struct drm_i915_gem_relocation_entry *r = stack_reloc;
>>> -		int count = remain;
>>> -		if (count > ARRAY_SIZE(stack_reloc))
>>> -			count = ARRAY_SIZE(stack_reloc);
>>> +		unsigned long unwritten;
>>> +		unsigned int count;
>>> +
>>> +		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>>>   		remain -= count;
>>> -		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
>>> +		/* This is the fast path and we cannot handle a pagefault
>>> +		 * whilst holding the struct mutex lest the user pass in the
>>> +		 * relocations contained within a mmaped bo. For in such a case
>>> +		 * we, the page fault handler would call i915_gem_fault() and
>>> +		 * we would try to acquire the struct mutex again. Obviously
>>> +		 * this is bad and so lockdep complains vehemently.
>>> +		 */
>>> +		pagefault_disable();
>>> +		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
>>> +		pagefault_enable();
>>> +		if (unwritten) {
>>>   			ret = -EFAULT;
>>>   			goto out;
>>>   		}
>>> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>   			if (ret)
>>>   				goto out;
>>> -			if (r->presumed_offset != offset &&
>>> -			    __put_user(r->presumed_offset,
>>> -				       &user_relocs->presumed_offset)) {
>>> -				ret = -EFAULT;
>>> -				goto out;
>>> +			if (r->presumed_offset != offset) {
>>> +				/* Copying back to the user is allowed to fail.
>>> +				 * The information passed back is a hint as
>>> +				 * to the final location. If the copy_to_user
>>> +				 * fails after a successful copy_from_user,
>>> +				 * it must be a readonly location and so
>>> +				 * we presume the user knows what they are
>>> +				 * doing!
>>> +				 */
>>> +				pagefault_disable();
>>> +				__put_user(r->presumed_offset,
>>> +					   &user_relocs->presumed_offset);
>>> +				pagefault_enable();
>> Why is a good idea to ignore potential errors here?
> Wrong question: why did we think it a good idea to ignore success here?

How were we ignoring errors, I don't follow?

> (a) it is safe to do so, and I can legitimately setup userspace to use
> this

How what where? :)

> (b) reporting an error after we have committed the change is broken
> anyway.

You mean in the half way through cases?

Regards,

Tvrkto