[Intel-gfx] [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()

Tue Oct 18 08:54:23 UTC 2016

On 18/10/2016 09:38, Chris Wilson wrote:
> On Tue, Oct 18, 2016 at 09:22:56AM +0100, Tvrtko Ursulin wrote:
>> On 18/10/2016 09:17, Chris Wilson wrote:
>>> On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
>>>> On 17/10/2016 15:10, Chris Wilson wrote:
>>>>> When handling execbuf relocations, we play a delicate dance with
>>>>> pagefault. We first try to access the user pages underneath our
>>>>> struct_mutex. However, if those pages were inside a GEM object, we may
>>>>> trigger a pagefault and deadlock as i915_gem_fault() tries to
>>>>> recursively acquire struct_mutex. Instead, we choose to disable
>>>>> pagefaulting around the copy_from_user whilst inside the struct_mutex
>>>>> and handle the EFAULT by falling back to a copy outside the
>>>>> struct_mutex.
>>>>>
>>>>> We however presumed that disabling pagefaults would be expensive. It is
>>>>> just an operation on the local current task. Cheap enough that we can
>>>>> restrict the disable/enable to the critical section around the copy, and
>>>>> so avoid having to handle the atomic sections within the relocation
>>>>> handling itself.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
>>>>>   1 file changed, 28 insertions(+), 36 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> index 1d02e74ce62d..22342ad0e07f 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> @@ -551,20 +551,6 @@ repeat:
>>>>>   	return 0;
>>>>>   }
>>>>> -static bool object_is_idle(struct drm_i915_gem_object *obj)
>>>>> -{
>>>>> -	unsigned long active = i915_gem_object_get_active(obj);
>>>>> -	int idx;
>>>>> -
>>>>> -	for_each_active(active, idx) {
>>>>> -		if (!i915_gem_active_is_idle(&obj->last_read[idx],
>>>>> -					     &obj->base.dev->struct_mutex))
>>>>> -			return false;
>>>>> -	}
>>>>> -
>>>>> -	return true;
>>>>> -}
>>>>> -
>>>>>   static int
>>>>>   i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>>>   				   struct eb_vmas *eb,
>>>>> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>>>   		return -EINVAL;
>>>>>   	}
>>>>> -	/* We can't wait for rendering with pagefaults disabled */
>>>>> -	if (pagefault_disabled() && !object_is_idle(obj))
>>>>> -		return -EFAULT;
>>>>> -
>>>>>   	ret = relocate_entry(obj, reloc, cache, target_offset);
>>>>>   	if (ret)
>>>>>   		return ret;
>>>>> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>>>   	remain = entry->relocation_count;
>>>>>   	while (remain) {
>>>>>   		struct drm_i915_gem_relocation_entry *r = stack_reloc;
>>>>> -		int count = remain;
>>>>> -		if (count > ARRAY_SIZE(stack_reloc))
>>>>> -			count = ARRAY_SIZE(stack_reloc);
>>>>> +		unsigned long unwritten;
>>>>> +		unsigned int count;
>>>>> +
>>>>> +		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>>>>>   		remain -= count;
>>>>> -		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
>>>>> +		/* This is the fast path and we cannot handle a pagefault
>>>>> +		 * whilst holding the struct mutex lest the user pass in the
>>>>> +		 * relocations contained within a mmaped bo. For in such a case
>>>>> +		 * we, the page fault handler would call i915_gem_fault() and
>>>>> +		 * we would try to acquire the struct mutex again. Obviously
>>>>> +		 * this is bad and so lockdep complains vehemently.
>>>>> +		 */
>>>>> +		pagefault_disable();
>>>>> +		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
>>>>> +		pagefault_enable();
>>>>> +		if (unwritten) {
>>>>>   			ret = -EFAULT;
>>>>>   			goto out;
>>>>>   		}
>>>>> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>>>   			if (ret)
>>>>>   				goto out;
>>>>> -			if (r->presumed_offset != offset &&
>>>>> -			    __put_user(r->presumed_offset,
>>>>> -				       &user_relocs->presumed_offset)) {
>>>>> -				ret = -EFAULT;
>>>>> -				goto out;
>>>>> +			if (r->presumed_offset != offset) {
>>>>> +				/* Copying back to the user is allowed to fail.
>>>>> +				 * The information passed back is a hint as
>>>>> +				 * to the final location. If the copy_to_user
>>>>> +				 * fails after a successful copy_from_user,
>>>>> +				 * it must be a readonly location and so
>>>>> +				 * we presume the user knows what they are
>>>>> +				 * doing!
>>>>> +				 */
>>>>> +				pagefault_disable();
>>>>> +				__put_user(r->presumed_offset,
>>>>> +					   &user_relocs->presumed_offset);
>>>>> +				pagefault_enable();
>>>> Why is a good idea to ignore potential errors here?
>>> Wrong question: why did we think it a good idea to ignore success here?
>> How were we ignoring errors, I don't follow?
>>
>>> (a) it is safe to do so, and I can legitimately setup userspace to use
>>> this
>> How what where? :)
> igt, whereelse, has a test case to check we can execute from read only
> memory.

With relocations?

>   
>>> (b) reporting an error after we have committed the change is broken
>>> anyway.
>> You mean in the half way through cases?
> We've already made the user/gpu visible change. Failing here is broken.
> (User doesn't notice the change, presumed_offset doesn't match actual,
> kernel is oblivious to mismatch, GPU hangs.)

How is that? I see a error path propagating all the way up to failing 
the execbuf.

Regards,

Tvrtko