[Intel-gfx] [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Oct 18 08:54:23 UTC 2016
On 18/10/2016 09:38, Chris Wilson wrote:
> On Tue, Oct 18, 2016 at 09:22:56AM +0100, Tvrtko Ursulin wrote:
>> On 18/10/2016 09:17, Chris Wilson wrote:
>>> On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
>>>> On 17/10/2016 15:10, Chris Wilson wrote:
>>>>> When handling execbuf relocations, we play a delicate dance with
>>>>> pagefault. We first try to access the user pages underneath our
>>>>> struct_mutex. However, if those pages were inside a GEM object, we may
>>>>> trigger a pagefault and deadlock as i915_gem_fault() tries to
>>>>> recursively acquire struct_mutex. Instead, we choose to disable
>>>>> pagefaulting around the copy_from_user whilst inside the struct_mutex
>>>>> and handle the EFAULT by falling back to a copy outside the
>>>>> struct_mutex.
>>>>>
>>>>> We however presumed that disabling pagefaults would be expensive. It is
>>>>> just an operation on the local current task. Cheap enough that we can
>>>>> restrict the disable/enable to the critical section around the copy, and
>>>>> so avoid having to handle the atomic sections within the relocation
>>>>> handling itself.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
>>>>> 1 file changed, 28 insertions(+), 36 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> index 1d02e74ce62d..22342ad0e07f 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> @@ -551,20 +551,6 @@ repeat:
>>>>> return 0;
>>>>> }
>>>>> -static bool object_is_idle(struct drm_i915_gem_object *obj)
>>>>> -{
>>>>> - unsigned long active = i915_gem_object_get_active(obj);
>>>>> - int idx;
>>>>> -
>>>>> - for_each_active(active, idx) {
>>>>> - if (!i915_gem_active_is_idle(&obj->last_read[idx],
>>>>> - &obj->base.dev->struct_mutex))
>>>>> - return false;
>>>>> - }
>>>>> -
>>>>> - return true;
>>>>> -}
>>>>> -
>>>>> static int
>>>>> i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>>> struct eb_vmas *eb,
>>>>> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>>> return -EINVAL;
>>>>> }
>>>>> - /* We can't wait for rendering with pagefaults disabled */
>>>>> - if (pagefault_disabled() && !object_is_idle(obj))
>>>>> - return -EFAULT;
>>>>> -
>>>>> ret = relocate_entry(obj, reloc, cache, target_offset);
>>>>> if (ret)
>>>>> return ret;
>>>>> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>>> remain = entry->relocation_count;
>>>>> while (remain) {
>>>>> struct drm_i915_gem_relocation_entry *r = stack_reloc;
>>>>> - int count = remain;
>>>>> - if (count > ARRAY_SIZE(stack_reloc))
>>>>> - count = ARRAY_SIZE(stack_reloc);
>>>>> + unsigned long unwritten;
>>>>> + unsigned int count;
>>>>> +
>>>>> + count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>>>>> remain -= count;
>>>>> - if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
>>>>> + /* This is the fast path and we cannot handle a pagefault
>>>>> + * whilst holding the struct mutex lest the user pass in the
>>>>> + * relocations contained within a mmaped bo. For in such a case
>>>>> + * we, the page fault handler would call i915_gem_fault() and
>>>>> + * we would try to acquire the struct mutex again. Obviously
>>>>> + * this is bad and so lockdep complains vehemently.
>>>>> + */
>>>>> + pagefault_disable();
>>>>> + unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
>>>>> + pagefault_enable();
>>>>> + if (unwritten) {
>>>>> ret = -EFAULT;
>>>>> goto out;
>>>>> }
>>>>> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>>> if (ret)
>>>>> goto out;
>>>>> - if (r->presumed_offset != offset &&
>>>>> - __put_user(r->presumed_offset,
>>>>> - &user_relocs->presumed_offset)) {
>>>>> - ret = -EFAULT;
>>>>> - goto out;
>>>>> + if (r->presumed_offset != offset) {
>>>>> + /* Copying back to the user is allowed to fail.
>>>>> + * The information passed back is a hint as
>>>>> + * to the final location. If the copy_to_user
>>>>> + * fails after a successful copy_from_user,
>>>>> + * it must be a readonly location and so
>>>>> + * we presume the user knows what they are
>>>>> + * doing!
>>>>> + */
>>>>> + pagefault_disable();
>>>>> + __put_user(r->presumed_offset,
>>>>> + &user_relocs->presumed_offset);
>>>>> + pagefault_enable();
>>>> Why is a good idea to ignore potential errors here?
>>> Wrong question: why did we think it a good idea to ignore success here?
>> How were we ignoring errors, I don't follow?
>>
>>> (a) it is safe to do so, and I can legitimately setup userspace to use
>>> this
>> How what where? :)
> igt, whereelse, has a test case to check we can execute from read only
> memory.
With relocations?
>
>>> (b) reporting an error after we have committed the change is broken
>>> anyway.
>> You mean in the half way through cases?
> We've already made the user/gpu visible change. Failing here is broken.
> (User doesn't notice the change, presumed_offset doesn't match actual,
> kernel is oblivious to mismatch, GPU hangs.)
How is that? I see a error path propagating all the way up to failing
the execbuf.
Regards,
Tvrtko
More information about the Intel-gfx
mailing list