[Intel-gfx] [PATCH 2/2] drm/i915: Prevent recursive deadlock on releasing a busy userptr

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Jul 29 11:39:55 CEST 2014


On 07/23/2014 06:15 PM, Chris Wilson wrote:
> On Wed, Jul 23, 2014 at 05:39:49PM +0100, Tvrtko Ursulin wrote:
>> On 07/21/2014 01:21 PM, Chris Wilson wrote:
>>> +	mn = i915_mmu_notifier_get(obj->userptr.mm);
>>> +	if (IS_ERR(mn))
>>> +		return PTR_ERR(mn);
>>
>> Very minor, but I would perhaps consider renaming this to _find
>> since _get in my mind strongly associates with reference counting
>> and this does not do that. Especially if the reviewer looks at the
>> bail out below and sees no matching put. But minor as I said, you
>> can judge what you prefer.
>
> The same. It was _get because it did used to a reference counter, now
> that counting has been removed from the i915_mmu_notifier.
>
>>> +static int
>>> +i915_gem_userptr_init__mm_struct(struct drm_i915_gem_object *obj)
>>> +{
>>> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>>> +	struct i915_mm_struct *mm;
>>> +	struct mm_struct *real;
>>> +	int ret = 0;
>>> +
>>> +	real = get_task_mm(current);
>>> +	if (real == NULL)
>>> +		return -EINVAL;
>>
>> Do you think we need get_task_mm()/mmput() here, given it is all
>> inside a single system call?
>
> No. I kept using get_task_mm() simply because it looked neater than
> current->mm, but current->mm looks like it gives simpler code.
>
>>> +	/* During release of the GEM object we hold the struct_mutex. As the
>>> +	 * object may be holding onto the last reference for the task->mm,
>>> +	 * calling mmput() may trigger exit_mmap() which close the vma
>>> +	 * which will call drm_gem_vm_close() and attempt to reacquire
>>> +	 * the struct_mutex. In order to avoid that recursion, we have
>>> +	 * to defer the mmput() until after we drop the struct_mutex,
>>> +	 * i.e. we need to schedule a worker to do the clean up.
>>> +	 */
>>
>> This comment reads like a strange mixture and past and present eg.
>> what used to be the case and what is the fix. We don't hold a
>> reference to the process mm as the address space (terminology OK?).
>> We do hold a reference to the mm struct itself - which is enough to
>> unregister the notifiers, correct?
>
> True.  I was more or less trying to explain the bug and that comment
> ended up being the changelog entry. It doesn't work well as a comment.
>
> +       /* During release of the GEM object we hold the struct_mutex. This
> +        * precludes us from calling mmput() at that time as that may be
> +        * the last reference and so call exit_mmap(). exit_mmap() will
> +        * attempt to reap the vma, and if we were holding a GTT mmap
> +        * would then call drm_gem_vm_close() and attempt to reacquire
> +        * the struct mutex. So in order to avoid that recursion, we have
> +        * to defer releasing the mm reference until after we drop the
> +        * struct_mutex, i.e. we need to schedule a worker to do the clean
> +        * up.

Sounds good, just saying really to remind you to post a respin. :)

Regards,

Tvrtko



More information about the Intel-gfx mailing list