[Intel-gfx] [PATCH v4] drm/i915: Clean up associated VMAs on context destruction

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Oct 8 09:26:44 PDT 2015


On 08/10/15 17:03, Ville Syrjälä wrote:
> On Thu, Oct 08, 2015 at 03:03:11PM +0100, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 08/10/15 14:45, Ville Syrjälä wrote:
>>> On Mon, Oct 05, 2015 at 01:26:36PM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> Prevent leaking VMAs and PPGTT VMs when objects are imported
>>>> via flink.
>>>>
>>>> Scenario is that any VMAs created by the importer will be left
>>>> dangling after the importer exits, or destroys the PPGTT context
>>>> with which they are associated.
>>>>
>>>> This is caused by object destruction not running when the
>>>> importer closes the buffer object handle due the reference held
>>>> by the exporter. This also leaks the VM since the VMA has a
>>>> reference on it.
>>>>
>>>> In practice these leaks can be observed by stopping and starting
>>>> the X server on a kernel with fbcon compiled in. Every time
>>>> X server exits another VMA will be leaked against the fbcon's
>>>> frame buffer object.
>>>>
>>>> Also on systems where flink buffer sharing is used extensively,
>>>> like Android, this leak has even more serious consequences.
>>>>
>>>> This version is takes a general approach from the  earlier work
>>>> by Rafael Barbalho (drm/i915: Clean-up PPGTT on context
>>>> destruction) and tries to incorporate the subsequent discussion
>>>> between Chris Wilson and Daniel Vetter.
>>>>
>>>> v2:
>>>>
>>>> Removed immediate cleanup on object retire - it was causing a
>>>> recursive VMA unbind via i915_gem_object_wait_rendering. And
>>>> it is in fact not even needed since by definition context
>>>> cleanup worker runs only after the last context reference has
>>>> been dropped, hence all VMAs against the VM belonging to the
>>>> context are already on the inactive list.
>>>>
>>>> v3:
>>>>
>>>> Previous version could deadlock since VMA unbind waits on any
>>>> rendering on an object to complete. Objects can be busy in a
>>>> different VM which would mean that the cleanup loop would do
>>>> the wait with the struct mutex held.
>>>>
>>>> This is an even simpler approach where we just unbind VMAs
>>>> without waiting since we know all VMAs belonging to this VM
>>>> are idle, and there is nothing in flight, at the point
>>>> context destructor runs.
>>>>
>>>> v4:
>>>>
>>>> Double underscore prefix for __915_vma_unbind_no_wait and a
>>>> commit message typo fix. (Michel Thierry)
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> Testcase: igt/gem_ppgtt.c/flink-and-exit-vma-leak
>>>> Reviewed-by: Michel Thierry <michel.thierry at intel.com>
>>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>> Cc: Rafael Barbalho <rafael.barbalho at intel.com>
>>>> Cc: Michel Thierry <michel.thierry at intel.com>
>>>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_drv.h         |  5 +++++
>>>>    drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++++++++----
>>>>    drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++++++
>>>>    3 files changed, 45 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>> index 824e7245044d..58afa1bb2957 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -2840,6 +2840,11 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
>>>>    int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
>>>>    		  u32 flags);
>>>>    int __must_check i915_vma_unbind(struct i915_vma *vma);
>>>> +/*
>>>> + * BEWARE: Do not use the function below unless you can _absolutely_
>>>> + * _guarantee_ VMA in question is _not in use_ anywhere.
>>>> + */
>>>> +int __must_check __i915_vma_unbind_no_wait(struct i915_vma *vma);
>>>>    int i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
>>>>    void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
>>>>    void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index f0cfbb9ee12c..52642aff1dab 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -3208,7 +3208,7 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
>>>>    					    old_write_domain);
>>>>    }
>>>>
>>>> -int i915_vma_unbind(struct i915_vma *vma)
>>>> +static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
>>>>    {
>>>>    	struct drm_i915_gem_object *obj = vma->obj;
>>>>    	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
>>>> @@ -3227,9 +3227,11 @@ int i915_vma_unbind(struct i915_vma *vma)
>>>>
>>>>    	BUG_ON(obj->pages == NULL);
>>>>
>>>> -	ret = i915_gem_object_wait_rendering(obj, false);
>>>> -	if (ret)
>>>> -		return ret;
>>>> +	if (wait) {
>>>> +		ret = i915_gem_object_wait_rendering(obj, false);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +	}
>>>>
>>>>    	if (i915_is_ggtt(vma->vm) &&
>>>>    	    vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
>>>> @@ -3274,6 +3276,16 @@ int i915_vma_unbind(struct i915_vma *vma)
>>>>    	return 0;
>>>>    }
>>>>
>>>> +int i915_vma_unbind(struct i915_vma *vma)
>>>> +{
>>>> +	return __i915_vma_unbind(vma, true);
>>>> +}
>>>> +
>>>> +int __i915_vma_unbind_no_wait(struct i915_vma *vma)
>>>> +{
>>>> +	return __i915_vma_unbind(vma, false);
>>>> +}
>>>> +
>>>>    int i915_gpu_idle(struct drm_device *dev)
>>>>    {
>>>>    	struct drm_i915_private *dev_priv = dev->dev_private;
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> index 74aa0c9929ba..680b4c9f6b73 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> @@ -133,6 +133,23 @@ static int get_context_size(struct drm_device *dev)
>>>>    	return ret;
>>>>    }
>>>>
>>>> +static void i915_gem_context_clean(struct intel_context *ctx)
>>>> +{
>>>> +	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
>>>> +	struct i915_vma *vma, *next;
>>>> +
>>>> +	if (WARN_ON_ONCE(!ppgtt))
>>>> +		return;
>>>
>>> [   80.242892] [drm:i915_gem_open]
>>> [   80.250485] ------------[ cut here ]------------
>>> [   80.258938] WARNING: CPU: 1 PID: 277 at ../drivers/gpu/drm/i915/i915_gem_context.c:141 i915_gem_context_free+0xef/0x27b [i915]()
>>> [   80.275344] WARN_ON_ONCE(!ppgtt)
>>
>> Ha, I did not see how ctx->ppgtt == NULL could exist but wanted to be
>> extra safe since i915_ppgtt_put checks for that.
>
> Well, how can it exit?

What? :)

Regards,

Tvrtko





More information about the Intel-gfx mailing list