[Intel-gfx] [PATCH v4] drm/i915: Clean up associated VMAs on context destruction
Daniel Vetter
daniel at ffwll.ch
Tue Oct 6 01:58:01 PDT 2015
On Mon, Oct 05, 2015 at 01:26:36PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> Prevent leaking VMAs and PPGTT VMs when objects are imported
> via flink.
>
> The scenario is that any VMAs created by the importer are left
> dangling after the importer exits, or after it destroys the PPGTT
> context with which they are associated.
>
> This is caused by object destruction not running when the
> importer closes the buffer object handle, due to the reference held
> by the exporter. This also leaks the VM, since the VMA holds a
> reference on it.
>
> In practice these leaks can be observed by stopping and starting
> the X server on a kernel with fbcon compiled in. Every time the
> X server exits, another VMA is leaked against the fbcon frame
> buffer object.
>
> Also on systems where flink buffer sharing is used extensively,
> like Android, this leak has even more serious consequences.
>
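> To make the failure mode concrete, here is a minimal sketch of the
> userspace flink flow that leaves the importer's VMAs behind (raw DRM
> ioctls via libdrm; error handling and the setup of the two file
> descriptors are omitted, and exporter_fd/importer_fd are illustrative
> names only):
>
>   /* needs <xf86drm.h>, <drm/drm.h> and <drm/i915_drm.h> */
>
>   /* Exporter (e.g. the fbcon frame buffer owner) creates and flinks a BO. */
>   struct drm_i915_gem_create create = { .size = 4096 };
>   drmIoctl(exporter_fd, DRM_IOCTL_I915_GEM_CREATE, &create);
>
>   struct drm_gem_flink flink = { .handle = create.handle };
>   drmIoctl(exporter_fd, DRM_IOCTL_GEM_FLINK, &flink);
>
>   /* Importer opens the object by name and uses it, which instantiates
>    * a VMA in the importer's PPGTT. */
>   struct drm_gem_open open_arg = { .name = flink.name };
>   drmIoctl(importer_fd, DRM_IOCTL_GEM_OPEN, &open_arg);
>
>   /* ... importer submits work against open_arg.handle ... */
>
>   /* Importer closes its handle (or simply exits). The exporter still
>    * holds a reference, so the object is not destroyed and, without
>    * this patch, the importer's VMA and its PPGTT are never freed. */
>   struct drm_gem_close close_arg = { .handle = open_arg.handle };
>   drmIoctl(importer_fd, DRM_IOCTL_GEM_CLOSE, &close_arg);
>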
> This version takes the general approach from the earlier work
> by Rafael Barbalho (drm/i915: Clean-up PPGTT on context
> destruction) and tries to incorporate the subsequent discussion
> between Chris Wilson and Daniel Vetter.
>
> v2:
>
> Removed the immediate cleanup on object retire - it was causing a
> recursive VMA unbind via i915_gem_object_wait_rendering. It is in
> fact not even needed since, by definition, the context cleanup
> worker runs only after the last context reference has been
> dropped, hence all VMAs against the VM belonging to the context
> are already on the inactive list.
>
> v3:
>
> The previous version could deadlock, since VMA unbind waits for any
> rendering on an object to complete. Objects can be busy in a
> different VM, which would mean that the cleanup loop would do
> that wait with the struct mutex held.
>
> This is an even simpler approach where we just unbind VMAs
> without waiting, since we know all VMAs belonging to this VM
> are idle and nothing is in flight at the point the context
> destructor runs.
>
> v4:
>
> Double underscore prefix for __i915_vma_unbind_no_wait and a
> commit message typo fix. (Michel Thierry)
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Testcase: igt/gem_ppgtt.c/flink-and-exit-vma-leak
> Reviewed-by: Michel Thierry <michel.thierry at intel.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Rafael Barbalho <rafael.barbalho at intel.com>
> Cc: Michel Thierry <michel.thierry at intel.com>
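>
> The referenced IGT subtest exercises the same pattern; very roughly
> (an assumed outline built from the standard IGT helpers, not the
> actual test source):
>
>   int fd1 = drm_open_driver(DRIVER_INTEL);   /* exporter */
>   int fd2 = drm_open_driver(DRIVER_INTEL);   /* importer, gets its own PPGTT */
>   uint32_t handle = gem_create(fd1, 4096);
>   uint32_t name = gem_flink(fd1, handle);
>   uint32_t imported = gem_open(fd2, name);
>   /* ... submit work against 'imported' so a VMA exists in fd2's VM ... */
>   gem_close(fd2, imported);
>   close(fd2);   /* importer "exits"; its VMA and VM must not leak */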
Queued for -next, thanks for the patch.
-Daniel
> ---
> drivers/gpu/drm/i915/i915_drv.h | 5 +++++
> drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++++++++----
> drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++++++
> 3 files changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 824e7245044d..58afa1bb2957 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2840,6 +2840,11 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
> int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
> u32 flags);
> int __must_check i915_vma_unbind(struct i915_vma *vma);
> +/*
> + * BEWARE: Do not use the function below unless you can _absolutely_
> + * _guarantee_ VMA in question is _not in use_ anywhere.
> + */
> +int __must_check __i915_vma_unbind_no_wait(struct i915_vma *vma);
> int i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
> void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
> void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f0cfbb9ee12c..52642aff1dab 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3208,7 +3208,7 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
> old_write_domain);
> }
>
> -int i915_vma_unbind(struct i915_vma *vma)
> +static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
> {
> struct drm_i915_gem_object *obj = vma->obj;
> struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> @@ -3227,9 +3227,11 @@ int i915_vma_unbind(struct i915_vma *vma)
>
> BUG_ON(obj->pages == NULL);
>
> - ret = i915_gem_object_wait_rendering(obj, false);
> - if (ret)
> - return ret;
> + if (wait) {
> + ret = i915_gem_object_wait_rendering(obj, false);
> + if (ret)
> + return ret;
> + }
>
> if (i915_is_ggtt(vma->vm) &&
> vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
> @@ -3274,6 +3276,16 @@ int i915_vma_unbind(struct i915_vma *vma)
> return 0;
> }
>
> +int i915_vma_unbind(struct i915_vma *vma)
> +{
> + return __i915_vma_unbind(vma, true);
> +}
> +
> +int __i915_vma_unbind_no_wait(struct i915_vma *vma)
> +{
> + return __i915_vma_unbind(vma, false);
> +}
> +
> int i915_gpu_idle(struct drm_device *dev)
> {
> struct drm_i915_private *dev_priv = dev->dev_private;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 74aa0c9929ba..680b4c9f6b73 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -133,6 +133,23 @@ static int get_context_size(struct drm_device *dev)
> return ret;
> }
>
> +static void i915_gem_context_clean(struct intel_context *ctx)
> +{
> + struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
> + struct i915_vma *vma, *next;
> +
> + if (WARN_ON_ONCE(!ppgtt))
> + return;
> +
> + WARN_ON(!list_empty(&ppgtt->base.active_list));
> +
> + list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
> + mm_list) {
> + if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
> + break;
> + }
> +}
> +
> void i915_gem_context_free(struct kref *ctx_ref)
> {
> struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
> @@ -142,6 +159,13 @@ void i915_gem_context_free(struct kref *ctx_ref)
> if (i915.enable_execlists)
> intel_lr_context_free(ctx);
>
> + /*
> + * This context is going away and we need to remove all VMAs still
> + * around. This is to handle imported shared objects for which
> + * destructor did not run when their handles were closed.
> + */
> + i915_gem_context_clean(ctx);
> +
> i915_ppgtt_put(ctx->ppgtt);
>
> if (ctx->legacy_hw_ctx.rcs_state)
> --
> 1.9.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch