[Intel-gfx] [PATCH] drm/i915: Do not leak VMAs (and PPGTT VMs) of imported flinked objects

Chris Wilson chris at chris-wilson.co.uk
Mon Apr 20 05:36:12 PDT 2015


On Mon, Apr 20, 2015 at 01:14:34PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> If a client instantiates a VMA against an imported object (flink) this VMA
> will not get unbound when the object is closed.
> 
> This happens because the exporter holds a reference count on the object and
> will also keep a reference to the PPGTT VM.
> 
> In real life this happens with xorg-driver-intel and fbcon takeover. Latter
> is copied from via the flink name and when Xorg process exists one VMA
> remains dangling with a now unreachable VM.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Testcase: igt/gem_ppgtt/flink-vs-ctx-vm-leak
> ---
>  drivers/gpu/drm/i915/i915_drv.c |  1 +
>  drivers/gpu/drm/i915/i915_drv.h |  2 ++
>  drivers/gpu/drm/i915/i915_gem.c | 63 ++++++++++++++++++++++++++++++++---------
>  3 files changed, 53 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f9754c3..16a0b34 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1630,6 +1630,7 @@ static struct drm_driver driver = {
>  	.debugfs_init = i915_debugfs_init,
>  	.debugfs_cleanup = i915_debugfs_cleanup,
>  #endif
> +	.gem_close_object = i915_gem_close_object,
>  	.gem_free_object = i915_gem_free_object,
>  	.gem_vm_ops = &i915_gem_vm_ops,
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 6a2528c..e82790b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2635,6 +2635,8 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>  						  size_t size);
>  void i915_init_vm(struct drm_i915_private *dev_priv,
>  		  struct i915_address_space *vm);
> +void i915_gem_close_object(struct drm_gem_object *gem_obj,
> +			   struct drm_file *file);
>  void i915_gem_free_object(struct drm_gem_object *obj);
>  void i915_gem_vma_destroy(struct i915_vma *vma);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f7b8766..a720154 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4523,6 +4523,55 @@ static bool discard_backing_storage(struct drm_i915_gem_object *obj)
>  	return atomic_long_read(&obj->base.filp->f_count) == 1;
>  }
>  
> +static void i915_gem_unbind_vma(struct drm_i915_private *dev_priv,
> +				struct i915_vma *vma)
> +{
> +	if (WARN_ON(i915_vma_unbind(vma) == -ERESTARTSYS)) {
> +		bool was_interruptible;
> +
> +		was_interruptible = dev_priv->mm.interruptible;
> +		dev_priv->mm.interruptible = false;
> +
> +		WARN_ON(i915_vma_unbind(vma));
> +
> +		dev_priv->mm.interruptible = was_interruptible;
> +	}
> +}
> +
> +void i915_gem_close_object(struct drm_gem_object *gem_obj,
> +			   struct drm_file *file)
> +{
> +	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
> +	struct drm_device *dev = obj->base.dev;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_vma *vma, *next;
> +	struct i915_hw_ppgtt *ppgtt;
> +
> +	mutex_lock(&dev->struct_mutex);
> +
> +	/*
> +	 * Release all VMAs associated with this client's PPGTT.
> +	 *
> +	 * This is to avoid potentially unreachable VMAs since contexts can have
> +	 * shorter lifetime than objects. Meaning if a client has a reference to
> +	 * an object (flink) and an instantiated VMA, when it exists neither VMA
> +	 * will be unbound (since object free won't run), nor the PPGTT VM
> +	 * freed (since VMA holds a reference to it).
> +	 */
> +	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
> +		if (i915_is_ggtt(vma->vm))
> +			continue;
> +
> +		ppgtt = (struct i915_hw_ppgtt *)vma->vm;
> +		if (ppgtt->file_priv != file_priv)
> +			continue;
> +
> +		i915_gem_unbind_vma(dev->dev_private, vma);

No we can't do this, as it makes close sync and so can have disasterous
effects on performance (though mitigated chiefly by userspace
agressively caching bo) and also the unbind is very likely to fail,
though admittedly the fbcon copy should be before X starts (ab)using
signals - hence nasty WARN_ON.

Plus also walking this linear list is quite painful in certain abusive
tests. My preference for fixing this bug would be via vma active
references and auto-unbinding on retirement after a close.

Expect some fun patches in your inbox Tvrtko :-p
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list