[Intel-gfx] [PATCH v5 1/2] drm/i915: Fix failure paths around initial fbdev allocation

Lukas Wunner lukas at wunner.de
Thu Oct 15 10:14:35 PDT 2015


Hi Ville,

On Tue, Oct 13, 2015 at 06:04:40PM +0300, Ville Syrjälä wrote:
> On Tue, Jun 30, 2015 at 10:06:27AM +0100, Lukas Wunner wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > 
> > We had two failure modes here:
> > 
> > 1.
> > Deadlock in intelfb_alloc failure path where it calls
> > drm_framebuffer_remove, which grabs the struct mutex and intelfb_create
> > (caller of intelfb_alloc) was already holding it.
> > 
> > 2.
> > Deadlock in intelfb_create failure path where it calls
> > drm_framebuffer_unreference, which grabs the struct mutex and
> > intelfb_create was already holding it.
> > 
> > v2:
> >    * Reformat commit msg to 72 chars. (Lukas Wunner)
> >    * Add third failure mode. (Lukas Wunner)
> > 
> > v3:
> >    * On fb alloc failure, unref gem object where it gets refed,
> >      fix double unref in separate commit. (Ville Syrjälä)
> > 
> > v4:
> >    * Lock struct mutex on unref. (Chris Wilson)
> > 
> > v5:
> >    * Rebase on drm-intel-nightly 2015y-09m-04d-08h-19m-35s UTC,
> >      rephrase commit message. (Jani Nicula)
> > 
> > Tested-by: Pierre Moreau <pierre.morrow at free.fr>
> >     [MBP  5,3 2009  nvidia 9400M + 9600M GT   pre-retina]
> > Tested-by: Paul Hordiienko <pvt.gord at gmail.com>
> >     [MBP  6,2 2010  intel ILK + nvidia GT216  pre-retina]
> > Tested-by: William Brown <william at blackhats.net.au>
> >     [MBP  8,2 2011  intel SNB + amd turks     pre-retina]
> > Tested-by: Lukas Wunner <lukas at wunner.de>
> >     [MBP  9,1 2012  intel IVB + nvidia GK107  pre-retina]
> > Tested-by: Bruno Bierbaumer <bruno at bierbaumer.net>
> >     [MBP 11,3 2013  intel HSW + nvidia GK107  retina]
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Fixes: 60a5ca015ffd ("drm/i915: Add locking around
> >     framebuffer_references--")
> > Reported-by: Lukas Wunner <lukas at wunner.de>
> > [Lukas: Create v3 + v4 + v5 based on Tvrtko's v2]
> > Signed-off-by: Lukas Wunner <lukas at wunner.de>
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > Cc: Jani Nikula <jani.nikula at intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_fbdev.c | 20 ++++++++++++--------
> >  1 file changed, 12 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> > index 96476d7..eee3306 100644
> > --- a/drivers/gpu/drm/i915/intel_fbdev.c
> > +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> > @@ -119,7 +119,7 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> >  {
> >  	struct intel_fbdev *ifbdev =
> >  		container_of(helper, struct intel_fbdev, helper);
> > -	struct drm_framebuffer *fb;
> > +	struct drm_framebuffer *fb = NULL;
> >  	struct drm_device *dev = helper->dev;
> >  	struct drm_mode_fb_cmd2 mode_cmd = {};
> >  	struct drm_i915_gem_object *obj;
> > @@ -137,6 +137,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> >  	mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
> >  							  sizes->surface_depth);
> >  
> > +	mutex_lock(&dev->struct_mutex);
> > +
> >  	size = mode_cmd.pitches[0] * mode_cmd.height;
> >  	size = PAGE_ALIGN(size);
> >  	obj = i915_gem_object_create_stolen(dev, size);
> > @@ -158,18 +160,21 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> >  	ret = intel_pin_and_fence_fb_obj(NULL, fb, NULL, NULL, NULL);
> >  	if (ret) {
> >  		DRM_ERROR("failed to pin obj: %d\n", ret);
> > -		goto out_fb;
> > +		goto out_unref;
> >  	}
> >  
> > +	mutex_unlock(&dev->struct_mutex);
> > +
> >  	ifbdev->fb = to_intel_framebuffer(fb);
> >  
> >  	return 0;
> >  
> > -out_fb:
> > -	drm_framebuffer_remove(fb);
> >  out_unref:
> >  	drm_gem_object_unreference(&obj->base);
> 
> If fb init succeeded it took over the ref, no? So drm_framebuffer_remove()
> will now attempt to unref one too many times.
> 
> This taking over refs stuff is confusing. Maybe it would be better if
> everyone just took an extra ref when they stash the obj pointer
> somewhere, and everyone would then always release whatever ref they own
> and no longer need.
> 
> >  out:
> > +	mutex_unlock(&dev->struct_mutex);
> > +	if (fb)
> > +		drm_framebuffer_remove(fb);
> >  	return ret;
> >  }
> >  

Hm, why do you think we unref one too many times?

A bit further up in this function we call __intel_framebuffer_create()
which sets the refcount to 1. (It calls intel_framebuffer_init(), which
calls drm_framebuffer_init(), which calls kref_init(&fb->refcount).)

So if intel_pin_and_fence_fb_obj() fails, we do need to unreference and
tear down the fb. Thus, drm_framebuffer_remove() seems right here to me.

However, because of your objection I've noticed now that "if (fb)" seems
to be wrong, I think this should be "if (!IS_ERR_OR_NULL(fb))".

Because if __intel_framebuffer_create() failed, fb will be a PTR_ERR(),
so not null, and we'd call drm_framebuffer_remove() on this. Is that
what you meant?


> > @@ -187,8 +192,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
> >  	int size, ret;
> >  	bool prealloc = false;
> >  
> > -	mutex_lock(&dev->struct_mutex);
> > -
> >  	if (intel_fb &&
> >  	    (sizes->fb_width > intel_fb->base.width ||
> >  	     sizes->fb_height > intel_fb->base.height)) {
> > @@ -203,7 +206,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
> >  		DRM_DEBUG_KMS("no BIOS fb, allocating a new one\n");
> >  		ret = intelfb_alloc(helper, sizes);
> >  		if (ret)
> > -			goto out_unlock;
> > +			return ret;
> >  		intel_fb = ifbdev->fb;
> >  	} else {
> >  		DRM_DEBUG_KMS("re-using BIOS fb\n");
> > @@ -215,6 +218,8 @@ static int intelfb_create(struct drm_fb_helper *helper,
> >  	obj = intel_fb->obj;
> >  	size = obj->base.size;
> >  
> > +	mutex_lock(&dev->struct_mutex);
> > +
> 
> I'm thinking we won't even need the lock here anymore. But maybe I'm
> missing something.
> 
> >  	info = drm_fb_helper_alloc_fbi(helper);
> >  	if (IS_ERR(info)) {
> >  		ret = PTR_ERR(info);
> > @@ -276,7 +281,6 @@ out_destroy_fbi:
> >  out_unpin:
> >  	i915_gem_object_ggtt_unpin(obj);
> >  	drm_gem_object_unreference(&obj->base);
> 
> And this ref we don't own either AFAICS.

Why? We did call intelfb_alloc() above, so if something subsequently
goes wrong, we need to revert the steps that intelfb_alloc() carried
out. The drm_gem_object_unreference() therefore seems right here to me.

However I'm puzzled why we don't call drm_framebuffer_remove() under
the out_unpin: label. Aren't we leaking a framebuffer here without that?

Maybe you're referring to the fact that this function either inherits
the BIOS fb or creates a new fb with intelfb_alloc(). I'm not sure if
the cleanup on error is identical in these two cases. Maybe you meant
that we don't own the ref in the case that the fb was inherited from
BIOS?


Best regards,

Lukas

> 
> > -out_unlock:
> >  	mutex_unlock(&dev->struct_mutex);
> >  	return ret;
> >  }
> > -- 
> > 2.1.0
> 
> -- 
> Ville Syrjälä
> Intel OTC


More information about the Intel-gfx mailing list