[Intel-gfx] [PATCH v5 1/2] drm/i915: Fix failure paths around initial fbdev allocation

Lukas Wunner lukas at wunner.de
Sun Oct 18 11:03:23 PDT 2015


Hi Ville,

On Thu, Oct 15, 2015 at 08:34:23PM +0300, Ville Syrjälä wrote:
> On Thu, Oct 15, 2015 at 07:14:35PM +0200, Lukas Wunner wrote:
> > Hi Ville,
> > 
> > On Tue, Oct 13, 2015 at 06:04:40PM +0300, Ville Syrjälä wrote:
> > > On Tue, Jun 30, 2015 at 10:06:27AM +0100, Lukas Wunner wrote:
> > > > From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > 
> > > > We had two failure modes here:
> > > > 
> > > > 1.
> > > > Deadlock in intelfb_alloc failure path where it calls
> > > > drm_framebuffer_remove, which grabs the struct mutex and intelfb_create
> > > > (caller of intelfb_alloc) was already holding it.
> > > > 
> > > > 2.
> > > > Deadlock in intelfb_create failure path where it calls
> > > > drm_framebuffer_unreference, which grabs the struct mutex and
> > > > intelfb_create was already holding it.
> > > > 
> > > > v2:
> > > >    * Reformat commit msg to 72 chars. (Lukas Wunner)
> > > >    * Add third failure mode. (Lukas Wunner)
> > > > 
> > > > v3:
> > > >    * On fb alloc failure, unref gem object where it gets refed,
> > > >      fix double unref in separate commit. (Ville Syrjälä)
> > > > 
> > > > v4:
> > > >    * Lock struct mutex on unref. (Chris Wilson)
> > > > 
> > > > v5:
> > > >    * Rebase on drm-intel-nightly 2015y-09m-04d-08h-19m-35s UTC,
> > > >      rephrase commit message. (Jani Nicula)
> > > > 
> > > > Tested-by: Pierre Moreau <pierre.morrow at free.fr>
> > > >     [MBP  5,3 2009  nvidia 9400M + 9600M GT   pre-retina]
> > > > Tested-by: Paul Hordiienko <pvt.gord at gmail.com>
> > > >     [MBP  6,2 2010  intel ILK + nvidia GT216  pre-retina]
> > > > Tested-by: William Brown <william at blackhats.net.au>
> > > >     [MBP  8,2 2011  intel SNB + amd turks     pre-retina]
> > > > Tested-by: Lukas Wunner <lukas at wunner.de>
> > > >     [MBP  9,1 2012  intel IVB + nvidia GK107  pre-retina]
> > > > Tested-by: Bruno Bierbaumer <bruno at bierbaumer.net>
> > > >     [MBP 11,3 2013  intel HSW + nvidia GK107  retina]
> > > > 
> > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > Fixes: 60a5ca015ffd ("drm/i915: Add locking around
> > > >     framebuffer_references--")
> > > > Reported-by: Lukas Wunner <lukas at wunner.de>
> > > > [Lukas: Create v3 + v4 + v5 based on Tvrtko's v2]
> > > > Signed-off-by: Lukas Wunner <lukas at wunner.de>
> > > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > Cc: Jani Nikula <jani.nikula at intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_fbdev.c | 20 ++++++++++++--------
> > > >  1 file changed, 12 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> > > > index 96476d7..eee3306 100644
> > > > --- a/drivers/gpu/drm/i915/intel_fbdev.c
> > > > +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> > > > @@ -119,7 +119,7 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> > > >  {
> > > >  	struct intel_fbdev *ifbdev =
> > > >  		container_of(helper, struct intel_fbdev, helper);
> > > > -	struct drm_framebuffer *fb;
> > > > +	struct drm_framebuffer *fb = NULL;
> > > >  	struct drm_device *dev = helper->dev;
> > > >  	struct drm_mode_fb_cmd2 mode_cmd = {};
> > > >  	struct drm_i915_gem_object *obj;
> > > > @@ -137,6 +137,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> > > >  	mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
> > > >  							  sizes->surface_depth);
> > > >  
> > > > +	mutex_lock(&dev->struct_mutex);
> > > > +
> > > >  	size = mode_cmd.pitches[0] * mode_cmd.height;
> > > >  	size = PAGE_ALIGN(size);
> > > >  	obj = i915_gem_object_create_stolen(dev, size);
> > > > @@ -158,18 +160,21 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> > > >  	ret = intel_pin_and_fence_fb_obj(NULL, fb, NULL, NULL, NULL);
> > > >  	if (ret) {
> > > >  		DRM_ERROR("failed to pin obj: %d\n", ret);
> > > > -		goto out_fb;
> > > > +		goto out_unref;
> > > >  	}
> > > >  
> > > > +	mutex_unlock(&dev->struct_mutex);
> > > > +
> > > >  	ifbdev->fb = to_intel_framebuffer(fb);
> > > >  
> > > >  	return 0;
> > > >  
> > > > -out_fb:
> > > > -	drm_framebuffer_remove(fb);
> > > >  out_unref:
> > > >  	drm_gem_object_unreference(&obj->base);
> > > 
> > > If fb init succeeded it took over the ref, no? So drm_framebuffer_remove()
> > > will now attempt to unref one too many times.
> > > 
> > > This taking over refs stuff is confusing. Maybe it would be better if
> > > everyone just took an extra ref when they stash the obj pointer
> > > somewhere, and everyone would then always release whatever ref they own
> > > and no longer need.
> > > 
> > > >  out:
> > > > +	mutex_unlock(&dev->struct_mutex);
> > > > +	if (fb)
> > > > +		drm_framebuffer_remove(fb);
> > > >  	return ret;
> > > >  }
> > > >  
> > 
> > Hm, why do you think we unref one too many times?
> 
> Because the fb now owns the reference, so it gets unreffed by the fb
> .destroy() hook... I think.

You're right. drm_framebuffer_remove() calls drm_framebuffer_unreference(),
if this was the last ref then drm_framebuffer_free() gets called,
which invokes the ->destroy callback intel_user_framebuffer_destroy(),
which in turn calls drm_gem_object_unreference().

So indeed it gets unrefed twice here.


> > 
> > A bit further up in this function we call __intel_framebuffer_create()
> > which sets the refcount to 1. (It calls intel_framebuffer_init(), which
> > calls drm_framebuffer_init(), which calls kref_init(&fb->refcount).)
> > 
> > So if intel_pin_and_fence_fb_obj() fails, we do need to unreference and
> > tear down the fb. Thus, drm_framebuffer_remove() seems right here to me.
> 
> I wasn't complaining about the fb unref, but the bo unref.
> 
> > 
> > However, because of your objection I've noticed now that "if (fb)" seems
> > to be wrong, I think this should be "if (!IS_ERR_OR_NULL(fb))".
> > 
> > Because if __intel_framebuffer_create() failed, fb will be a PTR_ERR(),
> > so not null, and we'd call drm_framebuffer_remove() on this. Is that
> > what you meant?
> 
> No, but that's a good observation too.
> 
> > 
> > 
> > > > @@ -187,8 +192,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
> > > >  	int size, ret;
> > > >  	bool prealloc = false;
> > > >  
> > > > -	mutex_lock(&dev->struct_mutex);
> > > > -
> > > >  	if (intel_fb &&
> > > >  	    (sizes->fb_width > intel_fb->base.width ||
> > > >  	     sizes->fb_height > intel_fb->base.height)) {
> > > > @@ -203,7 +206,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
> > > >  		DRM_DEBUG_KMS("no BIOS fb, allocating a new one\n");
> > > >  		ret = intelfb_alloc(helper, sizes);
> > > >  		if (ret)
> > > > -			goto out_unlock;
> > > > +			return ret;
> > > >  		intel_fb = ifbdev->fb;
> > > >  	} else {
> > > >  		DRM_DEBUG_KMS("re-using BIOS fb\n");
> > > > @@ -215,6 +218,8 @@ static int intelfb_create(struct drm_fb_helper *helper,
> > > >  	obj = intel_fb->obj;
> > > >  	size = obj->base.size;
> > > >  
> > > > +	mutex_lock(&dev->struct_mutex);
> > > > +
> > > 
> > > I'm thinking we won't even need the lock here anymore. But maybe I'm
> > > missing something.
> > > 
> > > >  	info = drm_fb_helper_alloc_fbi(helper);
> > > >  	if (IS_ERR(info)) {
> > > >  		ret = PTR_ERR(info);
> > > > @@ -276,7 +281,6 @@ out_destroy_fbi:
> > > >  out_unpin:
> > > >  	i915_gem_object_ggtt_unpin(obj);
> > > >  	drm_gem_object_unreference(&obj->base);
> > > 
> > > And this ref we don't own either AFAICS.
> > 
> > Why? We did call intelfb_alloc() above, so if something subsequently
> > goes wrong, we need to revert the steps that intelfb_alloc() carried
> > out. The drm_gem_object_unreference() therefore seems right here to me.
> 
> Here too the fb (if succesfully created) now owns that reference.

But here the ->destroy callback isn't invoked because the fb isn't unrefed.
There's no call to drm_framebuffer_unreference() / drm_framebuffer_remove(),
so the gem object isn't unrefed twice but we seem to leak the fb.

That's exactly what I find confusing here, if intelfb_alloc() above was
successful and something else subsequently goes awry, we need to unref
the fb, right? Or am I missing something?

In the case when we've inherited the fb from BIOS (instead of creating it
with intelfb_alloc()), is it okay to unref the fb as well?


Best regards,

Lukas


> > However I'm puzzled why we don't call drm_framebuffer_remove() under
> > the out_unpin: label. Aren't we leaking a framebuffer here without that?
> > 
> > Maybe you're referring to the fact that this function either inherits
> > the BIOS fb or creates a new fb with intelfb_alloc(). I'm not sure if
> > the cleanup on error is identical in these two cases. Maybe you meant
> > that we don't own the ref in the case that the fb was inherited from
> > BIOS?
> > 
> > 
> > Best regards,
> > 
> > Lukas
> > 
> > > 
> > > > -out_unlock:
> > > >  	mutex_unlock(&dev->struct_mutex);
> > > >  	return ret;
> > > >  }
> > > > -- 
> > > > 2.1.0
> > > 
> > > -- 
> > > Ville Syrjälä
> > > Intel OTC
> 
> -- 
> Ville Syrjälä
> Intel OTC


More information about the Intel-gfx mailing list