[Intel-gfx] [PATCH] drm/i915/guc: Fix a memory leak where guc->execbuf_client is not freed

Yu Dai yu.dai at intel.com
Tue Jan 12 14:57:28 PST 2016



On 01/12/2016 04:11 AM, Dave Gordon wrote:
> On 06/01/16 20:53, yu.dai at intel.com wrote:
> > From: Alex Dai <yu.dai at intel.com>
> >
> > During driver unloading, the guc_client created for command submission
> > needs to be released to avoid memory leak.
> >
> > Signed-off-by: Alex Dai <yu.dai at intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_guc_submission.c | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> > index 9c24424..8ce4f32 100644
> > --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> > @@ -995,6 +995,9 @@ void i915_guc_submission_fini(struct drm_device *dev)
> >   	struct drm_i915_private *dev_priv = dev->dev_private;
> >   	struct intel_guc *guc = &dev_priv->guc;
> >
> > +	if (i915.enable_guc_submission)
> > +		i915_guc_submission_disable(dev);
> > +
> >   	gem_release_guc_obj(dev_priv->guc.ads_obj);
> >   	guc->ads_obj = NULL;
>
> This looks like the right thing to do, but the wrong place to do it.
>
> i915_guc_submission_{init,enable,disable,fini} are the top-level
> functions exported from this source file and called (only) from
> intel_guc_loader.c
>
> Therefore, the code in intel_guc_ucode_fini() should call
> submission_disable() before submission_fini(), like this:
>
> /**
>    * intel_guc_ucode_fini() - clean up all allocated resources
>    * @dev:        drm device
>    */
> void intel_guc_ucode_fini(struct drm_device *dev)
> {
>           struct drm_i915_private *dev_priv = dev->dev_private;
>           struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
>
>           direct_interrupts_to_host(dev_priv);
> +	i915_guc_submission_disable(dev);
> 	i915_guc_submission_fini(dev);
>
>           mutex_lock(&dev->struct_mutex);
>           if (guc_fw->guc_fw_obj)
>                   drm_gem_object_unreference(&guc_fw->guc_fw_obj->base);
>           guc_fw->guc_fw_obj = NULL;
>           mutex_unlock(&dev->struct_mutex);
>
>           guc_fw->guc_fw_fetch_status = GUC_FIRMWARE_NONE;
> }
>
> There's no need for it to be conditional, as disable (and fini) are
> idempotent; if a thing hasn't been allocated, or has already been
> deallocated, then these functions will just do nothing.

I agree. We will keep the symmetry here 
i915_guc_submission_init(_enable, _disable and _fini).
> HOWEVER,
>
> while reviewing this I've noticed that the locking is all screwed up;
> basically "bf248ca drm/i915: Fix locking around GuC firmware load"
> removed locking round the calls into i915_guc_loader.c and added it back
> in a few places, but not enough.
>
> It would probably have been better to have left the locking in the
> caller, and hence round the entirety of the calls to _init, _load,
> _fini, and then explicitly DROP the mutex only for the duration of the
> request_firmware call.
>
> It would have been better still not to insist on synchronous firmware
> load in the first place; the original generic (and asynchronous) loader
> didn't require struct_mutex or any other locking around the
> request_firmware() call, so we wouldn't now have to fix it (again).
>
> At present, in intel_guc_loader.c, intel_guc_ucode_load() is called with
> the struct_mutex already held by the caller, but _init() and _fini() are
> called with it NOT held.
>
> All exported functions in i915_guc_submission.c expect it to be held
> when they're called.
>
> On that basis, what we need now is:
>
> guc_fw_fetch() needs to take & release the mutex round the unreference
> in the fail: path (like the code in _fini above).

I prefer the current approach that only takes lock for necessary 
critical session.
> intel_guc_ucode_fini() needs to extend the scope of the lock to enclose
> all calls to _submission_ functions. So the above becomes:
>
> /**
> * intel_guc_ucode_fini() - clean up all allocated resources
> * @dev: drm device
> */
> void intel_guc_ucode_fini(struct drm_device *dev)
> {
> 	struct drm_i915_private *dev_priv = dev->dev_private;
> 	struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
>
> 	mutex_lock(&dev->struct_mutex);
> 	direct_interrupts_to_host(dev_priv);
> 	i915_guc_submission_disable(dev);
> 	i915_guc_submission_fini(dev);
>
> 	if (guc_fw->guc_fw_obj)
> 		drm_gem_object_unreference(&guc_fw->guc_fw_obj->base);
> 	guc_fw->guc_fw_obj = NULL;
> 	mutex_unlock(&dev->struct_mutex);
>
> 	guc_fw->guc_fw_fetch_status = GUC_FIRMWARE_NONE;
> }

This is done by patch https://patchwork.freedesktop.org/patch/68708/. 
Please review this one.
> FINALLY,
>
> intel_guc_ucode_load() should probably call i915_guc_submission_fini()
> in the failure path (after submission_disable()) as it called
> submission_init() earlier. Not critical, as it will get called from
> ucode_fini() anyway, but it improves symmetry.
>
>

We don't have ucode_unload(). The ucode_fini() is actually doing the 
unload job. Because ucode_fini() needs to acquire the lock but 
ucode_load() expects that lock is held by caller, calling ucode_fini() 
inside ucode_load() is not good. I don't think it is worth to wrap up a 
ucode_unload() call which only includes two lines 
(direct_interrupts_to_host and i915_guc_submission_disable).

Thanks,
Alex


More information about the Intel-gfx mailing list