[Intel-gfx] [PATCH v1] drm/i915/guc: Fix a memory leak where guc->execbuf_client is not freed

Wed Jan 13 10:51:44 PST 2016

On 13/01/16 18:17, Yu Dai wrote:
>
> On 01/13/2016 10:15 AM, Dave Gordon wrote:
>> On 12/01/16 23:17, yu.dai at intel.com wrote:
>> > From: Alex Dai <yu.dai at intel.com>
>> >
>> > During driver unloading, the guc_client created for command submission
>> > needs to be released to avoid a memory leak.
>> >
>> > The struct_mutex needs to be held before tearing down GuC.
>> >
>> > v1: Move i915_guc_submission_disable out of i915_guc_submission_fini and
>> >      take struct_mutex lock before releasing GuC client. (Dave Gordon)
>>
>> You don't seem to have implemented all the points I mentioned? I think
>> you want:
>>
>> drivers/gpu/drm/i915/intel_guc_loader.c:
>> @@ -445,6 +445,7 @@ int intel_guc_ucode_load(struct drm_device *dev)
>>
>>           direct_interrupts_to_host(dev_priv);
>>           i915_guc_submission_disable(dev);
>> +       i915_guc_submission_fini(dev);
>>
>> Optional, but cleaner. We called i915_guc_submission_init() earlier in
>> this function, so we should call i915_guc_submission_fini() in the
>> failure path. That way, we either succeed, or leave the system state
>> unchanged, NOT leaving extra objects allocated.
>>
>>           return err;
>>    }
>
> I don't want this because struct_mutex is held by caller already while
> the fini() will acquire it too.

Yes it is and no it won't. That's guc_*submission*_fini() I want to call 
(which requires the mutex held), not intel_guc_*ucode*_fini() (which, as 
you say, acquires it).
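
To illustrate the distinction, here is a minimal userspace sketch, not the driver code. It assumes a pthread mutex standing in for struct_mutex, and every name in it (lock_struct_mutex, submission_fini_locked, ucode_fini, ucode_load_locked) is hypothetical. One fini variant requires the caller to hold the mutex, the other takes the mutex itself, and only the first is safe to call from a failure path that already holds the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for dev->struct_mutex, plus a flag so the
 * "must be held" convention can be checked with assert(). */
static pthread_mutex_t struct_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool struct_mutex_held;
static bool client_allocated;

static void lock_struct_mutex(void)
{
	pthread_mutex_lock(&struct_mutex);
	struct_mutex_held = true;
}

static void unlock_struct_mutex(void)
{
	struct_mutex_held = false;
	pthread_mutex_unlock(&struct_mutex);
}

/* Stand-in for the submission init step done early in the load path. */
static void submission_init(void)
{
	client_allocated = true;
}

/* Convention A: the caller must already hold the mutex. */
static void submission_fini_locked(void)
{
	assert(struct_mutex_held);
	client_allocated = false;
}

/* Convention B: takes the mutex itself, so it must NOT be called
 * with the mutex already held (that would self-deadlock). */
static void ucode_fini(void)
{
	lock_struct_mutex();
	submission_fini_locked();
	unlock_struct_mutex();
}

/* Load path, entered with the mutex held. On failure it undoes its own
 * init using variant A, leaving the state as it found it. */
static int ucode_load_locked(bool fail)
{
	assert(struct_mutex_held);
	submission_init();
	if (fail) {
		submission_fini_locked();
		return -1;
	}
	return 0;
}
```

In this sketch, calling ucode_fini() from inside ucode_load_locked() would block forever on the non-recursive mutex, whereas submission_fini_locked() is fine there.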

.Dave.

>> @@ -561,10 +562,12 @@ static void guc_fw_fetch(struct drm_device *dev, struct intel_guc_fw *guc_fw)
>>        DRM_ERROR("Failed to fetch GuC firmware from %s (error %d)\n",
>>              guc_fw->guc_fw_path, err);
>>
>> +    mutex_lock(&dev->struct_mutex);
>>        obj = guc_fw->guc_fw_obj;
>>        if (obj)
>>            drm_gem_object_unreference(&obj->base);
>>        guc_fw->guc_fw_obj = NULL;
>> +    mutex_unlock(&dev->struct_mutex);
>>
>> This is the locking that needs to be added to the failure path.
>> This is required *in addition to* the locking reorganisation below.
>
> I missed this part.
>
>> > Signed-off-by: Alex Dai <yu.dai at intel.com>
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
>> > index d20788f..70fa8f5 100644
>> > --- a/drivers/gpu/drm/i915/intel_guc_loader.c
>> > +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
>> > @@ -631,10 +631,11 @@ void intel_guc_ucode_fini(struct drm_device *dev)
>> >       struct drm_i915_private *dev_priv = dev->dev_private;
>> >       struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
>> >
>> > +    mutex_lock(&dev->struct_mutex);
>> >       direct_interrupts_to_host(dev_priv);
>> > +    i915_guc_submission_disable(dev);
>> >       i915_guc_submission_fini(dev);
>> >
>> > -    mutex_lock(&dev->struct_mutex);
>> >       if (guc_fw->guc_fw_obj)
>> >           drm_gem_object_unreference(&guc_fw->guc_fw_obj->base);
>> >       guc_fw->guc_fw_obj = NULL;
>>
>> This bit is fine, but incomplete without the other changes above.
>>
>> .Dave.
>
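
For reference, the teardown ordering in the quoted hunk can be sketched in plain C as follows. This is an illustrative userspace model only: struct guc_state, struct fw_obj, fw_obj_put and guc_teardown are hypothetical names, a pthread mutex stands in for struct_mutex, and free() stands in for releasing the GuC client. The point it shows is a single lock scope covering both the client release and the firmware object unreference, with the pointers cleared afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted firmware object. */
struct fw_obj {
	int refcount;
};

/* Hypothetical model of the GuC state touched during teardown. */
struct guc_state {
	pthread_mutex_t lock;   /* stands in for struct_mutex */
	void *client;           /* stands in for guc->execbuf_client */
	struct fw_obj *fw_obj;  /* stands in for guc_fw->guc_fw_obj */
};

/* Drop one reference; free the object when the last one goes away. */
static void fw_obj_put(struct fw_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		free(obj);
}

/* Teardown under one lock scope: release the client first, then drop
 * the firmware object reference, and clear both pointers so that a
 * second call is harmless. */
static void guc_teardown(struct guc_state *guc)
{
	pthread_mutex_lock(&guc->lock);

	free(guc->client);      /* "submission disable": release the client */
	guc->client = NULL;

	if (guc->fw_obj)
		fw_obj_put(guc->fw_obj);
	guc->fw_obj = NULL;

	pthread_mutex_unlock(&guc->lock);
}
```

Clearing the pointers inside the locked region is what makes a repeated teardown a no-op in this model.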