[Lima] [PATCH] gpu: drm: remove redundant dma_fence_put() when drm_sched_job_add_dependency() fails
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Wed Apr 27 14:43:31 UTC 2022
On 2022-04-26 22:31, Hangyu Hua wrote:
> On 2022/4/26 22:55, Andrey Grodzovsky wrote:
>>
>> On 2022-04-25 22:54, Hangyu Hua wrote:
>>> On 2022/4/25 23:42, Andrey Grodzovsky wrote:
>>>> On 2022-04-25 04:36, Hangyu Hua wrote:
>>>>
>>>>> When drm_sched_job_add_dependency() fails, dma_fence_put() will be
>>>>> called
>>>>> internally. Calling it again after drm_sched_job_add_dependency()
>>>>> finishes
>>>>> may result in a dangling pointer.
>>>>>
>>>>> Fix this by removing redundant dma_fence_put().
>>>>>
>>>>> Signed-off-by: Hangyu Hua <hbh25y at gmail.com>
>>>>> ---
>>>>> drivers/gpu/drm/lima/lima_gem.c | 1 -
>>>>> drivers/gpu/drm/scheduler/sched_main.c | 1 -
>>>>> 2 files changed, 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/lima/lima_gem.c
>>>>> b/drivers/gpu/drm/lima/lima_gem.c
>>>>> index 55bb1ec3c4f7..99c8e7f6bb1c 100644
>>>>> --- a/drivers/gpu/drm/lima/lima_gem.c
>>>>> +++ b/drivers/gpu/drm/lima/lima_gem.c
>>>>> @@ -291,7 +291,6 @@ static int lima_gem_add_deps(struct drm_file
>>>>> *file, struct lima_submit *submit)
>>>>> err = drm_sched_job_add_dependency(&submit->task->base,
>>>>> fence);
>>>>> if (err) {
>>>>> - dma_fence_put(fence);
>>>>> return err;
>>>>
>>>>
>>>> Makes sense here
>>>>
>>>>
>>>>> }
>>>>> }
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index b81fceb0b8a2..ebab9eca37a8 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -708,7 +708,6 @@ int
>>>>> drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
>>>>> dma_fence_get(fence);
>>>>> ret = drm_sched_job_add_dependency(job, fence);
>>>>> if (ret) {
>>>>> - dma_fence_put(fence);
>>>>
>>>>
>>>>
>>>> Not sure about this one. If you look at the relevant commits -
>>>> 'drm/scheduler: fix drm_sched_job_add_implicit_dependencies' and
>>>> 'drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder' -
>>>> you will see that the dma_fence_put() here balances the extra
>>>> dma_fence_get() above.
>>>>
>>>> Andrey
>>>>
>>>
>>> I don't think so. I checked the call chain and found no additional
>>> dma_fence_get(). But dma_fence_get() needs to be called before
>>> drm_sched_job_add_dependency() to keep the counter balanced.
>>
>>
>> I am not saying there is an additional get. I am saying that
>> drm_sched_job_add_dependency() doesn't grab an extra reference to the
>> fences it stores, so this needs to be done outside; that is why
>> drm_sched_job_add_implicit_dependencies() calls dma_fence_get(), and,
>> if the addition fails, you just call dma_fence_put() to keep the
>> counter balanced.
>>
>
> drm_sched_job_add_implicit_dependencies() will call
> drm_sched_job_add_dependency(), and drm_sched_job_add_dependency()
> already calls dma_fence_put() when it fails. Calling dma_fence_put()
> twice doesn't make sense.
>
> dma_fence_get() is in [2]. But dma_fence_put() will be called in [1]
> and [3] when xa_alloc() fails.
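>
> A rough trace of the failure path under this reading (a sketch; the
> counter values assume no other references are taken or dropped in
> between):
>
>     dma_fence_get(fence);                     /* [2]: counter N -> N+1 */
>     drm_sched_job_add_dependency(job, fence); /* xa_alloc() fails, so  */
>                                               /* [1]: counter N+1 -> N */
>     dma_fence_put(fence);                     /* [3]: counter N -> N-1 */
>
> which ends one reference below where it started.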
The way I see it, [2] and [3] are the matching *get* and *put*
respectively. The [1] *put* is against the original
dma_fence_init->kref_init of the fence, which always sets the refcount
to 1.

In support of this, see the commit 'drm/scheduler: fix
drm_sched_job_add_implicit_dependencies harder' - it says
"drm_sched_job_add_dependency() could drop the last ref", and this last
ref is the original refcount set by dma_fence_init->kref_init.
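
Concretely, the accounting I mean looks like this (a rough sketch of
the failure path; the numbers assume no other holders of the fence):

    dma_fence_init(...);   /* kref_init:               refcount = 1 */
    dma_fence_get(fence);  /* [2]:                     refcount = 2 */
    /* xa_alloc() fails inside drm_sched_job_add_dependency(): */
    dma_fence_put(fence);  /* [1], drops the init ref: refcount = 1 */
    dma_fence_put(fence);  /* [3], balances [2]:       refcount = 0 */

so [2] and [3] pair up, and [1] releases the initial reference.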
Andrey
>
>
> int drm_sched_job_add_dependency(struct drm_sched_job *job,
> struct dma_fence *fence)
> {
> ...
> ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
> GFP_KERNEL);
> if (ret != 0)
> dma_fence_put(fence); <--- [1]
>
> return ret;
> }
> EXPORT_SYMBOL(drm_sched_job_add_dependency);
>
>
> int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
> struct drm_gem_object *obj,
> bool write)
> {
> struct dma_resv_iter cursor;
> struct dma_fence *fence;
> int ret;
>
> dma_resv_for_each_fence(&cursor, obj->resv, write, fence) {
> /* Make sure to grab an additional ref on the added fence */
> dma_fence_get(fence); <--- [2]
> ret = drm_sched_job_add_dependency(job, fence);
> if (ret) {
> dma_fence_put(fence); <--- [3]
> return ret;
> }
> }
> return 0;
> }
>
>
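For completeness: on the success path, the reference stored in
job->dependencies is dropped later, when the job is torn down.
drm_sched_job_cleanup() walks the xarray and puts every fence, roughly:

    xa_for_each(&job->dependencies, index, fence)
        dma_fence_put(fence);

so each fence that makes it into the array is balanced there, not in
the functions above.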
>>
>>> On the other hand, dma_fence_get() and dma_fence_put() would be
>>> meaningless here if there were an extra dma_fence_get(), because the
>>> counter would not decrease to 0 during drm_sched_job_add_dependency().
>>>
>>> I checked the call chain as follows:
>>>
>>> msm_ioctl_gem_submit()
>>> -> submit_fence_sync()
>>> -> drm_sched_job_add_implicit_dependencies()
>>
>>
>> Can you maybe trace or print one such example of the problematic
>> refcount that you are trying to fix? I still don't see where the
>> problem is.
>>
>> Andrey
>>
>
> I wish I could; system logs would make this easy. But I don't have a
> corresponding physical GPU device, and
> drm_sched_job_add_implicit_dependencies() is only used by a few drivers.
>
> Thanks.
>>
>>>
>>> Thanks,
>>> Hangyu
>>>
>>>>
>>>>> return ret;
>>>>> }
>>>>> }