[PATCH] gpu: drm: remove redundant dma_fence_put() when drm_sched_job_add_dependency() fails
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Wed Apr 27 14:43:31 UTC 2022
On 2022-04-26 22:31, Hangyu Hua wrote:
> On 2022/4/26 22:55, Andrey Grodzovsky wrote:
>>
>> On 2022-04-25 22:54, Hangyu Hua wrote:
>>> On 2022/4/25 23:42, Andrey Grodzovsky wrote:
>>>> On 2022-04-25 04:36, Hangyu Hua wrote:
>>>>
>>>>> When drm_sched_job_add_dependency() fails, dma_fence_put() will be
>>>>> called internally. Calling it again after
>>>>> drm_sched_job_add_dependency() finishes may result in a dangling
>>>>> pointer.
>>>>>
>>>>> Fix this by removing redundant dma_fence_put().
>>>>>
>>>>> Signed-off-by: Hangyu Hua <hbh25y at gmail.com>
>>>>> ---
>>>>> drivers/gpu/drm/lima/lima_gem.c | 1 -
>>>>> drivers/gpu/drm/scheduler/sched_main.c | 1 -
>>>>> 2 files changed, 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
>>>>> index 55bb1ec3c4f7..99c8e7f6bb1c 100644
>>>>> --- a/drivers/gpu/drm/lima/lima_gem.c
>>>>> +++ b/drivers/gpu/drm/lima/lima_gem.c
>>>>> @@ -291,7 +291,6 @@ static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit)
>>>>>          err = drm_sched_job_add_dependency(&submit->task->base, fence);
>>>>>          if (err) {
>>>>> -            dma_fence_put(fence);
>>>>>              return err;
>>>>
>>>>
>>>> Makes sense here
>>>>
>>>>
>>>>> }
>>>>> }
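
With the change applied, the lima error path relies on
drm_sched_job_add_dependency() having already dropped the fence
reference on failure. Roughly, as a sketch of the resulting code (not
a verbatim copy of the file):

    err = drm_sched_job_add_dependency(&submit->task->base, fence);
    if (err)
            return err;     /* fence reference already dropped internally */
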
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index b81fceb0b8a2..ebab9eca37a8 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -708,7 +708,6 @@ int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
>>>>>          dma_fence_get(fence);
>>>>>          ret = drm_sched_job_add_dependency(job, fence);
>>>>>          if (ret) {
>>>>> -            dma_fence_put(fence);
>>>>
>>>>
>>>>
>>>> Not sure about this one. If you look at the relevant commits -
>>>> 'drm/scheduler: fix drm_sched_job_add_implicit_dependencies' and
>>>> 'drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder' -
>>>> you will see that the dma_fence_put() here balances the extra
>>>> dma_fence_get() above.
>>>>
>>>> Andrey
>>>>
>>>
>>> I don't think so. I checked the call chain and found no additional
>>> dma_fence_get(). But dma_fence_get() needs to be called before
>>> drm_sched_job_add_dependency() to keep the counter balanced.
>>
>>
>> I'm not saying there is an additional get. I'm saying that
>> drm_sched_job_add_dependency() doesn't grab an extra reference to the
>> fences it stores, so this has to be done outside, and that is what the
>> dma_fence_get() in drm_sched_job_add_implicit_dependencies() is for.
>> If the addition fails, you just call dma_fence_put() to keep the
>> counter balanced.
>>
>
> drm_sched_job_add_implicit_dependencies() will call
> drm_sched_job_add_dependency(). And drm_sched_job_add_dependency()
> already calls dma_fence_put() when it fails. Calling dma_fence_put()
> twice doesn't make sense.
>
> dma_fence_get() is in [2]. But when xa_alloc() fails, dma_fence_put()
> will be called in both [1] and [3].
The way I see it, [2] and [3] are the matching *get* and *put*
respectively. The *put* in [1] is against the original
dma_fence_init->kref_init of the fence, which always sets the refcount
to 1. In support of this, see commit 'drm/scheduler: fix
drm_sched_job_add_implicit_dependencies harder' - it says there that
"drm_sched_job_add_dependency() could drop the last ref", and this last
ref is the original refcount set by dma_fence_init->kref_init.
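
As a reminder of that baseline, every fence starts out with a single
reference taken by kref_init() - an abridged sketch of dma_fence_init(),
trimmed to the relevant line:

    void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
                        spinlock_t *lock, u64 context, u64 seqno)
    {
            ...
            kref_init(&fence->refcount);    /* refcount == 1 */
            ...
    }
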
Andrey
>
>
> int drm_sched_job_add_dependency(struct drm_sched_job *job,
>                                  struct dma_fence *fence)
> {
>         ...
>         ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b,
>                        GFP_KERNEL);
>         if (ret != 0)
>                 dma_fence_put(fence);    <--- [1]
>
>         return ret;
> }
> EXPORT_SYMBOL(drm_sched_job_add_dependency);
>
>
> int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
>                                             struct drm_gem_object *obj,
>                                             bool write)
> {
>         struct dma_resv_iter cursor;
>         struct dma_fence *fence;
>         int ret;
>
>         dma_resv_for_each_fence(&cursor, obj->resv, write, fence) {
>                 /* Make sure to grab an additional ref on the added fence */
>                 dma_fence_get(fence);    <--- [2]
>                 ret = drm_sched_job_add_dependency(job, fence);
>                 if (ret) {
>                         dma_fence_put(fence);    <--- [3]
>                         return ret;
>                 }
>         }
>
>         return 0;
> }
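
Counting the operations on the failure path above (a sketch for
illustration only; r stands for the fence's refcount on entry to the
loop body):

    dma_fence_get(fence);                            /* [2]: r + 1 */
    ret = drm_sched_job_add_dependency(job, fence);  /* fails      */
                                                     /* [1]: r     */
    dma_fence_put(fence);                            /* [3]: r - 1 */

So on this path the fence ends up one reference below where it
started. Whether that is correct depends on whether the caller owned a
reference it was expected to drop - which is exactly the point under
discussion here.
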
>
>
>>
>>> On the other hand, dma_fence_get() and dma_fence_put() are
>>> meaningless here if there is an extra dma_fence_get(), because the
>>> counter will not decrease to 0 during drm_sched_job_add_dependency().
>>>
>>> I check the call chain as follows:
>>>
>>> msm_ioctl_gem_submit()
>>> -> submit_fence_sync()
>>> -> drm_sched_job_add_implicit_dependencies()
>>
>>
>> Can you maybe trace or print one example of the problematic refcount
>> that you are trying to fix? I still don't see where the problem is.
>>
>> Andrey
>>
>
> I also wish I could. System logs would make this easy, but I don't
> have a corresponding physical GPU device.
> drm_sched_job_add_implicit_dependencies() is only used by a few
> drivers.
>
> Thanks.
>>
>>>
>>> Thanks,
>>> Hangyu
>>>
>>>>
>>>>> return ret;
>>>>> }
>>>>> }