[PATCH] drm/amd/amdgpu: fix potential bad job hw_fence underflow

Andrey Grodzovsky andrey.grodzovsky at amd.com
Fri Oct 22 20:41:46 UTC 2021


What do you mean by underflow in this case ? You mean use after free 
because of extra dma_fence_put() ?

On 2021-10-22 4:14 a.m., JingWen Chen wrote:
> ping
>
> On 2021/10/22 AM11:33, Jingwen Chen wrote:
>> [Why]
>> In advance tdr mode, the real bad job will be resubmitted twice, while
>> in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job
>> is put one more time than other jobs.
>>
>> [How]
>> Adding dma_fence_get before resbumit job in
>> amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs
>>
>> Signed-off-by: Jingwen Chen <Jingwen.Chen2 at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 41ce86244144..975f069f6fe8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4841,6 +4841,9 @@ static void amdgpu_device_recheck_guilty_jobs(
>>   
>>   		/* clear job's guilty and depend the folowing step to decide the real one */
>>   		drm_sched_reset_karma(s_job);
>> +		/* for the real bad job, it will be resubmitted twice, adding a dma_fence_get
>> +		 * to make sure fence is balanced */


But that put in drm_sched_resubmit_jobs_ext is for the previous parent 
fence.
fence = sched->ops->run_job(s_job); returns a new HW fence and the put 
drops the refcount on the old one.

Andrey


>> +		dma_fence_get(s_job->s_fence->parent);
>>   		drm_sched_resubmit_jobs_ext(&ring->sched, 1);
>>   
>>   		ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout);
>> @@ -4876,6 +4879,7 @@ static void amdgpu_device_recheck_guilty_jobs(
>>   
>>   		/* got the hw fence, signal finished fence */
>>   		atomic_dec(ring->sched.score);
>> +		dma_fence_put(s_job->s_fence->parent);
>>   		dma_fence_get(&s_job->s_fence->finished);
>>   		dma_fence_signal(&s_job->s_fence->finished);
>>   		dma_fence_put(&s_job->s_fence->finished);


More information about the amd-gfx mailing list