[PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset

Christian König deathsimple at vodafone.de
Thu Apr 27 12:04:41 UTC 2017


Am 27.04.2017 um 12:23 schrieb zhoucm1:
>
>
> On 2017年04月27日 18:11, Christian König wrote:
>> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>>> the case could happen when gpu reset:
>>>> 1. when gpu reset, cs can be continue until sw queue is full, then 
>>>> push job will wait with holding pd reservation.
>>>> 2. gpu_reset routine will also need pd reservation to restore page 
>>>> table from their shadow.
>>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting 
>>>> for cs releases reservation.
>>>>
>>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 9edb1a4..a6722a7 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct 
>>>> amdgpu_cs_parser *p,
>>>>       amdgpu_job_free_resources(job);
>>>>
>>>>       trace_amdgpu_cs_ioctl(job);
>>>> +    amdgpu_cs_parser_fini(p, 0, true);
>>>
>>> Please confirm:
>>> It will not free/release more things that may be accessed in job 
>>> working process, as cs_parse_fini free so many things related to 
>>> parser.
>>
>> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>>
> No, with my patch, parser->job already is NULL here, same as previous 
> sequence, just put push_job action after amdgpu_cs_parser_fini()

Ah! You add that into amdgpu_cs_submit! That was the point I was missing.

>   	trace_amdgpu_cs_ioctl(job);
> +	amdgpu_cs_parser_fini(p, 0, true);
>   	amd_sched_entity_push_job(&job->base);
Can we keep the trace_amdgpu_cs_ioctl(); directly before the 
amd_sched_entity_push_job()?

With that changed, the patch is Reviewed-by: Christian König 
<christian.koenig at amd.com>.

Regards,
Christian.

>>         if (parser->job)
>>                 amdgpu_job_free(parser->job);
>>
>> And amdgpu_cs_submit() sets the fence, so it must be called before 
>> amdgpu_cs_parser_fini().
> yes, fence is already generated, so amdgpu_cs_parser_fini before 
> pushing job is completely safe here.
>>
>> But apart from that this is a rather nifty idea. What was the problem 
>> with the initial patch?
> Just as comments in patch, which is found by Monk.
>
> Regards,
> David Zhou
>>
>>>
>>> Or we can just release the pd reservation here?
>>
>> We could run into problem with the reservation ticked with that. So I 
>> would want to avoid that.
>>
>> Christian.
>>
>>>
>>> Jerry
>>>
>>>> amd_sched_entity_push_job(&job->base);
>>>>
>>>>       return 0;
>>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, 
>>>> void *data, struct drm_file *filp)
>>>>
>>>>       r = amdgpu_cs_submit(&parser, cs);
>>>>
>>>> +    return r;
>>>>   out:
>>>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>>       return r;
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx




More information about the amd-gfx mailing list