[PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset

Liu, Monk Monk.Liu at amd.com
Thu Apr 27 14:02:40 UTC 2017


A really good improvement 

Reviewed-by: Monk Liu <monk.liu at amd.com>

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf Of zhoucm1
Sent: Thursday, April 27, 2017 6:24 PM
To: Christian König <deathsimple at vodafone.de>; Zhang, Jerry <Jerry.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Liu, Monk <Monk.Liu at amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset



On 2017年04月27日 18:11, Christian König wrote:
> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>> the case could happen when gpu reset:
>>> 1. when gpu reset, cs can be continue until sw queue is full, then 
>>> push job will wait with holding pd reservation.
>>> 2. gpu_reset routine will also need pd reservation to restore page 
>>> table from their shadow.
>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting 
>>> for cs releases reservation.
>>>
>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 9edb1a4..a6722a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct 
>>> amdgpu_cs_parser *p,
>>>       amdgpu_job_free_resources(job);
>>>
>>>       trace_amdgpu_cs_ioctl(job);
>>> +    amdgpu_cs_parser_fini(p, 0, true);
>>
>> Please confirm:
>> It will not free/release more things that may be accessed in job 
>> working process, as cs_parse_fini free so many things related to parser.
>
> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>
No, with my patch, parser->job already is NULL here, same as previous sequence, just put push_job action after amdgpu_cs_parser_fini()
>         if (parser->job)
>                 amdgpu_job_free(parser->job);
>
> And amdgpu_cs_submit() sets the fence, so it must be called before 
> amdgpu_cs_parser_fini().
yes, fence is already generated, so amdgpu_cs_parser_fini before pushing job is completely safe here.
>
> But apart from that this is a rather nifty idea. What was the problem 
> with the initial patch?
Just as comments in patch, which is found by Monk.

Regards,
David Zhou
>
>>
>> Or we can just release the pd reservation here?
>
> We could run into problem with the reservation ticked with that. So I 
> would want to avoid that.
>
> Christian.
>
>>
>> Jerry
>>
>>> amd_sched_entity_push_job(&job->base);
>>>
>>>       return 0;
>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, 
>>> void *data, struct drm_file *filp)
>>>
>>>       r = amdgpu_cs_submit(&parser, cs);
>>>
>>> +    return r;
>>>   out:
>>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>       return r;
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


More information about the amd-gfx mailing list