[PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset

Zhang, Jerry (Junwei) Jerry.Zhang at amd.com
Fri Apr 28 02:37:57 UTC 2017


On 04/27/2017 08:04 PM, Christian König wrote:
> Am 27.04.2017 um 12:23 schrieb zhoucm1:
>>
>>
>> On 2017年04月27日 18:11, Christian König wrote:
>>> Am 27.04.2017 um 11:08 schrieb Zhang, Jerry (Junwei):
>>>> On 04/27/2017 04:25 PM, Chunming Zhou wrote:
>>>>> the case could happen when gpu reset:
>>>>> 1. when gpu reset, cs can be continue until sw queue is full, then push
>>>>> job will wait with holding pd reservation.
>>>>> 2. gpu_reset routine will also need pd reservation to restore page table
>>>>> from their shadow.
>>>>> 3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs
>>>>> releases reservation.
>>>>>
>>>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>>>   1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> index 9edb1a4..a6722a7 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> @@ -1076,6 +1076,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>>       amdgpu_job_free_resources(job);
>>>>>
>>>>>       trace_amdgpu_cs_ioctl(job);
>>>>> +    amdgpu_cs_parser_fini(p, 0, true);
>>>>
>>>> Please confirm:
>>>> It will not free/release more things that may be accessed in job working
>>>> process, as cs_parse_fini free so many things related to parser.
>>>
>>> Yeah, that indeed won't work. amdgpu_cs_parser_fini() does:
>>>
>> No, with my patch, parser->job already is NULL here, same as previous
>> sequence, just put push_job action after amdgpu_cs_parser_fini()
>
> Ah! You add that into amdgpu_cs_submit! That was the point I was missing.

Thanks your all input.
See it too.

Feel free to add my RB, if need.
Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>

Jerry

>
>>       trace_amdgpu_cs_ioctl(job);
>> +    amdgpu_cs_parser_fini(p, 0, true);
>>       amd_sched_entity_push_job(&job->base);
> Can we keep the trace_amdgpu_cs_ioctl(); directly before the
> amd_sched_entity_push_job()?
>
> With that changed, the patch is Reviewed-by: Christian König
> <christian.koenig at amd.com>.
>
> Regards,
> Christian.
>
>>>         if (parser->job)
>>>                 amdgpu_job_free(parser->job);
>>>
>>> And amdgpu_cs_submit() sets the fence, so it must be called before
>>> amdgpu_cs_parser_fini().
>> yes, fence is already generated, so amdgpu_cs_parser_fini before pushing job
>> is completely safe here.
>>>
>>> But apart from that this is a rather nifty idea. What was the problem with
>>> the initial patch?
>> Just as comments in patch, which is found by Monk.
>>
>> Regards,
>> David Zhou
>>>
>>>>
>>>> Or we can just release the pd reservation here?
>>>
>>> We could run into problem with the reservation ticked with that. So I would
>>> want to avoid that.
>>>
>>> Christian.
>>>
>>>>
>>>> Jerry
>>>>
>>>>> amd_sched_entity_push_job(&job->base);
>>>>>
>>>>>       return 0;
>>>>> @@ -1130,6 +1131,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void
>>>>> *data, struct drm_file *filp)
>>>>>
>>>>>       r = amdgpu_cs_submit(&parser, cs);
>>>>>
>>>>> +    return r;
>>>>>   out:
>>>>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>>>       return r;
>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>


More information about the amd-gfx mailing list