[PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset v2

zhoucm1 david1.zhou at amd.com
Fri Apr 28 08:33:46 UTC 2017


Agreed, but libdrm doesn't allow concurrent submissions from the same 
context; see the 'pthread_mutex_lock(&context->sequence_mutex);' 
protection in amdgpu_cs_submit_one.
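
To make the ordering argument concrete, here is a minimal userspace sketch of the idea: one mutex per context covers both handing out the sequence number and queuing the job, so queue order always matches sequence order. All names here (demo_ctx, demo_submit, demo_worker, demo_run) are made up for illustration and are not the libdrm or amdgpu code; the only thing taken from libdrm is the idea of holding a per-context sequence_mutex across the submission.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_LEN 4096
#define DEMO_THREADS 4
#define DEMO_JOBS_PER_THREAD 1000

/* Hypothetical stand-in for a submission context; not the libdrm struct. */
struct demo_ctx {
	pthread_mutex_t sequence_mutex;  /* serializes submissions on this context */
	uint64_t next_seq;               /* next sequence number to hand out */
	uint64_t queue[DEMO_QUEUE_LEN];  /* order in which jobs were queued */
	size_t queued;
};

/* Assign a sequence number and queue the job as one atomic step. */
static uint64_t demo_submit(struct demo_ctx *ctx)
{
	uint64_t seq;

	pthread_mutex_lock(&ctx->sequence_mutex);
	if (ctx->queued == DEMO_QUEUE_LEN) {
		pthread_mutex_unlock(&ctx->sequence_mutex);
		return UINT64_MAX; /* queue full, nothing submitted */
	}
	seq = ctx->next_seq++;
	ctx->queue[ctx->queued++] = seq;
	pthread_mutex_unlock(&ctx->sequence_mutex);
	return seq;
}

static void *demo_worker(void *arg)
{
	int i;

	for (i = 0; i < DEMO_JOBS_PER_THREAD; i++)
		demo_submit(arg);
	return NULL;
}

/* Returns 1 if every job was queued in sequence order, 0 otherwise. */
static int demo_run(void)
{
	struct demo_ctx ctx;
	pthread_t threads[DEMO_THREADS];
	size_t i;

	pthread_mutex_init(&ctx.sequence_mutex, NULL);
	ctx.next_seq = 0;
	ctx.queued = 0;

	for (i = 0; i < DEMO_THREADS; i++)
		pthread_create(&threads[i], NULL, demo_worker, &ctx);
	for (i = 0; i < DEMO_THREADS; i++)
		pthread_join(threads[i], NULL);
	pthread_mutex_destroy(&ctx.sequence_mutex);

	if (ctx.queued != DEMO_THREADS * DEMO_JOBS_PER_THREAD)
		return 0;
	for (i = 0; i < ctx.queued; i++)
		if (ctx.queue[i] != i)
			return 0;
	return 1;
}
```

If the sequence number were assigned under the lock but the push happened after dropping it, two threads could queue their jobs in the opposite order, which is the kernel-side concern raised in the quoted mail below.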

Regards,
David Zhou
On 2017-04-28 16:15, Christian König wrote:
> Indeed, but after a bit of thinking I've found another problem with 
> that patch.
>
> When two threads are pushing jobs into the same scheduler context we 
> don't guarantee correct execution order any more!
>
> Before that patch it was handled by the exclusiveness we had because 
> of reserving the VM page tables, but now nothing prevents us from 
> calling amd_sched_entity_push_job() in nondeterministic order.
>
> In other words we need an additional lock in amdgpu_ctx_ring or 
> something like that.
>
> Regards,
> Christian.
>
> On 28.04.2017 at 04:51, Zhang, Jerry wrote:
>> Nice catch!
>> Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>
>> Regards,
>> Jerry (Junwei Zhang)
>>
>> Linux Base Graphics
>> SRDC Software Development
>> _____________________________________
>>
>>
>>> -----Original Message-----
>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On 
>>> Behalf Of
>>> Chunming Zhou
>>> Sent: Friday, April 28, 2017 10:46
>>> To: amd-gfx at lists.freedesktop.org
>>> Cc: Zhou, David(ChunMing)
>>> Subject: [PATCH] drm/amdgpu: fix deadlock of reservation between cs 
>>> and gpu
>>> reset v2
>>>
>>> This deadlock can happen during a gpu reset:
>>> 1. During gpu reset, cs can continue until the sw queue is full; push 
>>> job then waits while holding the pd reservation.
>>> 2. The gpu_reset routine also needs the pd reservation to restore the 
>>> page tables from their shadows.
>>> 3. cs is waiting for gpu_reset to complete, but gpu_reset is waiting 
>>> for cs to release the reservation.
>>>
>>> v2: handle amdgpu_cs_submit error path.
>>>
>>> Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>> Reviewed-by: Christian König <christian.koenig at amd.com>
>>> Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>> Reviewed-by: Monk Liu <monk.liu at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 26168df..699f5fe 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1074,6 +1074,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>       cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
>>>       job->uf_sequence = cs->out.handle;
>>>       amdgpu_job_free_resources(job);
>>> +    amdgpu_cs_parser_fini(p, 0, true);
>>>
>>>       trace_amdgpu_cs_ioctl(job);
>>>       amd_sched_entity_push_job(&job->base);
>>> @@ -1129,7 +1130,10 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>>>           goto out;
>>>
>>>       r = amdgpu_cs_submit(&parser, cs);
>>> +    if (r)
>>> +        goto out;
>>>
>>> +    return 0;
>>>   out:
>>>       amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>>>       return r;
>>> -- 
>>> 1.9.1
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>

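The reset scenario in the quoted commit message can be modelled in userspace. The sketch below only illustrates the ordering the patch aims for (drop the reservation before a push that may block); it is not the kernel code. A plain mutex stands in for the pd reservation and a bounded counter for the sw queue, and every name (demo_dev, demo_push_job, demo_cs_submit, demo_gpu_reset, demo_reset_scenario) is hypothetical.

```c
#include <pthread.h>

/* Hypothetical model: a mutex stands in for the pd reservation and a
 * bounded counter for the scheduler's sw queue. */
struct demo_dev {
	pthread_mutex_t pd_resv;  /* stand-in for the page directory reservation */
	pthread_mutex_t q_lock;
	pthread_cond_t q_space;   /* signalled when the queue has room again */
	int q_used;
	int q_cap;
	int restored;             /* set once "reset" has restored the page tables */
};

/* Blocks while the queue is full, like a push into a full sw queue. */
static void demo_push_job(struct demo_dev *d)
{
	pthread_mutex_lock(&d->q_lock);
	while (d->q_used == d->q_cap)
		pthread_cond_wait(&d->q_space, &d->q_lock);
	d->q_used++;
	pthread_mutex_unlock(&d->q_lock);
}

/* Submission path: finish everything that needs the reservation, release
 * it, and only then do the push that may block. */
static void *demo_cs_submit(void *arg)
{
	struct demo_dev *d = arg;

	pthread_mutex_lock(&d->pd_resv);
	/* ... validate buffers, add fences ... */
	pthread_mutex_unlock(&d->pd_resv);

	demo_push_job(d);
	return NULL;
}

/* Reset path: needs the reservation, then lets the queue drain. */
static void demo_gpu_reset(struct demo_dev *d)
{
	pthread_mutex_lock(&d->pd_resv);
	d->restored = 1;
	pthread_mutex_unlock(&d->pd_resv);

	pthread_mutex_lock(&d->q_lock);
	d->q_used = 0;
	pthread_cond_broadcast(&d->q_space);
	pthread_mutex_unlock(&d->q_lock);
}

/* Returns 1 if both the reset and the blocked submission completed. */
static int demo_reset_scenario(void)
{
	struct demo_dev d;
	pthread_t submitter;

	pthread_mutex_init(&d.pd_resv, NULL);
	pthread_mutex_init(&d.q_lock, NULL);
	pthread_cond_init(&d.q_space, NULL);
	d.q_cap = 1;
	d.q_used = 1;   /* queue already full, as during a reset */
	d.restored = 0;

	pthread_create(&submitter, NULL, demo_cs_submit, &d);
	demo_gpu_reset(&d);
	pthread_join(submitter, NULL);

	return d.restored == 1 && d.q_used == 1;
}
```

If demo_cs_submit called demo_push_job while still holding pd_resv, demo_gpu_reset could block on the reservation forever, which is the cycle described in steps 1 to 3 of the commit message.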