[PATCH] drm/amdgpu: dont schedule jobs while in reset
Grodzovsky, Andrey
Andrey.Grodzovsky at amd.com
Wed Oct 30 14:45:35 UTC 2019
On 10/30/19 6:22 AM, S, Shirish wrote:
> On 10/30/2019 3:50 PM, Koenig, Christian wrote:
>> Am 30.10.19 um 10:13 schrieb S, Shirish:
>>> [Why]
>>>
>>> doing kthread_park()/unpark() from drm_sched_entity_fini
>>> while GPU reset is in progress defeats all the purpose of
>>> drm_sched_stop->kthread_park.
>>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>>> drm_sched_stop->kthread_park nothing prevents from another
>>> (third) thread to keep submitting job to HW which will be
>>> picked up by the unparked scheduler thread and try to submit
>>> to HW but fail because the HW ring is deactivated.
>>>
>>> [How]
>>> grab the reset lock before calling drm_sched_entity_fini()
>>>
>>> Signed-off-by: Shirish S <shirish.s at amd.com>
>>> Suggested-by: Christian König <christian.koenig at amd.com>
>> Patch itself is Reviewed-by: Christian König <christian.koenig at amd.com>
>>
>> Does that also fix the problems you have been seeing?
> Yes Christian.
>
> Regards,
>
> Shirish S
Missed that one, why don't we fix it within scheduler code - the race is
within scheduler ?
Andrey
>
>> Thanks,
>> Christian.
>>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 ++++-
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index 6614d8a..2cdaf3b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>>> continue;
>>> }
>>>
>>> - for (i = 0; i < num_entities; i++)
>>> + for (i = 0; i < num_entities; i++) {
>>> + mutex_lock(&ctx->adev->lock_reset);
>>> drm_sched_entity_fini(&ctx->entities[0][i].entity);
>>> + mutex_unlock(&ctx->adev->lock_reset);
>>> + }
>>> }
>>> }
>>>
More information about the amd-gfx
mailing list