[PATCH] drm/amdgpu: dont schedule jobs while in reset

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Wed Oct 30 15:06:24 UTC 2019


Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>

Andrey

On 10/30/19 6:20 AM, Koenig, Christian wrote:
> Am 30.10.19 um 10:13 schrieb S, Shirish:
>> [Why]
>>
>> doing kthread_park()/unpark() from drm_sched_entity_fini
>> while GPU reset is in progress defeats all the purpose of
>> drm_sched_stop->kthread_park.
>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>> drm_sched_stop->kthread_park nothing prevents from another
>> (third) thread to keep submitting job to HW which will be
>> picked up by the unparked scheduler thread and try to submit
>> to HW but fail because the HW ring is deactivated.
>>
>> [How]
>> grab the reset lock before calling drm_sched_entity_fini()
>>
>> Signed-off-by: Shirish S <shirish.s at amd.com>
>> Suggested-by: Christian König <christian.koenig at amd.com>
> Patch itself is Reviewed-by: Christian König <christian.koenig at amd.com>
>
> Does that also fix the problems you have been seeing?
>
> Thanks,
> Christian.
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 ++++-
>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index 6614d8a..2cdaf3b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>>    			continue;
>>    		}
>>    
>> -		for (i = 0; i < num_entities; i++)
>> +		for (i = 0; i < num_entities; i++) {
>> +			mutex_lock(&ctx->adev->lock_reset);
>>    			drm_sched_entity_fini(&ctx->entities[0][i].entity);
>> +			mutex_unlock(&ctx->adev->lock_reset);
>> +		}
>>    	}
>>    }
>>    


More information about the amd-gfx mailing list