[PATCH] drm/amdgpu: guard ib scheduling while in reset

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Wed Oct 30 14:44:00 UTC 2019


That is good as proof of the RCA, but I still think we should grab a 
dedicated lock inside the scheduler. The race is internal to the 
scheduler code, so it is better to handle it there and have the fix 
apply to all drivers using it.
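
To make the idea concrete, here is a rough userspace sketch of a 
dedicated lock that serializes the park/unpark sections. It is only a 
model built on pthreads, not kernel code and not a proposed patch: 
model_sched, park_lock, and the model_* functions are all hypothetical 
names, and the parked flag merely stands in for the state that 
kthread_park()/kthread_unpark() would change.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model of a scheduler; every name here is hypothetical. */
struct model_sched {
    pthread_mutex_t park_lock; /* serializes park/unpark sections */
    bool parked;               /* stands in for kthread_park() state */
};

static void model_sched_init(struct model_sched *s)
{
    pthread_mutex_init(&s->park_lock, NULL);
    s->parked = false;
}

/* Reset path: park, and keep the lock until model_sched_start(). */
static void model_sched_stop(struct model_sched *s)
{
    pthread_mutex_lock(&s->park_lock);
    s->parked = true;
}

/* Reset path: unpark, then release the lock taken in model_sched_stop(). */
static void model_sched_start(struct model_sched *s)
{
    assert(s->parked);
    s->parked = false;
    pthread_mutex_unlock(&s->park_lock);
}

/*
 * Teardown path: park, tear down, unpark, all under the same lock.
 * Uses trylock so the exclusion is observable without blocking;
 * returns false when a reset currently owns the section.
 */
static bool model_entity_fini_try(struct model_sched *s)
{
    if (pthread_mutex_trylock(&s->park_lock) != 0)
        return false;
    s->parked = true;
    /* ... entity teardown would happen here ... */
    s->parked = false;
    pthread_mutex_unlock(&s->park_lock);
    return true;
}
```

In this model, a call to model_entity_fini_try() between 
model_sched_stop() and model_sched_start() returns false and leaves 
parked set, so the teardown path cannot unpark the thread in the middle 
of a reset. A real fix would presumably block on the lock instead of 
using trylock.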

Andrey

On 10/30/19 4:44 AM, S, Shirish wrote:
>>>>
>>>> We still have it. And doesn't doing kthread_park()/unpark() from 
>>>> drm_sched_entity_fini while a GPU reset is in progress defeat the 
>>>> whole purpose of drm_sched_stop->kthread_park? If 
>>>> drm_sched_entity_fini->kthread_unpark happens AFTER 
>>>> drm_sched_stop->kthread_park, nothing prevents another (third) 
>>>> thread from continuing to submit jobs, which the unparked scheduler 
>>>> thread will pick up and try to submit to the HW, failing because 
>>>> the HW ring is deactivated.
>>>>
>>>> If so, maybe we should serialize calls to 
>>>> kthread_park/unpark(sched->thread)?
>>>>
>>>
>>> Yeah, that was my thinking as well. Probably best to just grab the 
>>> reset lock before calling drm_sched_entity_fini().
>>
>>
>> Shirish - please try locking &adev->lock_reset around calls to 
>> drm_sched_entity_fini as Christian suggests and see if this actually 
>> helps the issue.
>>
> Yes that also works.
>
> Regards,
>

