[PATCH] drm/amdgpu: guard ib scheduling while in reset

Christian König ckoenig.leichtzumerken at gmail.com
Wed Oct 30 14:50:43 UTC 2019


A lock inside the scheduler is rather tricky to implement.

What you need to do is to get rid of the park()/unpark() hack in 
drm_sched_entity_fini().

We could do this with a struct completion or convert the scheduler from 
a thread to a work item.

Regards,
Christian.

Am 30.10.19 um 15:44 schrieb Grodzovsky, Andrey:
> That good  as proof of RCA but I still think we should grab a dedicated
> lock inside scheduler since the race is internal to scheduler code so
> this better to handle it inside the scheduler code to make the fix apply
> for all drivers using it.
>
> Andrey
>
> On 10/30/19 4:44 AM, S, Shirish wrote:
>>>>> We still have it and isn't doing kthread_park()/unpark() from
>>>>> drm_sched_entity_fini while GPU reset in progress defeats all the
>>>>> purpose of drm_sched_stop->kthread_park ? If
>>>>> drm_sched_entity_fini-> kthread_unpark happens AFTER
>>>>> drm_sched_stop->kthread_park nothing prevents from another (third)
>>>>> thread keep submitting job to HW which will be picked up by the
>>>>> unparked scheduler thread try to submit to HW but fail because the
>>>>> HW ring is deactivated.
>>>>>
>>>>> If so maybe we should serialize calls to
>>>>> kthread_park/unpark(sched->thread) ?
>>>>>
>>>> Yeah, that was my thinking as well. Probably best to just grab the
>>>> reset lock before calling drm_sched_entity_fini().
>>>
>>> Shirish - please try locking &adev->lock_reset around calls to
>>> drm_sched_entity_fini as Christian suggests and see if this actually
>>> helps the issue.
>>>
>> Yes that also works.
>>
>> Regards,
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list