[PATCH] drm/amdgpu: guard ib scheduling while in reset

Koenig, Christian Christian.Koenig at amd.com
Wed Oct 30 15:00:42 UTC 2019


Yeah, and exactly that's the problem :) You need a global lock covering 
all schedulers.

Otherwise you end up in hell's kitchen again with taking all those locks 
in the right order.

Christian.

Am 30.10.19 um 15:56 schrieb Grodzovsky, Andrey:
> Can you elaborate on what is the tricky part with the lock ? I assumed
> we just use per scheduler lock.
>
> Andrey
>
> On 10/30/19 10:50 AM, Christian König wrote:
>> A lock inside the scheduler is rather tricky to implement.
>>
>> What you need to do is to get rid of the park()/unpark() hack in
>> drm_sched_entity_fini().
>>
>> We could do this with a struct completion or convert the scheduler
>> from a thread to a work item.
>>
>> Regards,
>> Christian.
>>
>> Am 30.10.19 um 15:44 schrieb Grodzovsky, Andrey:
>>> That good  as proof of RCA but I still think we should grab a dedicated
>>> lock inside scheduler since the race is internal to scheduler code so
>>> this better to handle it inside the scheduler code to make the fix apply
>>> for all drivers using it.
>>>
>>> Andrey
>>>
>>> On 10/30/19 4:44 AM, S, Shirish wrote:
>>>>>>> We still have it and isn't doing kthread_park()/unpark() from
>>>>>>> drm_sched_entity_fini while GPU reset in progress defeats all the
>>>>>>> purpose of drm_sched_stop->kthread_park ? If
>>>>>>> drm_sched_entity_fini-> kthread_unpark happens AFTER
>>>>>>> drm_sched_stop->kthread_park nothing prevents from another (third)
>>>>>>> thread keep submitting job to HW which will be picked up by the
>>>>>>> unparked scheduler thread try to submit to HW but fail because the
>>>>>>> HW ring is deactivated.
>>>>>>>
>>>>>>> If so maybe we should serialize calls to
>>>>>>> kthread_park/unpark(sched->thread) ?
>>>>>>>
>>>>>> Yeah, that was my thinking as well. Probably best to just grab the
>>>>>> reset lock before calling drm_sched_entity_fini().
>>>>> Shirish - please try locking &adev->lock_reset around calls to
>>>>> drm_sched_entity_fini as Christian suggests and see if this actually
>>>>> helps the issue.
>>>>>
>>>> Yes that also works.
>>>>
>>>> Regards,
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list