[PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Fri Jan 18 19:15:41 UTC 2019


The attached series contains the first two patches we already discussed
for the ring mirror list handling race, with all your comments addressed
(still not committed). The third patch is a prototype based on the first
two patches and on our discussion.

Please take a look.

Andrey


On 01/18/2019 01:32 PM, Koenig, Christian wrote:
> Am 18.01.19 um 18:34 schrieb Grodzovsky, Andrey:
>> On 01/18/2019 12:10 PM, Koenig, Christian wrote:
>>> Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey:
>>>> On 01/18/2019 04:25 AM, Koenig, Christian wrote:
>>>>> [SNIP]
>>>>>>>>> Re-arming the timeout should probably have a much reduced value
>>>>>>>>> when the job hasn't changed. E.g. something like a few ms.
>>>>>> Now I got thinking about a non-hung job in progress (job A). Say it's
>>>>>> a long job that just started executing, but because of a timeout of
>>>>>> another job (job B) on another (or this) scheduler its parent cb got
>>>>>> disconnected and we disarmed the tdr timer for job A's scheduler.
>>>>>> Meanwhile the timed-out job managed to complete before the HW reset
>>>>>> check, so we skip the HW reset, attach the cb back and re-arm job A's
>>>>>> tdr timer with a future value of only a few ms - aren't we going to get
>>>>>> a false tdr triggered on job A now, because we didn't give it enough
>>>>>> time to run and complete? I would prefer the other extreme of a longer
>>>>>> timeout over a false TDR. Optimally we would have a per-job timer and
>>>>>> re-arm it to exactly the remainder of its timeout value - but we gave
>>>>>> up on per-job tdr work long ago.
>>>>> Well we only re-arm the timeout with a shorter period if it already
>>>>> triggered once. If we just suspend the timeout then we should still use
>>>>> the longer period.
>>>> Can you explain this a bit more? I don't get it.
>>> See drm_sched_job_timedout(), we re-arm the timeout at the end of the
>>> procedure.
>>>
>>> We should change that and re-arm the timer with a much lower timeout if
>>> the job is still not finished.
>>>
>>> Christian.
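
A minimal sketch of what this could look like at the end of
drm_sched_job_timedout() - re-arming with a much shorter period only when
the job that triggered the timeout is still unsignaled. The helper name and
the 10 ms value here are invented for illustration, not taken from the
actual patch:

#include <linux/dma-fence.h>
#include <linux/jiffies.h>
#include <linux/workqueue.h>
#include <drm/gpu_scheduler.h>

/* Re-arm the per-scheduler TDR delayed work.  If @job already triggered a
 * timeout once but its HW fence (s_fence->parent) is still not signaled,
 * only give it a short grace period instead of the full sched->timeout.
 */
static void drm_sched_rearm_timeout(struct drm_gpu_scheduler *sched,
				    struct drm_sched_job *job)
{
	unsigned long delay = sched->timeout;

	if (job && job->s_fence->parent &&
	    !dma_fence_is_signaled(job->s_fence->parent))
		delay = msecs_to_jiffies(10);	/* illustrative value only */

	schedule_delayed_work(&sched->work_tdr, delay);
}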
>> I still don't see how this fixes the problem of a long job in
>> progress triggering a false tdr when no HW reset was done, but maybe I am
>> missing other pieces you have in mind. I will finish the patch and send
>> it, and then we can be more specific based on the code.
> OK, sounds good. We should probably discuss the details less and prototype
> a bit more.
>
> Might be that I'm missing something here as well, so probably good to
> have some code to talk about things more directly.
>
> Christian.
>
>> Andrey
>>
>>>> Andrey
>>>>
>>>>>> In general, the more I think about it (correct me if I am wrong), the
>>>>>> less sure I am how useful the optimization is - if a job's timeout
>>>>>> did trigger, what are the chances that the little extra time we give
>>>>>> it between the beginning of the tdr function and the moment we start
>>>>>> the actual HW reset is exactly what it needed to complete? Also, this
>>>>>> is still not watertight, as the job might complete and signal its HW
>>>>>> fence exactly after we checked for completion but before we start the
>>>>>> HW reset code.
>>>>> I don't see this as an optimization, but rather as mandatory for correct
>>>>> operation.
>>>>>
>>>>> See, without this we can run into issues because we execute jobs multiple
>>>>> times. That can still happen even with this clean handling, but it is much
>>>>> less likely.
>>>>>
>>>>> Christian.
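
The check Christian refers to boils down to something like the following:
before committing to a HW reset from the timeout handler, see whether the
presumed-guilty job actually completed in the meantime and skip the reset
(and with it the resubmission of the mirror list) if it did. A rough sketch,
with the helper name invented for illustration:

#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

/* Returns true when the job that was flagged as timed out has in fact
 * already signaled its HW fence, i.e. the full HW reset (and the
 * re-execution of every job on the ring mirror list) can be skipped.
 */
static bool drm_sched_job_already_signaled(struct drm_sched_job *job)
{
	struct dma_fence *fence = job->s_fence->parent;

	return fence && dma_fence_is_signaled(fence);
}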
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>>> By unchanged you mean the case where we didn't resubmit the job because
>>>>>>>> of the optimized non-HW reset, right?
>>>>>>> Correct, yes.
>>>>>>>
>>>>>>>>>> About flushing tdr jobs in progress from the .free_job cb - it looks
>>>>>>>>>> like drm_sched_job_finish->cancel_delayed_work_sync is not enough; we
>>>>>>>>>> still need to take care of flushing all sched->work_tdr for a
>>>>>>>>>> device, and for all devices in the hive for XGMI.
>>>>>>>>>> What do you think?
>>>>>>>>> Why should that be necessary? We only wait for the delayed work to
>>>>>>>>> make sure that the job is not destroyed while dealing with it.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>> But we might not be waiting for the correct sched->work_tdr; we run
>>>>>>>> the reset routine for all schedulers in a device, accessing their
>>>>>>>> jobs too, and not only for the scheduler to which the job belongs.
>>>>>>>> For XGMI, not only that, we reset all the devices in the hive.
>>>>>>> That is harmless; you only need to wait for the work_tdr of the
>>>>>>> current scheduler, not for all of them.
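
In other words, the existing finish path is already per scheduler - roughly
like the simplified sketch below (not the exact upstream code, some steps
such as mirror-list removal are omitted):

#include <linux/workqueue.h>
#include <drm/gpu_scheduler.h>

/* Simplified sketch of the free/finish path: before the job is destroyed,
 * only the delayed TDR work of the scheduler the job belongs to has to be
 * flushed, so that work cannot still be looking at the job afterwards.
 */
static void drm_sched_job_finish(struct work_struct *work)
{
	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
						   finish_work);
	struct drm_gpu_scheduler *sched = s_job->sched;

	/* Wait for any TDR handler of *this* scheduler that might still be
	 * touching s_job; TDR works of other schedulers never free this job.
	 */
	cancel_delayed_work_sync(&sched->work_tdr);

	sched->ops->free_job(s_job);
}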
>>>>>>>
>>>>>>>> I was thinking, the amdgpu driver is not even interested in allowing
>>>>>>>> multiple sched->tdr works to execute together - we have to serialize
>>>>>>>> all of them anyway with the trylock mutex (even without XGMI), and v3d
>>>>>>>> in v3d_job_timedout also seems to reset all of its schedulers from the
>>>>>>>> tdr work. Would it make sense to provide the sched->work_tdr as an
>>>>>>>> init parameter to the scheduler (the same one for all schedulers), so
>>>>>>>> we can enforce serialization by disallowing more than one tdr work
>>>>>>>> executing at the same time? Drivers that want to run them in parallel
>>>>>>>> could provide a unique sched->work_tdr per scheduler. This does imply
>>>>>>>> that drm_sched_job_timedout has to be removed and delegated to the
>>>>>>>> specific driver implementation, as probably does other code dealing
>>>>>>>> with sched->work_tdr... Maybe even move tdr handling to the driver
>>>>>>>> altogether?
>>>>>>> Yeah, I was thinking something similar. The problem with this
>>>>>>> approach is that a delayed work item can have only one delay, but for
>>>>>>> multiple engines we need multiple delays.
>>>>>>>
>>>>>>> What we could do is to make it a timer instead and raise the work
>>>>>>> item from the device specific callback.
>>>>>>>
>>>>>>> But that doesn't really save us the trouble of stopping all schedulers,
>>>>>>> so it doesn't buy us much in the end, if I see this correctly.
>>>>>>>
>>>>>>> Christian.
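
A hypothetical sketch of that timer idea, just to have something concrete to
poke at: every scheduler keeps its own cheap timer (so each engine keeps its
own timeout value), but the timer callback only queues a single work item
shared by all schedulers of the device (or hive), so at most one TDR handler
runs at a time. All names here are made up for illustration:

#include <linux/timer.h>
#include <linux/workqueue.h>

/* One instance per device (or per XGMI hive): runs the actual TDR handling. */
struct shared_tdr {
	struct work_struct work;
};

/* One instance per scheduler/ring: only tracks the per-engine timeout. */
struct ring_tdr_timer {
	struct timer_list timer;
	struct shared_tdr *shared;
};

static void ring_tdr_timer_fn(struct timer_list *t)
{
	struct ring_tdr_timer *rt = from_timer(rt, t, timer);

	/* Several ring timers may fire, but the shared work item runs the
	 * reset handler only once at a time, serializing TDR for the device.
	 */
	schedule_work(&rt->shared->work);
}

static void ring_tdr_timer_init(struct ring_tdr_timer *rt,
				struct shared_tdr *shared)
{
	rt->shared = shared;
	timer_setup(&rt->timer, ring_tdr_timer_fn, 0);
}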

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-sched-Refactor-ring-mirror-list-handling.patch
Type: text/x-patch
Size: 14320 bytes
Desc: 0001-drm-sched-Refactor-ring-mirror-list-handling.patch
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190118/db2cc69b/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-drm-sched-Rework-HW-fence-processing.patch
Type: text/x-patch
Size: 7163 bytes
Desc: 0002-drm-sched-Rework-HW-fence-processing.patch
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190118/db2cc69b/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-drm-sched-DRAFT-drm-sched-Skip-HW-reset-if-guilty-jo.patch
Type: text/x-patch
Size: 13533 bytes
Desc: 0003-drm-sched-DRAFT-drm-sched-Skip-HW-reset-if-guilty-jo.patch
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190118/db2cc69b/attachment-0005.bin>

