[PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

Fri Jan 18 09:25:28 UTC 2019

[SNIP]
>>>> Re-arming the timeout should probably have a much reduced value 
>>>> when the job hasn't changed. E.g. something like a few ms.
>
> Now i got thinking about non hanged job in progress (job A) and let's 
> say it's a long job , it just started executing but due to time out of 
> another job (job B) on another (or this scheduler) it's parent cb got 
> disconnected, we disarmed the tdr timer for the job's scheduler, 
> meanwhile the timed out job did manage to complete before HW reset 
> check and hence we skip HW reset, attach back the cb and rearm job's A 
> tdr  timer with a future value of few ms only - aren't we going to get 
> false tdr triggered on job B now because we didn't let it enough time 
> to run and complete ? I would prefer the other extreme of longer time 
> for time out to trigger then false TDR. Optimally we would have per 
> job timer and rearm to exactly the reminder of it's time out value - 
> but we gave up on per job tdr work long ago.

Well we only re-arm the timeout with a shorter period if it already 
triggered once. If we just suspend the timeout then we should still use 
the longer period.

> In general the more i think about it  (correct me if I am wrong) I am 
> less sure how much the optimization feature is useful - if job's time 
> out did trigger what are the chances that the little more time we give 
> it between beginning of tdr function and the time we do start the 
> actual HW reset will be exactly what it needed to complete. Also, this 
> is still not water proof as the job might complete and signal it's HW 
> fence exactly after we checked for completion but before starting the 
> HW reset code.

I don't see this as an optimization, but rather as mandatory for correct 
operation.

See without this we can run into issues because we execute jobs multiple 
times. That can still happen with this clean handling, but it is much 
more unlikely.

Christian.

>
> Andrey
>
>>>
>>> By unchanged you mean when we didn't resubmit the job because of the 
>>> optimized non HW reset, right ?
>>
>> Correct, yes.
>>
>>>
>>>>
>>>>> About flushing tdr jobs in progress from .free_job cb - looks like 
>>>>> drm_sched_job_finish->cancel_delayed_work_sync is not enough, we 
>>>>> still need to take care of flushing all sced->work_tdr for a 
>>>>> device and for all devices in hive for XGMI.
>>>>> What do you think ?
>>>>
>>>> Why should that be necessary? We only wait for the delayed work to 
>>>> make sure that the job is not destroyed while dealing with it.
>>>>
>>>> Christian.
>>>
>>> But we might not be waiting for the correct sched->work_tdr, we do 
>>> the reset routine for all schedulers in a device accessing their 
>>> jobs too and not only for the scheduler to which the job belongs. 
>>> For XGMI not only that, we reset all the devices in the hive.
>>
>> That is harmless you only need to wait for the work_tdr of the 
>> current scheduler, not for all of them.
>>
>>>
>>> I was thinking, amdgpu driver is not even interested in allowing 
>>> multiple sced->tdr to execute together - we have to serialize all of 
>>> them anyway with the trylock mutex (even without XGMI), v3d in 
>>> v3d_job_timedout seems also to reset all of his schedulers from the 
>>> tdr work. Would it make sense to provide the sched->work_td as init 
>>> parameter to scheduler (same one for all schedulers) so we can 
>>> enforce serialization by disallowing more then 1 tdr work to execute 
>>> in the same time ? Other drivers interested to do in parallel can 
>>> provide unique sched->work_tdr per scheduler. This does  imply 
>>> drm_sched_job_timedout has to removed and delegated to specific 
>>> driver implementation as probably other code dealing with 
>>> sched->work_tdr... Maybe even move tdr handling to the driver all 
>>> together ?
>>
>> Yeah, I was thinking something similar. The problem with this 
>> approach is that a delayed work item can have only one delay, but for 
>> multiple engines we need multiple delays.
>>
>> What we could do is to make it a timer instead and raise the work 
>> item from the device specific callback.
>>
>> But that doesn't really saves us the stop all schedulers trouble, so 
>> it doesn't buy us much in the end if I see this correctly.
>>
>> Christian.