[PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

Fri Jan 18 17:10:49 UTC 2019

Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey:
>
> On 01/18/2019 04:25 AM, Koenig, Christian wrote:
>> [SNIP]
>>>>>> Re-arming the timeout should probably have a much reduced value
>>>>>> when the job hasn't changed. E.g. something like a few ms.
>>> Now i got thinking about non hanged job in progress (job A) and let's
>>> say it's a long job , it just started executing but due to time out of
>>> another job (job B) on another (or this scheduler) it's parent cb got
>>> disconnected, we disarmed the tdr timer for the job's scheduler,
>>> meanwhile the timed out job did manage to complete before HW reset
>>> check and hence we skip HW reset, attach back the cb and rearm job's A
>>> tdr  timer with a future value of few ms only - aren't we going to get
>>> false tdr triggered on job B now because we didn't let it enough time
>>> to run and complete ? I would prefer the other extreme of longer time
>>> for time out to trigger then false TDR. Optimally we would have per
>>> job timer and rearm to exactly the reminder of it's time out value -
>>> but we gave up on per job tdr work long ago.
>> Well we only re-arm the timeout with a shorter period if it already
>> triggered once. If we just suspend the timeout then we should still use
>> the longer period.
> Can you explain more on this ? I don't get it.

See drm_sched_job_timedout(), we re-arm the timeout at the end of the 
procedure.

We should change that and re-arm the timer with a much lower timeout if 
the job is still not finished.

Christian.

>
> Andrey
>
>>> In general the more i think about it  (correct me if I am wrong) I am
>>> less sure how much the optimization feature is useful - if job's time
>>> out did trigger what are the chances that the little more time we give
>>> it between beginning of tdr function and the time we do start the
>>> actual HW reset will be exactly what it needed to complete. Also, this
>>> is still not water proof as the job might complete and signal it's HW
>>> fence exactly after we checked for completion but before starting the
>>> HW reset code.
>> I don't see this as an optimization, but rather as mandatory for correct
>> operation.
>>
>> See without this we can run into issues because we execute jobs multiple
>> times. That can still happen with this clean handling, but it is much
>> more unlikely.
>>
>> Christian.
>>
>>> Andrey
>>>
>>>>> By unchanged you mean when we didn't resubmit the job because of the
>>>>> optimized non HW reset, right ?
>>>> Correct, yes.
>>>>
>>>>>>> About flushing tdr jobs in progress from .free_job cb - looks like
>>>>>>> drm_sched_job_finish->cancel_delayed_work_sync is not enough, we
>>>>>>> still need to take care of flushing all sced->work_tdr for a
>>>>>>> device and for all devices in hive for XGMI.
>>>>>>> What do you think ?
>>>>>> Why should that be necessary? We only wait for the delayed work to
>>>>>> make sure that the job is not destroyed while dealing with it.
>>>>>>
>>>>>> Christian.
>>>>> But we might not be waiting for the correct sched->work_tdr, we do
>>>>> the reset routine for all schedulers in a device accessing their
>>>>> jobs too and not only for the scheduler to which the job belongs.
>>>>> For XGMI not only that, we reset all the devices in the hive.
>>>> That is harmless you only need to wait for the work_tdr of the
>>>> current scheduler, not for all of them.
>>>>
>>>>> I was thinking, amdgpu driver is not even interested in allowing
>>>>> multiple sced->tdr to execute together - we have to serialize all of
>>>>> them anyway with the trylock mutex (even without XGMI), v3d in
>>>>> v3d_job_timedout seems also to reset all of his schedulers from the
>>>>> tdr work. Would it make sense to provide the sched->work_td as init
>>>>> parameter to scheduler (same one for all schedulers) so we can
>>>>> enforce serialization by disallowing more then 1 tdr work to execute
>>>>> in the same time ? Other drivers interested to do in parallel can
>>>>> provide unique sched->work_tdr per scheduler. This does  imply
>>>>> drm_sched_job_timedout has to removed and delegated to specific
>>>>> driver implementation as probably other code dealing with
>>>>> sched->work_tdr... Maybe even move tdr handling to the driver all
>>>>> together ?
>>>> Yeah, I was thinking something similar. The problem with this
>>>> approach is that a delayed work item can have only one delay, but for
>>>> multiple engines we need multiple delays.
>>>>
>>>> What we could do is to make it a timer instead and raise the work
>>>> item from the device specific callback.
>>>>
>>>> But that doesn't really saves us the stop all schedulers trouble, so
>>>> it doesn't buy us much in the end if I see this correctly.
>>>>
>>>> Christian.
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx