[PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

Thu Jan 24 11:34:05 UTC 2019

I see a few cleanups on Patch #3 which actually belong in patch #1:

> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct 
> drm_sched_job *bad)
The "bad" job parameter actually isn't used any more, isn't it?

> +retry_wait:
Not used any more.

But apart from that at least patch #1 and #2 look like they can have my 
rb now.

Patch #3 looks also like it should work after a bit of polishing.

Thanks,
Christian.

Am 18.01.19 um 20:15 schrieb Grodzovsky, Andrey:
> Attached series is the first 2 patches we already discussed about ring
> mirror list handling racing with all your comments fixed (still not
> committed). The third patch is a prototype based on the first 2 patches
> and on our discussion.
>
> Please take a look.
>
> Andrey
>
>
> On 01/18/2019 01:32 PM, Koenig, Christian wrote:
>> Am 18.01.19 um 18:34 schrieb Grodzovsky, Andrey:
>>> On 01/18/2019 12:10 PM, Koenig, Christian wrote:
>>>> Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey:
>>>>> On 01/18/2019 04:25 AM, Koenig, Christian wrote:
>>>>>> [SNIP]
>>>>>>>>>> Re-arming the timeout should probably have a much reduced value
>>>>>>>>>> when the job hasn't changed. E.g. something like a few ms.
>>>>>>> Now i got thinking about non hanged job in progress (job A) and let's
>>>>>>> say it's a long job , it just started executing but due to time out of
>>>>>>> another job (job B) on another (or this scheduler) it's parent cb got
>>>>>>> disconnected, we disarmed the tdr timer for the job's scheduler,
>>>>>>> meanwhile the timed out job did manage to complete before HW reset
>>>>>>> check and hence we skip HW reset, attach back the cb and rearm job's A
>>>>>>> tdr  timer with a future value of few ms only - aren't we going to get
>>>>>>> false tdr triggered on job B now because we didn't let it enough time
>>>>>>> to run and complete ? I would prefer the other extreme of longer time
>>>>>>> for time out to trigger then false TDR. Optimally we would have per
>>>>>>> job timer and rearm to exactly the reminder of it's time out value -
>>>>>>> but we gave up on per job tdr work long ago.
>>>>>> Well we only re-arm the timeout with a shorter period if it already
>>>>>> triggered once. If we just suspend the timeout then we should still use
>>>>>> the longer period.
>>>>> Can you explain more on this ? I don't get it.
>>>> See drm_sched_job_timedout(), we re-arm the timeout at the end of the
>>>> procedure.
>>>>
>>>> We should change that and re-arm the timer with a much lower timeout if
>>>> the job is still not finished.
>>>>
>>>> Christian.
>>> I still don't see how this can fix the problem of of long job in
>>> progress triggering false tdr if no HW reset was done, but maybe I am
>>> missing other pieces you have in mind, I will finish the patch and send
>>> it and then we can be more specific based on the code.
>> Ok sounds good. We should probably discuss less on details and prototype
>> a bit more.
>>
>> Might be that I'm missing something here as well, so probably good to
>> have some code to talk about things more directly.
>>
>> Christian.
>>
>>> Andrey
>>>
>>>>> Andrey
>>>>>
>>>>>>> In general the more i think about it  (correct me if I am wrong) I am
>>>>>>> less sure how much the optimization feature is useful - if job's time
>>>>>>> out did trigger what are the chances that the little more time we give
>>>>>>> it between beginning of tdr function and the time we do start the
>>>>>>> actual HW reset will be exactly what it needed to complete. Also, this
>>>>>>> is still not water proof as the job might complete and signal it's HW
>>>>>>> fence exactly after we checked for completion but before starting the
>>>>>>> HW reset code.
>>>>>> I don't see this as an optimization, but rather as mandatory for correct
>>>>>> operation.
>>>>>>
>>>>>> See without this we can run into issues because we execute jobs multiple
>>>>>> times. That can still happen with this clean handling, but it is much
>>>>>> more unlikely.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>>> By unchanged you mean when we didn't resubmit the job because of the
>>>>>>>>> optimized non HW reset, right ?
>>>>>>>> Correct, yes.
>>>>>>>>
>>>>>>>>>>> About flushing tdr jobs in progress from .free_job cb - looks like
>>>>>>>>>>> drm_sched_job_finish->cancel_delayed_work_sync is not enough, we
>>>>>>>>>>> still need to take care of flushing all sced->work_tdr for a
>>>>>>>>>>> device and for all devices in hive for XGMI.
>>>>>>>>>>> What do you think ?
>>>>>>>>>> Why should that be necessary? We only wait for the delayed work to
>>>>>>>>>> make sure that the job is not destroyed while dealing with it.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>> But we might not be waiting for the correct sched->work_tdr, we do
>>>>>>>>> the reset routine for all schedulers in a device accessing their
>>>>>>>>> jobs too and not only for the scheduler to which the job belongs.
>>>>>>>>> For XGMI not only that, we reset all the devices in the hive.
>>>>>>>> That is harmless you only need to wait for the work_tdr of the
>>>>>>>> current scheduler, not for all of them.
>>>>>>>>
>>>>>>>>> I was thinking, amdgpu driver is not even interested in allowing
>>>>>>>>> multiple sced->tdr to execute together - we have to serialize all of
>>>>>>>>> them anyway with the trylock mutex (even without XGMI), v3d in
>>>>>>>>> v3d_job_timedout seems also to reset all of his schedulers from the
>>>>>>>>> tdr work. Would it make sense to provide the sched->work_td as init
>>>>>>>>> parameter to scheduler (same one for all schedulers) so we can
>>>>>>>>> enforce serialization by disallowing more then 1 tdr work to execute
>>>>>>>>> in the same time ? Other drivers interested to do in parallel can
>>>>>>>>> provide unique sched->work_tdr per scheduler. This does  imply
>>>>>>>>> drm_sched_job_timedout has to removed and delegated to specific
>>>>>>>>> driver implementation as probably other code dealing with
>>>>>>>>> sched->work_tdr... Maybe even move tdr handling to the driver all
>>>>>>>>> together ?
>>>>>>>> Yeah, I was thinking something similar. The problem with this
>>>>>>>> approach is that a delayed work item can have only one delay, but for
>>>>>>>> multiple engines we need multiple delays.
>>>>>>>>
>>>>>>>> What we could do is to make it a timer instead and raise the work
>>>>>>>> item from the device specific callback.
>>>>>>>>
>>>>>>>> But that doesn't really saves us the stop all schedulers trouble, so
>>>>>>>> it doesn't buy us much in the end if I see this correctly.
>>>>>>>>
>>>>>>>> Christian.
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx