[PATCH] drm/sched: fix the bug of time out calculation(v3)

Fri Aug 27 18:22:39 UTC 2021

As I mentioned to Monk before - what about cases such as in this test - 
https://gitlab.freedesktop.org/mesa/drm/-/commit/bc21168fa924d3fc4a000492e861f50a1a135b25
Here you don't have serialized sequence where when jobs finishes 
processing and second starts, they execute together  concurrently - for 
those cases seems
to me restarting the timer for the second job from scratch will let it 
hang much longer then allowed by TO value.

Andrey

On 2021-08-27 10:29 a.m., Christian König wrote:
> I don't think that makes sense.
>
> See we don't want to start the time when the job is inserted into the 
> ring buffer, but rather when it starts processing.
>
> Starting processing is a bit swampy defined, but just starting the 
> timer when the previous job completes should be fine enough.
>
> Christian.
>
> Am 27.08.21 um 15:57 schrieb Andrey Grodzovsky:
>> The TS represents the point in time when the job was inserted into 
>> the pending list.
>> I don't think it matters when it actually starts to be processed, 
>> what matters is when this job was inserted into pending list because 
>> right at that point you arm the TO timer (when no other is running 
>> already)
>> and so when the previous job completes and you cancel and rearm again 
>> you can use that TS from the next job in pending list to calculate 
>> how much time has actually left for it to run before TDR must be 
>> initiated
>> and not just give it again full TO value to run even if it has 
>> already been running for a while.
>>
>> Also, i am not sure also about the assumption that what we measure is 
>> processing by HW, what we measure is from the moment it was scheduled 
>> to ring to the moment the job completed (EOP event). At least that 
>> what our TDR timer measures and so it makes sense to set the TS at 
>> this point.
>>
>> Andrey
>>
>> On 2021-08-27 3:20 a.m., Liu, Monk wrote:
>>> [AMD Official Use Only]
>>>
>>> what is that 'ts' representing for ? it looks to me the jiffies that 
>>> it get scheduled to the ring,  but a job scheduled to the ring 
>>> doesn't represent it's being processed by hw.
>>>
>>> Thanks
>>>
>>> ------------------------------------------
>>> Monk Liu | Cloud-GPU Core team
>>> ------------------------------------------
>>>
>>> -----Original Message-----
>>> From: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>
>>> Sent: Friday, August 27, 2021 4:14 AM
>>> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org; 
>>> Koenig, Christian <Christian.Koenig at amd.com>
>>> Cc: dri-devel at lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/sched: fix the bug of time out calculation(v3)
>>>
>>> Attached quick patch for per job TTL calculation to make more 
>>> precises next timer expiration. It's on top of the patch in this 
>>> thread. Let me know if this makes sense.
>>>
>>> Andrey
>>>
>>> On 2021-08-26 10:03 a.m., Andrey Grodzovsky wrote:
>>>> On 2021-08-26 12:55 a.m., Monk Liu wrote:
>>>>> issue:
>>>>> in cleanup_job the cancle_delayed_work will cancel a TO timer even
>>>>> the its corresponding job is still running.
>>>>>
>>>>> fix:
>>>>> do not cancel the timer in cleanup_job, instead do the cancelling
>>>>> only when the heading job is signaled, and if there is a "next" job
>>>>> we start_timeout again.
>>>>>
>>>>> v2:
>>>>> further cleanup the logic, and do the TDR timer cancelling if the
>>>>> signaled job is the last one in its scheduler.
>>>>>
>>>>> v3:
>>>>> change the issue description
>>>>> remove the cancel_delayed_work in the begining of the cleanup_job
>>>>> recover the implement of drm_sched_job_begin.
>>>>>
>>>>> TODO:
>>>>> 1)introduce pause/resume scheduler in job_timeout to serial the
>>>>> handling of scheduler and job_timeout.
>>>>> 2)drop the bad job's del and insert in scheduler due to above
>>>>> serialization (no race issue anymore with the serialization)
>>>>>
>>>>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/scheduler/sched_main.c | 25
>>>>> ++++++++++---------------
>>>>>    1 file changed, 10 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index a2a9536..ecf8140 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -676,13 +676,7 @@ drm_sched_get_cleanup_job(struct
>>>>> drm_gpu_scheduler *sched)
>>>>>    {
>>>>>        struct drm_sched_job *job, *next;
>>>>>    -    /*
>>>>> -     * Don't destroy jobs while the timeout worker is running OR
>>>>> thread
>>>>> -     * is being parked and hence assumed to not touch pending_list
>>>>> -     */
>>>>> -    if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>> -        !cancel_delayed_work(&sched->work_tdr)) ||
>>>>> -        kthread_should_park())
>>>>> +    if (kthread_should_park())
>>>>>            return NULL;
>>>>
>>>> I actually don't see why we need to keep the above, on the other side
>>>> (in drm_sched_stop) we won't touch the pending list anyway until sched
>>>> thread came to full stop (kthread_park). If you do see a reason why
>>>> this needed then a comment should be here i think.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>> spin_lock(&sched->job_list_lock);
>>>>> @@ -693,17 +687,21 @@ drm_sched_get_cleanup_job(struct
>>>>> drm_gpu_scheduler *sched)
>>>>>        if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>>>            /* remove job from pending_list */
>>>>>            list_del_init(&job->list);
>>>>> +
>>>>> +        /* cancel this job's TO timer */
>>>>> +        cancel_delayed_work(&sched->work_tdr);
>>>>>            /* make the scheduled timestamp more accurate */
>>>>>            next = list_first_entry_or_null(&sched->pending_list,
>>>>>                            typeof(*next), list);
>>>>> -        if (next)
>>>>> +
>>>>> +        if (next) {
>>>>>                next->s_fence->scheduled.timestamp =
>>>>>                    job->s_fence->finished.timestamp;
>>>>> -
>>>>> +            /* start TO timer for next job */
>>>>> +            drm_sched_start_timeout(sched);
>>>>> +        }
>>>>>        } else {
>>>>>            job = NULL;
>>>>> -        /* queue timeout for next job */
>>>>> -        drm_sched_start_timeout(sched);
>>>>>        }
>>>>>          spin_unlock(&sched->job_list_lock);
>>>>> @@ -791,11 +789,8 @@ static int drm_sched_main(void *param)
>>>>>                          (entity = 
>>>>> drm_sched_select_entity(sched))) ||
>>>>>                         kthread_should_stop());
>>>>>    -        if (cleanup_job) {
>>>>> +        if (cleanup_job)
>>>>>                sched->ops->free_job(cleanup_job);
>>>>> -            /* queue timeout for next job */
>>>>> -            drm_sched_start_timeout(sched);
>>>>> -        }
>>>>>              if (!entity)
>>>>>                continue;
>