[PATCH] drm/sched: fix the bug of time out calculation(v3)

Andrey Grodzovsky andrey.grodzovsky at amd.com
Fri Aug 27 18:39:18 UTC 2021


Sure then.

Andrey

On 2021-08-27 2:30 p.m., Christian König wrote:
> Yeah, that's what I meant: the start of processing a job is a bit 
> loosely defined.
>
> Jobs overlap, but we simply don't have another good indicator that a 
> job has started except that the previous one completed.
>
> It's still better than starting the timer when pushing the job to the 
> ring buffer, because that is completely off.
>
> Christian.
>
> On 27.08.21 at 20:22, Andrey Grodzovsky wrote:
>> As I mentioned to Monk before - what about cases such as in this test -
>> https://gitlab.freedesktop.org/mesa/drm/-/commit/bc21168fa924d3fc4a000492e861f50a1a135b25
>>
>> Here you don't have a serialized sequence where one job finishes 
>> processing and then the second starts; they execute together 
>> concurrently. For those cases it seems to me that restarting the timer 
>> for the second job from scratch will let it hang much longer than 
>> allowed by the TO value.
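>>
>> (E.g. with a 10s TO: if the second job has already been running for 8s 
>> when the first one completes, rearming with a full TO lets it run for 
>> up to 18s in total before TDR kicks in, instead of the intended 10s.)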
>>
>> Andrey
>>
>> On 2021-08-27 10:29 a.m., Christian König wrote:
>>> I don't think that makes sense.
>>>
>>> See, we don't want to start the timer when the job is inserted into 
>>> the ring buffer, but rather when it starts processing.
>>>
>>> Starting processing is a bit loosely defined, but just starting the 
>>> timer when the previous job completes should be fine enough.
>>>
>>> Christian.
>>>
>>> On 27.08.21 at 15:57, Andrey Grodzovsky wrote:
>>>> The TS represents the point in time when the job was inserted into 
>>>> the pending list.
>>>> I don't think it matters when it actually starts to be processed; 
>>>> what matters is when this job was inserted into the pending list, 
>>>> because right at that point you arm the TO timer (when no other is 
>>>> running already). So when the previous job completes and you cancel 
>>>> and rearm again, you can use that TS from the next job in the pending 
>>>> list to calculate how much time it actually has left to run before 
>>>> TDR must be initiated, instead of just giving it the full TO value 
>>>> again even if it has already been running for a while.
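>>>>
>>>> Roughly something like this - just an illustrative sketch, the helper 
>>>> name and the jiffies conversion are made up here, this is not the 
>>>> attached patch:
>>>>
>>>> /* How much of the TO budget does the next pending job still have,
>>>>  * based on the TS recorded when it was put on the pending list?
>>>>  */
>>>> static unsigned long sched_job_remaining_to(struct drm_gpu_scheduler *sched,
>>>>                                             struct drm_sched_job *next)
>>>> {
>>>>     ktime_t armed = next->s_fence->scheduled.timestamp;
>>>>     unsigned long used =
>>>>         nsecs_to_jiffies(ktime_to_ns(ktime_sub(ktime_get(), armed)));
>>>>
>>>>     /* Already over budget - TDR should fire right away. */
>>>>     return used >= (unsigned long)sched->timeout ?
>>>>             0 : sched->timeout - used;
>>>> }
>>>>
>>>> and then rearm work_tdr with that value instead of a full 
>>>> sched->timeout.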
>>>>
>>>> Also, I am not sure about the assumption that what we measure is 
>>>> processing by the HW; what we measure is from the moment the job was 
>>>> scheduled to the ring to the moment it completed (EOP event). At 
>>>> least that is what our TDR timer measures, and so it makes sense to 
>>>> set the TS at this point.
>>>>
>>>> Andrey
>>>>
>>>> On 2021-08-27 3:20 a.m., Liu, Monk wrote:
>>>>> [AMD Official Use Only]
>>>>>
>>>>> What is that 'ts' representing? It looks to me like the jiffies 
>>>>> value at which the job gets scheduled to the ring, but a job being 
>>>>> scheduled to the ring doesn't mean it is being processed by the HW.
>>>>>
>>>>> Thanks
>>>>>
>>>>> ------------------------------------------
>>>>> Monk Liu | Cloud-GPU Core team
>>>>> ------------------------------------------
>>>>>
>>>>> -----Original Message-----
>>>>> From: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>
>>>>> Sent: Friday, August 27, 2021 4:14 AM
>>>>> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org; 
>>>>> Koenig, Christian <Christian.Koenig at amd.com>
>>>>> Cc: dri-devel at lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/sched: fix the bug of time out 
>>>>> calculation(v3)
>>>>>
>>>>> Attached a quick patch for per-job TTL calculation to make the next 
>>>>> timer expiration more precise. It's on top of the patch in this 
>>>>> thread. Let me know if this makes sense.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 2021-08-26 10:03 a.m., Andrey Grodzovsky wrote:
>>>>>> On 2021-08-26 12:55 a.m., Monk Liu wrote:
>>>>>>> issue:
>>>>>>> in cleanup_job the cancel_delayed_work will cancel a TO timer even
>>>>>>> if its corresponding job is still running.
>>>>>>>
>>>>>>> fix:
>>>>>>> do not cancel the timer in cleanup_job; instead do the cancelling
>>>>>>> only when the head job is signaled, and if there is a "next" job
>>>>>>> we call start_timeout again.
>>>>>>>
>>>>>>> v2:
>>>>>>> further clean up the logic, and do the TDR timer cancelling if the
>>>>>>> signaled job is the last one in its scheduler.
>>>>>>>
>>>>>>> v3:
>>>>>>> change the issue description
>>>>>>> remove the cancel_delayed_work at the beginning of cleanup_job
>>>>>>> restore the implementation of drm_sched_job_begin.
>>>>>>>
>>>>>>> TODO:
>>>>>>> 1) introduce pause/resume scheduler in job_timeout to serialize the
>>>>>>> handling of the scheduler and job_timeout.
>>>>>>> 2) drop the bad job's removal and re-insertion in the scheduler due
>>>>>>> to the above serialization (no race issue anymore with the
>>>>>>> serialization)
>>>>>>>
>>>>>>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>>>>>>> ---
>>>>>>>    drivers/gpu/drm/scheduler/sched_main.c | 25 ++++++++++---------------
>>>>>>>    1 file changed, 10 insertions(+), 15 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index a2a9536..ecf8140 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -676,13 +676,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>>>>>    {
>>>>>>>        struct drm_sched_job *job, *next;
>>>>>>> -    /*
>>>>>>> -     * Don't destroy jobs while the timeout worker is running OR thread
>>>>>>> -     * is being parked and hence assumed to not touch pending_list
>>>>>>> -     */
>>>>>>> -    if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>>>> -        !cancel_delayed_work(&sched->work_tdr)) ||
>>>>>>> -        kthread_should_park())
>>>>>>> +    if (kthread_should_park())
>>>>>>>            return NULL;
>>>>>>
>>>>>> I actually don't see why we need to keep the above; on the other 
>>>>>> side (in drm_sched_stop) we won't touch the pending list anyway 
>>>>>> until the sched thread has come to a full stop (kthread_park). If 
>>>>>> you do see a reason why this is needed, then a comment should be 
>>>>>> here, I think.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>        spin_lock(&sched->job_list_lock);
>>>>>>> @@ -693,17 +687,21 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>>>>>        if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>>>>>            /* remove job from pending_list */
>>>>>>>            list_del_init(&job->list);
>>>>>>> +
>>>>>>> +        /* cancel this job's TO timer */
>>>>>>> +        cancel_delayed_work(&sched->work_tdr);
>>>>>>>            /* make the scheduled timestamp more accurate */
>>>>>>>            next = list_first_entry_or_null(&sched->pending_list,
>>>>>>>                            typeof(*next), list);
>>>>>>> -        if (next)
>>>>>>> +
>>>>>>> +        if (next) {
>>>>>>>                next->s_fence->scheduled.timestamp =
>>>>>>>                    job->s_fence->finished.timestamp;
>>>>>>> -
>>>>>>> +            /* start TO timer for next job */
>>>>>>> +            drm_sched_start_timeout(sched);
>>>>>>> +        }
>>>>>>>        } else {
>>>>>>>            job = NULL;
>>>>>>> -        /* queue timeout for next job */
>>>>>>> -        drm_sched_start_timeout(sched);
>>>>>>>        }
>>>>>>>          spin_unlock(&sched->job_list_lock);
>>>>>>> @@ -791,11 +789,8 @@ static int drm_sched_main(void *param)
>>>>>>>                          (entity = drm_sched_select_entity(sched))) ||
>>>>>>>                         kthread_should_stop());
>>>>>>> -        if (cleanup_job) {
>>>>>>> +        if (cleanup_job)
>>>>>>>             sched->ops->free_job(cleanup_job);
>>>>>>> -            /* queue timeout for next job */
>>>>>>> -            drm_sched_start_timeout(sched);
>>>>>>> -        }
>>>>>>>              if (!entity)
>>>>>>>                continue;
>>>
>

