[PATCH] drm/sched: fix the bug of time out calculation(v2)
Liu, Monk
Monk.Liu at amd.com
Thu Aug 26 01:53:44 UTC 2021
>> All we need to take care of is to cancel_delayed_work() when we know that the job is completed.
That's why I want to remove the cancel_delayed_work() at the beginning of cleanup_job(): at that moment we don't yet know whether
a job has completed (the scheduler could have been woken up by a new submission instead of by a signaled job); we only know once we fetch the head job and confirm that it has signaled. Quoting the current code, with my notes inline:
static struct drm_sched_job *
drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job, *next;

	/*
	 * Don't destroy jobs while the timeout worker is running OR thread
	 * is being parked and hence assumed to not touch pending_list
	 */
	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
	    !cancel_delayed_work(&sched->work_tdr)) || // normally, if the job has not timed out, cancelling here is incorrect while the job is still running
	    kthread_should_park())
		return NULL;

	spin_lock(&sched->job_list_lock);

	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);

	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
		/* remove job from pending_list */
		list_del_init(&job->list);

		/* make the scheduled timestamp more accurate */
		next = list_first_entry_or_null(&sched->pending_list,
						typeof(*next), list);
		if (next)
			next->s_fence->scheduled.timestamp =
				job->s_fence->finished.timestamp;
	} else {
		job = NULL;
		/* queue timeout for next job */
		drm_sched_start_timeout(sched); // if the head job has not signaled, the timer is re-armed here (the countdown restarts), which is wrong
	}

	spin_unlock(&sched->job_list_lock);

	return job;
}
>> This here works as intended as far as I can see and if you start to use mod_delayed_work() you actually break it.
Only at the point where we find the head job signaled and a next job present should we cancel the work_tdr for this scheduler, of course with
a new work_tdr queued, since the "next" job has already started on the HW. That's why I use mod_delayed_work(). But I can change it to the "cancel and queue" approach if you have concerns.
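Roughly like this (just an untested sketch of the cancel-and-queue variant, reusing system_wq as in my patch):

	if (next) {
		/* the "next" job is already running on the HW, so restart
		 * the TDR countdown from now on its behalf
		 */
		cancel_delayed_work(&sched->work_tdr);
		queue_delayed_work(system_wq, &sched->work_tdr, sched->timeout);
		next->s_fence->scheduled.timestamp =
			job->s_fence->finished.timestamp;
	} else {
		/* pending list is empty, nothing left to time out */
		cancel_delayed_work(&sched->work_tdr);
	}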
Thanks
------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: Wednesday, August 25, 2021 8:11 PM
To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/sched: fix the bug of time out calculation(v2)
No, this would break that logic here.
See, drm_sched_start_timeout() can be called multiple times; this is intentional and very important!
The logic in queue_delayed_work() makes sure that the timer is only started once and then never again.
All we need to take care of is to cancel_delayed_work() when we know that the job is completed.
This here works as intended as far as I can see and if you start to use
mod_delayed_work() you actually break it.
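Roughly, assuming the standard workqueue semantics, the difference is:

	queue_delayed_work(system_wq, &sched->work_tdr, sched->timeout);
	queue_delayed_work(system_wq, &sched->work_tdr, sched->timeout); /* no-op: work already pending, timer untouched */

	mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout);   /* always re-arms: the expiry gets pushed out again */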
Regards,
Christian.
On 25.08.21 at 14:01, Liu, Monk wrote:
>
> I think we should remove the cancel_delayed_work() at the beginning of cleanup_job().
>
> Because with my patch the mod_delayed_work() in cleanup_job() is already
> doing its duty to retrigger the TO timer accordingly.
>
> Thanks
>
> ------------------------------------------
> Monk Liu | Cloud-GPU Core team
> ------------------------------------------
>
> -----Original Message-----
> From: Liu, Monk
> Sent: Wednesday, August 25, 2021 7:55 PM
> To: 'Christian König' <ckoenig.leichtzumerken at gmail.com>;
> amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH] drm/sched: fix the bug of time out
> calculation(v2)
>
>
>>> The timeout started by queue_delayed_work() in drm_sched_start_timeout() is paired with the cancel_delayed_work() in drm_sched_get_cleanup_job().
> No, that's wrong. Consider what happens when we are in cleanup_job() and no timeout has occurred on this sched (we just keep submitting new jobs to it): the work_tdr is cancelled, then we get the head job, and, assuming that job has not signaled, we run into the "queue timeout for next job" path where drm_sched_start_timeout() is called. So the head job's TO timer is actually retriggered ... which is totally wrong.
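>
> To make the scenario concrete (hypothetical numbers, assuming a 10s timeout):
>
> 	t=0   job A is submitted, work_tdr is armed to fire at t=10
> 	t=5   job B is submitted, the sched thread wakes up, cleanup_job()
> 	      cancels work_tdr; job A has not signaled, so
> 	      drm_sched_start_timeout() re-arms it to fire at t=15
>
> Every new submission pushes job A's deadline out again instead of leaving it at t=10.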
>
> With my patch the timer is only retriggered after the previous job has really signaled.
>
> Can you be more specific about the incorrect part?
>
> Thanks
> ------------------------------------------
> Monk Liu | Cloud-GPU Core team
> ------------------------------------------
>
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> Sent: Wednesday, August 25, 2021 2:32 PM
> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/sched: fix the bug of time out
> calculation(v2)
>
> Well, NAK to that approach. First of all, your bug analysis is incorrect.
>
> The timeout started by queue_delayed_work() in drm_sched_start_timeout() is paired with the cancel_delayed_work() in drm_sched_get_cleanup_job().
>
> So you must have something else going on here.
>
> Second, please don't use mod_delayed_work(); instead always cancel it and restart it.
>
> Regards,
> Christian.
>
> On 25.08.21 at 06:14, Monk Liu wrote:
>> the original logic is wrong in that the timeout will not be retriggered
>> after the previous job signaled, and that leads to the scenario where
>> all jobs in the same scheduler share the same timeout timer, armed by
>> the very first job in this scheduler, which is wrong.
>>
>> we should modify the timer every time a previous job signals.
>>
>> v2:
>> further clean up the logic, and cancel the TDR timer if the
>> signaled job is the last one in its scheduler.
>>
>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>> ---
>> drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++++---------
>> 1 file changed, 20 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index a2a9536..8c102ac 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -305,8 +305,17 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>  	struct drm_gpu_scheduler *sched = s_job->sched;
>>  
>>  	spin_lock(&sched->job_list_lock);
>> -	list_add_tail(&s_job->list, &sched->pending_list);
>> -	drm_sched_start_timeout(sched);
>> +	if (list_empty(&sched->pending_list)) {
>> +		list_add_tail(&s_job->list, &sched->pending_list);
>> +		drm_sched_start_timeout(sched);
>> +	} else {
>> +		/* the old jobs in the pending list are not finished yet;
>> +		 * no need to restart the TDR timer here, it is already
>> +		 * handled by drm_sched_get_cleanup_job()
>> +		 */
>> +		list_add_tail(&s_job->list, &sched->pending_list);
>> +	}
>> +
>>  	spin_unlock(&sched->job_list_lock);
>>  }
>>  
>> @@ -693,17 +702,22 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>  	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>  		/* remove job from pending_list */
>>  		list_del_init(&job->list);
>> +
>>  		/* make the scheduled timestamp more accurate */
>>  		next = list_first_entry_or_null(&sched->pending_list,
>>  						typeof(*next), list);
>> -		if (next)
>> +		if (next) {
>> +			/* if we still have a job in the pending list, modify the TDR timer */
>> +			mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout);
>>  			next->s_fence->scheduled.timestamp =
>>  				job->s_fence->finished.timestamp;
>> +		} else {
>> +			/* cancel the TDR timer if no job is left in the pending list */
>> +			cancel_delayed_work(&sched->work_tdr);
>> +		}
>>  
>>  	} else {
>>  		job = NULL;
>> -		/* queue timeout for next job */
>> -		drm_sched_start_timeout(sched);
>>  	}
>>  
>>  	spin_unlock(&sched->job_list_lock);
>> @@ -791,11 +805,8 @@ static int drm_sched_main(void *param)
>>  					 (entity = drm_sched_select_entity(sched))) ||
>>  					kthread_should_stop());
>>  
>> -		if (cleanup_job) {
>> +		if (cleanup_job)
>>  			sched->ops->free_job(cleanup_job);
>> -			/* queue timeout for next job */
>> -			drm_sched_start_timeout(sched);
>> -		}
>>  
>>  		if (!entity)
>>  			continue;