[PATCH 2/2] drm/sched: serialize job_timeout and scheduler

Wed Sep 1 00:52:51 UTC 2021

[AMD Official Use Only]

>> This is a __ function, i.e. considered internal, and it's lockless atomic, i.e. unordered. And you're not explaining why this works.

As far as I can see it is not customary to put this kind of explanation in the code, but we can explain it by mail.
We want to park the scheduler in job_timedout to serialize access to the job between the sched thread and the timeout (TO) handler. However, inside the vendor's timedout_job callback, at least panfrost and amd
both call drm_sched_stop() on all schedulers.

If we unconditionally call kthread_park() in job_timedout, then the timeout handler of the bailing job will call kthread_park() again on its already-parked scheduler and trigger a warning.

The scenario is:
1. Job1 on sched1 times out, and sched1 is parked.
2. The vendor callback runs; it usually stops all schedulers.
3. Job2 on sched2 times out, so job_timedout also tries to park sched2, but sched2 was already stopped in step 2. (Job2's timeout is caused by Job1, or by another VF.)
   This produces a warning in the kernel log. With the "__kthread_should_park()" check here we avoid the warning; that is the only reason I need this __ function.
4. Before the vendor callback exits, it unparks all schedulers.
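
The steps above can be sketched as a small single-threaded userspace model. This is not kernel code: `toy_sched`, `toy_park()`, `toy_unpark()`, `toy_job_timedout_enter()` and `toy_scenario()` are hypothetical names invented for illustration, and the model assumes only what the scenario states, namely that parking an already-parked scheduler produces a warning. It models sched2 only and says nothing about memory ordering between threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one scheduler thread; not the kernel's struct. */
struct toy_sched {
	bool should_park;	/* models the "park requested" state */
	int warnings;		/* counts double-park warnings */
};

/* Models parking: asking to park an already-parked thread warns. */
static void toy_park(struct toy_sched *s)
{
	if (s->should_park) {
		s->warnings++;
		return;
	}
	s->should_park = true;
}

/* Models unparking; unparking a thread that is not parked is a toy bug. */
static void toy_unpark(struct toy_sched *s)
{
	assert(s->should_park);
	s->should_park = false;
}

/* Models the start of job_timedout, with or without the guard. */
static void toy_job_timedout_enter(struct toy_sched *s, bool guarded)
{
	if (!guarded || !s->should_park)
		toy_park(s);
}

/* Replays steps 2-4 for sched2 and returns the warnings it saw. */
static int toy_scenario(bool guarded)
{
	struct toy_sched sched2 = { .should_park = false, .warnings = 0 };

	toy_park(&sched2);				/* step 2: Job1's callback stops sched2 */
	toy_job_timedout_enter(&sched2, guarded);	/* step 3: Job2 times out */
	toy_unpark(&sched2);				/* step 4: callback unparks */
	return sched2.warnings;
}
```

In this model `toy_scenario(false)` counts one warning and `toy_scenario(true)` counts none, which is the only effect the guard is meant to have.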

On the other hand, if we don't call kthread_park() but still drop the deletion/reinsertion of the job from pending_list (dropping it is what we want), there is still a small window in which we can hit the race:
cleanup_job in the sched thread may free the job while it is still being processed by job_timedout or the vendor's callback.
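
A deterministic userspace sketch of that window, replaying one bad interleaving in a single thread. Everything here is hypothetical (`mock_job`, `mock_sched`, `mock_cleanup_job()`, `mock_timedout_race()`); it additionally assumes the job completes just as its timeout fires, and it sets a `freed` flag instead of really freeing memory so the model stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_job {
	bool finished;	/* assumption: job completed right as the timeout fired */
	bool freed;	/* set instead of calling free(), to keep the toy safe */
};

struct mock_sched {
	struct mock_job *pending_head;	/* models the first pending_list entry */
	bool parked;
};

/* Models cleanup in the sched thread: reaps a finished head job unless parked. */
static void mock_cleanup_job(struct mock_sched *s)
{
	struct mock_job *job = s->pending_head;

	if (s->parked || !job || !job->finished)
		return;
	s->pending_head = NULL;
	job->freed = true;
}

/* Returns true if the job was "freed" while the timeout handler still held it. */
static bool mock_timedout_race(bool park_first)
{
	struct mock_job job = { .finished = true, .freed = false };
	struct mock_sched sched = { .pending_head = &job, .parked = false };
	struct mock_job *held;

	if (park_first)
		sched.parked = true;

	/* Handler peeks at the job but leaves it on the list. */
	held = sched.pending_head;
	assert(held != NULL);

	/* Lock is dropped here; the sched thread gets to run. */
	mock_cleanup_job(&sched);

	/* The vendor callback would now use 'held'. */
	return held->freed;
}
```

Without parking, `mock_timedout_race(false)` reports that the job was reaped while still held; with `park_first` set, cleanup is skipped and the job stays valid.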

The reason we want to avoid deleting/reinserting the timed-out job is that we (amd vendor) have our own way of running diagnostics on all jobs in the pending lists of all schedulers: we want to cherry-pick,
from all schedulers' pending lists, the real bad job that caused this job timeout.

Besides, it is also more reasonable to park the scheduler while job_timedout is running: the two should have exclusive access to common members such as sched_job (because the spin_lock is released before entering the vendor's callback).

I hope this explains our reasoning well enough.

Thanks 

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

-----Original Message-----
From: Daniel Vetter <daniel at ffwll.ch> 
Sent: Tuesday, August 31, 2021 8:59 PM
To: Liu, Monk <Monk.Liu at amd.com>
Cc: amd-gfx at lists.freedesktop.org; dri-devel at lists.freedesktop.org; Chen, Jingwen <Jingwen.Chen at amd.com>
Subject: Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

Can we please have some actual commit message here, with detailed explanation of the race/bug/whatever, how you fix it and why this is the best option?

On Tue, Aug 31, 2021 at 06:35:39PM +0800, Monk Liu wrote:
> tested-by: jingwen chen <jingwen.chen at amd.com>
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> Signed-off-by: jingwen chen <jingwen.chen at amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 24 ++++--------------------
>  1 file changed, 4 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index ecf8140..894fdb24 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -319,19 +319,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
>  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>  
>  	/* Protects against concurrent deletion in drm_sched_get_cleanup_job 
> */
> +	if (!__kthread_should_park(sched->thread))

This is a __ function, i.e. considered internal, and it's lockless atomic, i.e. unordered. And you're not explaining why this works.

Iow it's probably buggy, and just unconditionally parking the kthread is probably the right thing to do. If it's not the right thing to do, there's a bug here for sure.
-Daniel

> +		kthread_park(sched->thread);
> +
>  	spin_lock(&sched->job_list_lock);
>  	job = list_first_entry_or_null(&sched->pending_list,
>  				       struct drm_sched_job, list);
>  
>  	if (job) {
> -		/*
> -		 * Remove the bad job so it cannot be freed by concurrent
> -		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
> -		 * is parked at which point it's safe.
> -		 */
> -		list_del_init(&job->list);
>  		spin_unlock(&sched->job_list_lock);
>  
> +		/* vendor's timeout_job should call drm_sched_start() */
>  		status = job->sched->ops->timedout_job(job);
>  
>  		/*
> @@ -393,20 +391,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>  	kthread_park(sched->thread);
>  
>  	/*
> -	 * Reinsert back the bad job here - now it's safe as
> -	 * drm_sched_get_cleanup_job cannot race against us and release the
> -	 * bad job at this point - we parked (waited for) any in progress
> -	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> -	 * now until the scheduler thread is unparked.
> -	 */
> -	if (bad && bad->sched == sched)
> -		/*
> -		 * Add at the head of the queue to reflect it was the earliest
> -		 * job extracted.
> -		 */
> -		list_add(&bad->list, &sched->pending_list);
> -
> -	/*
>  	 * Iterate the job list from later to  earlier one and either deactive
>  	 * their HW callbacks or remove them from pending list if they already
>  	 * signaled.
> --
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch/