What about removing (kthread_should_park())? We decided it's useless as far as I remember.
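
For reference, removing it on top of this patch would be roughly the following fragment (untested, just to illustrate the suggestion):

-       if (kthread_should_park())
-               return NULL;
-
        spin_lock(&sched->job_list_lock);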

Andrey
________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Liu, Monk <Monk.Liu@amd.com>
Sent: 31 August 2021 20:24
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>
Subject: RE: [PATCH 1/2] drm/sched: fix the bug of timeout calculation (v3)
[AMD Official Use Only]

Ping Christian, Andrey

Can we merge this patch first? This is a standalone patch for the timer.

Thanks
------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

-----Original Message-----
From: Monk Liu <Monk.Liu@amd.com>
Sent: Tuesday, August 31, 2021 6:36 PM
To: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Liu, Monk <Monk.Liu@amd.com>
Subject: [PATCH 1/2] drm/sched: fix the bug of timeout calculation (v3)

issue:
in cleanup_job, cancel_delayed_work will cancel the TO timer even if its corresponding job is still running.

fix:
do not cancel the timer in cleanup_job; instead, cancel it only when the head job is signaled, and if there is a "next" job, start the timeout again (the resulting function is shown in full after the diff).

v2:
further clean up the logic, and cancel the TDR timer if the signaled job is the last one in its scheduler.

v3:
change the issue description
remove the cancel_delayed_work at the beginning of cleanup_job
restore the implementation of drm_sched_job_begin

TODO:
1) introduce pause/resume scheduler in job_timeout to serialize the handling of scheduler and job_timeout; a rough sketch follows below.
2) drop the bad job's deletion and re-insertion in the scheduler; with the above serialization there is no race anymore.
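
Not part of this patch: a rough sketch of what TODO 1) could look like, assuming the existing kthread_park()/kthread_unpark() primitives are used to keep the scheduler thread away from pending_list while the timeout handler runs (the exact hook points are still open):

static void drm_sched_job_timedout(struct work_struct *work)
{
        struct drm_gpu_scheduler *sched =
                container_of(work, struct drm_gpu_scheduler, work_tdr.work);

        /* pause: park the scheduler thread so it cannot consume
         * pending_list concurrently with the timeout handling */
        kthread_park(sched->thread);

        /* ... existing timeout handling; with this serialization the
         * bad job no longer needs to be deleted and re-inserted ... */

        /* resume: let the scheduler thread run again */
        kthread_unpark(sched->thread);
}
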
Tested-by: Jingwen Chen <jingwen.chen@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..ecf8140 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -676,13 +676,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 {
        struct drm_sched_job *job, *next;
 
-       /*
-        * Don't destroy jobs while the timeout worker is running  OR thread
-        * is being parked and hence assumed to not touch pending_list
-        */
-       if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-           !cancel_delayed_work(&sched->work_tdr)) ||
-           kthread_should_park())
+       if (kthread_should_park())
                return NULL;
 
        spin_lock(&sched->job_list_lock);
@@ -693,17 +687,21 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
        if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
                /* remove job from pending_list */
                list_del_init(&job->list);
+
+               /* cancel this job's TO timer */
+               cancel_delayed_work(&sched->work_tdr);
                /* make the scheduled timestamp more accurate */
                next = list_first_entry_or_null(&sched->pending_list,
                                                typeof(*next), list);
-               if (next)
+
+               if (next) {
                        next->s_fence->scheduled.timestamp =
                                job->s_fence->finished.timestamp;
-
+                       /* start TO timer for next job */
+                       drm_sched_start_timeout(sched);
+               }
        } else {
                job = NULL;
-               /* queue timeout for next job */
-               drm_sched_start_timeout(sched);
        }
 
        spin_unlock(&sched->job_list_lock);
@@ -791,11 +789,8 @@ static int drm_sched_main(void *param)
                                          (entity = drm_sched_select_entity(sched))) ||
                                         kthread_should_stop());
 
-               if (cleanup_job) {
+               if (cleanup_job)
                        sched->ops->free_job(cleanup_job);
-                       /* queue timeout for next job */
-                       drm_sched_start_timeout(sched);
-               }
 
                if (!entity)
                        continue;
--
2.7.4
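
For readability, this is how drm_sched_get_cleanup_job reads with the patch applied. The unchanged middle lines (taking the head of pending_list) are reconstructed from context; whitespace is approximate:

static struct drm_sched_job *
drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
        struct drm_sched_job *job, *next;

        /* thread is being parked and hence assumed to not touch pending_list */
        if (kthread_should_park())
                return NULL;

        spin_lock(&sched->job_list_lock);

        /* reconstructed context: pick the head of pending_list */
        job = list_first_entry_or_null(&sched->pending_list,
                                       struct drm_sched_job, list);

        if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
                /* remove job from pending_list */
                list_del_init(&job->list);

                /* cancel this job's TO timer */
                cancel_delayed_work(&sched->work_tdr);

                /* make the scheduled timestamp more accurate */
                next = list_first_entry_or_null(&sched->pending_list,
                                                typeof(*next), list);
                if (next) {
                        next->s_fence->scheduled.timestamp =
                                job->s_fence->finished.timestamp;
                        /* start TO timer for next job */
                        drm_sched_start_timeout(sched);
                }
        } else {
                job = NULL;
        }

        spin_unlock(&sched->job_list_lock);

        return job;
}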