<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Hi Christian,</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Thanks for your suggestion. I modified the patch as follows:</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
index e671aa241720..66f2a43c58fd 100644</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
--- a/drivers/gpu/drm/scheduler/sched_entity.c</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
+++ b/drivers/gpu/drm/scheduler/sched_entity.c</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
@@ -177,6 +177,7 @@ static void drm_sched_entity_kill_jobs_work(struct work_struct *wrk)</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
struct drm_sched_job *job = container_of(wrk, typeof(*job), work);</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
drm_sched_fence_scheduled(job->s_fence, NULL);</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
+ drm_sched_wakeup(job->sched);</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
drm_sched_fence_finished(job->s_fence, -ESRCH);</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
WARN_ON(job->s_fence->parent);</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
job->sched->ops->free_job(job);</div>
<div class="elementToProof" style="line-height: 19px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
--</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="appendonsend"></div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Thanks,</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Lin</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<hr style="display: inline-block; width: 98%;">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<b>From:</b> Koenig, Christian <Christian.Koenig@amd.com><br>
<b>Sent:</b> Thursday, July 10, 2025 15:52<br>
<b>To:</b> cao, lin <lin.cao@amd.com>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org><br>
<b>Cc:</b> Yin, ZhenGuo (Chris) <ZhenGuo.Yin@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Matthew Brost <matthew.brost@intel.com>; Danilo Krummrich <dakr@kernel.org>; Philipp Stanner <phasta@kernel.org><br>
<b>Subject:</b> Re: [PATCH] drm/scheduler: Fix sched hang when killing app with dependent jobs
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-size: 11pt;">First of all, you need to CC the scheduler maintainers; try using the get_maintainer.pl script. Adding them on CC.<br>
<br>
On 10.07.25 08:36, Lin.Cao wrote:<br>
> When application A submits jobs (a1, a2, a3) and application B submits<br>
> job b1 with a dependency on a2's scheduler fence, killing application A<br>
> before run_job(a1) causes drm_sched_entity_kill_jobs_work() to force<br>
> signal all jobs sequentially. However, because neither work_run_job nor<br>
> work_free_job is queued in drm_sched_entity_kill_jobs_work(), the<br>
> scheduler goes to sleep, causing application B to hang.<br>
<br>
Ah! That's because drm_sched_entity_add_dependency_cb() optimizes the dependency when submitting to the same scheduler.<br>
<br>
Yeah, that suddenly starts to make sense.<br>
<br>
> Add drm_sched_wakeup() to drm_sched_entity_kill_jobs_work() to prevent<br>
> the scheduler from sleeping and the subsequent application hang.<br>
><br>
> Signed-off-by: Lin.Cao <lincao12@amd.com><br>
> ---<br>
> drivers/gpu/drm/scheduler/sched_entity.c | 1 +<br>
> 1 file changed, 1 insertion(+)<br>
><br>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c<br>
> index e671aa241720..a22b0f65558a 100644<br>
> --- a/drivers/gpu/drm/scheduler/sched_entity.c<br>
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c<br>
> @@ -180,6 +180,7 @@ static void drm_sched_entity_kill_jobs_work(struct work_struct *wrk)<br>
> drm_sched_fence_finished(job->s_fence, -ESRCH);<br>
> WARN_ON(job->s_fence->parent);<br>
> job->sched->ops->free_job(job);<br>
> + drm_sched_wakeup(job->sched);<br>
<br>
That should probably be after drm_sched_fence_scheduled().<br>
<br>
Alternatively, we could also drop the optimization in drm_sched_entity_add_dependency_cb(); scheduling the work item again has only minimal overhead.<br>
<br>
Apart from that, this looks good to me.<br>
<br>
Regards,<br>
Christian.<br>
<br>
> }<br>
> <br>
> /* Signal the scheduler finished fence when the entity in question is killed. */<br>
<br>
</div>
</div>
</body>
</html>