<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Hi Philipp,</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Thank you for the review. I found that this optimization was introduced 9 years ago in commit 777dbd458c89d4ca74a659f85ffb5bc817f29a35 ("drm/amdgpu: drop a dummy wakeup scheduler").</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Given that the codebase has changed significantly over these 9 years, do I still need to include the Fixes: tag?</div>
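<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
If the tag is still needed, I assume the trailers would follow the format described in Documentation/process/submitting-patches.rst (a 12-character SHA-1 abbreviation followed by the original subject line in parentheses), roughly as sketched below. The Suggested-by line is my assumption based on your comment:</div>
<pre>
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")
Cc: stable@vger.kernel.org
Suggested-by: Christian Koenig &lt;Christian.Koenig@amd.com&gt;
Signed-off-by: Lin.Cao &lt;lincao12@amd.com&gt;
</pre>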
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Thanks,</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Lin</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Philipp Stanner <phasta@mailbox.org><br>
<b>Sent:</b> Wednesday, July 16, 2025 16:33<br>
<b>To:</b> cao, lin <lin.cao@amd.com>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org><br>
<b>Cc:</b> Yin, ZhenGuo (Chris) <ZhenGuo.Yin@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; phasta@kernel.org <phasta@kernel.org>; dakr@kernel.org <dakr@kernel.org>; matthew.brost@intel.com <matthew.brost@intel.com><br>
<b>Subject:</b> Re: [PATCH] drm/sched: Remove optimization that causes hang when killing dependent jobs</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">On Tue, 2025-07-15 at 21:50 +0800, Lin.Cao wrote:<br>
> When application A submits jobs and application B submits a job with a<br>
> dependency on A's fence, the normal flow wakes up the scheduler after<br>
> processing each job. However, the optimization in<br>
> drm_sched_entity_add_dependency_cb() uses a callback that only clears<br>
> dependencies without waking up the scheduler.<br>
> <br>
> When application A is killed before its jobs can run, the callback gets<br>
> triggered but only clears the dependency without waking up the<br>
> scheduler, causing the scheduler to go to sleep and application B to<br>
> hang.<br>
> <br>
> Remove the optimization by deleting drm_sched_entity_clear_dep() and its<br>
> usage, ensuring the scheduler is always woken up when dependencies are<br>
> cleared.<br>
> <br>
> Signed-off-by: Lin.Cao <lincao12@amd.com><br>
<br>
This is, still, a bug fix, so it needs Fixes: and Cc: stable :)<br>
<br>
Could also include a Suggested-by: Christian<br>
<br>
P.<br>
<br>
> ---<br>
> drivers/gpu/drm/scheduler/sched_entity.c | 21 ++-------------------<br>
> 1 file changed, 2 insertions(+), 19 deletions(-)<br>
> <br>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c<br>
> index e671aa241720..ac678de7fe5e 100644<br>
> --- a/drivers/gpu/drm/scheduler/sched_entity.c<br>
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c<br>
> @@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)<br>
> }<br>
> EXPORT_SYMBOL(drm_sched_entity_destroy);<br>
> <br>
> -/* drm_sched_entity_clear_dep - callback to clear the entities dependency */<br>
> -static void drm_sched_entity_clear_dep(struct dma_fence *f,<br>
> - struct dma_fence_cb *cb)<br>
> -{<br>
> - struct drm_sched_entity *entity =<br>
> - container_of(cb, struct drm_sched_entity, cb);<br>
> -<br>
> - entity->dependency = NULL;<br>
> - dma_fence_put(f);<br>
> -}<br>
> -<br>
> /*<br>
> * drm_sched_entity_wakeup - callback to clear the entity's dependency and<br>
> * wake up the scheduler<br>
> @@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,<br>
> struct drm_sched_entity *entity =<br>
> container_of(cb, struct drm_sched_entity, cb);<br>
> <br>
> - drm_sched_entity_clear_dep(f, cb);<br>
> + entity->dependency = NULL;<br>
> + dma_fence_put(f);<br>
> drm_sched_wakeup(entity->rq->sched);<br>
> }<br>
> <br>
> @@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)<br>
> fence = dma_fence_get(&s_fence->scheduled);<br>
> dma_fence_put(entity->dependency);<br>
> entity->dependency = fence;<br>
> - if (!dma_fence_add_callback(fence, &entity->cb,<br>
> - drm_sched_entity_clear_dep))<br>
> - return true;<br>
> -<br>
> - /* Ignore it when it is already scheduled */<br>
> - dma_fence_put(fence);<br>
> - return false;<br>
> }<br>
> <br>
> if (!dma_fence_add_callback(entity->dependency, &entity->cb,<br>
<br>
</div>
</span></font></div>
</div>
</body>
</html>