<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<p style="font-family:Calibri;font-size:10pt;color:#0000FF;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left">
[AMD Official Use Only - AMD Internal Distribution Only]<br>
</p>
<br>
<div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Hi Philipp, Christian,</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
I modified the commit message as follows:</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
drm/sched: Remove optimization that causes hang when killing dependent jobs</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
When application A submits jobs and application B submits a job with a</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
dependency on A's fence, the normal flow wakes up the scheduler after</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
processing each job. However, the optimization in</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
drm_sched_entity_add_dependency_cb() uses a callback that only clears</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
dependencies without waking up the scheduler.</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
When application A is killed before its jobs can run, the callback gets</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
triggered but only clears the dependency without waking up the scheduler,</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
causing the scheduler to enter sleep state and application B to hang.</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Remove the optimization by deleting drm_sched_entity_clear_dep() and its</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
usage, ensuring the scheduler is always woken up when dependencies are</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
cleared.</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Cc: stable@vger.kernel.org # v4.6+</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")</div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Suggested-by: Christian König <Christian.Koenig@amd.com></div>
<div style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Signed-off-by: Lin.Cao <lincao12@amd.com></div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Thanks,</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);" class="elementToProof">
Lin</div>
<div id="appendonsend"></div>
<div><br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<hr style="display: inline-block; width: 98%;">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<b>From:</b> Philipp Stanner <phasta@mailbox.org><br>
<b>Sent:</b> Wednesday, July 16, 2025 17:57<br>
<b>To:</b> cao, lin <lin.cao@amd.com>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; phasta@kernel.org <phasta@kernel.org><br>
<b>Cc:</b> Yin, ZhenGuo (Chris) <ZhenGuo.Yin@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; dakr@kernel.org <dakr@kernel.org>; matthew.brost@intel.com <matthew.brost@intel.com><br>
<b>Subject:</b> Re: [PATCH] drm/sched: Remove optimization that causes hang when killing dependent jobs
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-size: 11pt;">On Wed, 2025-07-16 at 09:43 +0000, cao, lin wrote:<br>
><br>
><br>
> Hi Philipp,<br>
><br>
><br>
> Thank you for the review. I found that this optimization was<br>
> introduced 9 years ago in commit<br>
> 777dbd458c89d4ca74a659f85ffb5bc817f29a35 ("drm/amdgpu: drop a dummy<br>
> wakeup scheduler").<br>
><br>
><br>
> Given that the codebase has undergone significant changes over these<br>
> 9 years, may I ask whether I still need to include the Fixes: tag?<br>
<br>
Yes. It's a helpful marker to see where the problem comes from, and it<br>
adds redundancy that helps the stable-kernel maintainers figure out<br>
which kernels to backport it to.<br>
<br>
If stable can't apply a patch to a very old stable kernel because the<br>
code base changed too much, they'll ping us and we might provide a<br>
dedicated fix.<br>
<br>
So, like this:<br>
<br>
Cc: stable@vger.kernel.org # v4.6+<br>
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")<br>
<br>
<br>
P.<br>
<br>
><br>
><br>
> Thanks,<br>
> Lin<br>
><br>
><br>
> From: Philipp Stanner <phasta@mailbox.org><br>
> Sent: Wednesday, July 16, 2025 16:33<br>
> To: cao, lin <lin.cao@amd.com>; dri-devel@lists.freedesktop.org<br>
> <dri-devel@lists.freedesktop.org><br>
> Cc: Yin, ZhenGuo (Chris) <ZhenGuo.Yin@amd.com>; Deng, Emily<br>
> <Emily.Deng@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>;<br>
> phasta@kernel.org <phasta@kernel.org>; dakr@kernel.org<br>
> <dakr@kernel.org>; matthew.brost@intel.com <matthew.brost@intel.com><br>
> Subject: Re: [PATCH] drm/sched: Remove optimization that causes hang<br>
> when killing dependent jobs<br>
><br>
> <br>
><br>
><br>
> On Tue, 2025-07-15 at 21:50 +0800, Lin.Cao wrote:<br>
> > When application A submits jobs and application B submits a job with a<br>
> > dependency on A's fence, the normal flow wakes up the scheduler after<br>
> > processing each job. However, the optimization in<br>
> > drm_sched_entity_add_dependency_cb() uses a callback that only clears<br>
> > dependencies without waking up the scheduler.<br>
> ><br>
> > When application A is killed before its jobs can run, the callback gets<br>
> > triggered but only clears the dependency without waking up the scheduler,<br>
> > causing the scheduler to enter sleep state and application B to hang.<br>
> ><br>
> > Remove the optimization by deleting drm_sched_entity_clear_dep() and its<br>
> > usage, ensuring the scheduler is always woken up when dependencies are<br>
> > cleared.<br>
> ><br>
> > Signed-off-by: Lin.Cao <lincao12@amd.com><br>
><br>
> This is, still, a bug fix, so it needs Fixes: and Cc: stable :)<br>
><br>
> Could also include a Suggested-by: Christian<br>
><br>
> P.<br>
><br>
> > ---<br>
> > drivers/gpu/drm/scheduler/sched_entity.c | 21 ++-------------------<br>
> > 1 file changed, 2 insertions(+), 19 deletions(-)<br>
> ><br>
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c<br>
> > index e671aa241720..ac678de7fe5e 100644<br>
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c<br>
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c<br>
> > @@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)<br>
> > }<br>
> > EXPORT_SYMBOL(drm_sched_entity_destroy);<br>
> > <br>
> > -/* drm_sched_entity_clear_dep - callback to clear the entities dependency */<br>
> > -static void drm_sched_entity_clear_dep(struct dma_fence *f,<br>
> > - struct dma_fence_cb *cb)<br>
> > -{<br>
> > - struct drm_sched_entity *entity =<br>
> > - container_of(cb, struct drm_sched_entity, cb);<br>
> > -<br>
> > - entity->dependency = NULL;<br>
> > - dma_fence_put(f);<br>
> > -}<br>
> > -<br>
> > /*<br>
> > * drm_sched_entity_wakeup - callback to clear the entity's dependency and<br>
> > * wake up the scheduler<br>
> > @@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,<br>
> > struct drm_sched_entity *entity =<br>
> > container_of(cb, struct drm_sched_entity, cb);<br>
> > <br>
> > - drm_sched_entity_clear_dep(f, cb);<br>
> > + entity->dependency = NULL;<br>
> > + dma_fence_put(f);<br>
> > drm_sched_wakeup(entity->rq->sched);<br>
> > }<br>
> > <br>
> > @@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)<br>
> > fence = dma_fence_get(&s_fence->scheduled);<br>
> > dma_fence_put(entity->dependency);<br>
> > entity->dependency = fence;<br>
> > - if (!dma_fence_add_callback(fence, &entity->cb,<br>
> > - drm_sched_entity_clear_dep))<br>
> > - return true;<br>
> > -<br>
> > - /* Ignore it when it is already scheduled */<br>
> > - dma_fence_put(fence);<br>
> > - return false;<br>
> > }<br>
> > <br>
> > if (!dma_fence_add_callback(entity->dependency, &entity->cb,<br>
><br>
<br>
</div>
</div>
</body>
</html>