[PATCH] drm/sched: Remove optimization that causes hang when killing dependent jobs
Philipp Stanner
phasta at mailbox.org
Wed Jul 16 09:57:53 UTC 2025
On Wed, 2025-07-16 at 09:43 +0000, cao, lin wrote:
>
>
>
>
> Hi Philipp,
>
>
> Thank you for the review. I found that this optimization was
> introduced 9 years ago in commit
> 777dbd458c89d4ca74a659f85ffb5bc817f29a35 ("drm/amdgpu: drop a dummy
> wakeup scheduler").
>
>
> Given that the codebase has undergone significant changes over these
> 9 years, do I still need to include the Fixes: tag?
Yes. It's a helpful marker for seeing where the problem comes from, and
it adds redundancy that helps the stable-kernel maintainers figure out
which kernels to backport it to.
If stable can't apply a patch to a very old stable kernel because the
code base changed too much, they'll ping us and we might provide a
dedicated fix.
So, like this:
Cc: stable at vger.kernel.org # v4.6+
Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")
P.
>
>
> Thanks,
> Lin
>
>
> From: Philipp Stanner <phasta at mailbox.org>
> Sent: Wednesday, July 16, 2025 16:33
> To: cao, lin <lin.cao at amd.com>; dri-devel at lists.freedesktop.org
> <dri-devel at lists.freedesktop.org>
> Cc: Yin, ZhenGuo (Chris) <ZhenGuo.Yin at amd.com>; Deng, Emily
> <Emily.Deng at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>;
> phasta at kernel.org <phasta at kernel.org>; dakr at kernel.org
> <dakr at kernel.org>; matthew.brost at intel.com <matthew.brost at intel.com>
> Subject: Re: [PATCH] drm/sched: Remove optimization that causes hang
> when killing dependent jobs
>
>
>
>
> On Tue, 2025-07-15 at 21:50 +0800, Lin.Cao wrote:
> > When application A submits jobs and application B submits a job with
> > a dependency on A's fence, the normal flow wakes up the scheduler
> > after processing each job. However, the optimization in
> > drm_sched_entity_add_dependency_cb() uses a callback that only clears
> > dependencies without waking up the scheduler.
> >
> > When application A is killed before its jobs can run, the callback
> > gets triggered but only clears the dependency without waking up the
> > scheduler, leaving the scheduler asleep and application B hanging.
> >
> > Remove the optimization by deleting drm_sched_entity_clear_dep() and
> > its usage, ensuring the scheduler is always woken up when dependencies
> > are cleared.
> >
> > Signed-off-by: Lin.Cao <lincao12 at amd.com>
>
> This is, still, a bug fix, so it needs Fixes: and Cc: stable :)
>
> Could also include a Suggested-by: Christian
>
> P.
>
> > ---
> >  drivers/gpu/drm/scheduler/sched_entity.c | 21 ++-------------------
> >  1 file changed, 2 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index e671aa241720..ac678de7fe5e 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct drm_sched_entity *entity)
> >  }
> >  EXPORT_SYMBOL(drm_sched_entity_destroy);
> >
> > -/* drm_sched_entity_clear_dep - callback to clear the entities dependency */
> > -static void drm_sched_entity_clear_dep(struct dma_fence *f,
> > -				       struct dma_fence_cb *cb)
> > -{
> > -	struct drm_sched_entity *entity =
> > -		container_of(cb, struct drm_sched_entity, cb);
> > -
> > -	entity->dependency = NULL;
> > -	dma_fence_put(f);
> > -}
> > -
> >  /*
> >   * drm_sched_entity_wakeup - callback to clear the entity's dependency and
> >   * wake up the scheduler
> > @@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
> >  	struct drm_sched_entity *entity =
> >  		container_of(cb, struct drm_sched_entity, cb);
> >
> > -	drm_sched_entity_clear_dep(f, cb);
> > +	entity->dependency = NULL;
> > +	dma_fence_put(f);
> >  	drm_sched_wakeup(entity->rq->sched);
> >  }
> >
> > @@ -429,13 +419,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
> >  		fence = dma_fence_get(&s_fence->scheduled);
> >  		dma_fence_put(entity->dependency);
> >  		entity->dependency = fence;
> > -		if (!dma_fence_add_callback(fence, &entity->cb,
> > -					    drm_sched_entity_clear_dep))
> > -			return true;
> > -
> > -		/* Ignore it when it is already scheduled */
> > -		dma_fence_put(fence);
> > -		return false;
> >  	}
> >
> >  	if (!dma_fence_add_callback(entity->dependency, &entity->cb,
>