[PATCH] drm/sched: Remove optimization that causes hang when killing dependent jobs

cao, lin lin.cao at amd.com
Wed Jul 16 09:43:14 UTC 2025


[AMD Official Use Only - AMD Internal Distribution Only]

Hi Philipp,

Thank you for the review. I found that this optimization was introduced 9 years ago in commit 777dbd458c89d4ca74a659f85ffb5bc817f29a35 ("drm/amdgpu: drop a dummy wakeup scheduler").

Given that the codebase has undergone significant changes over these 9 years. May I ask if I still need to include the Fixes: tag?

Thanks,
Lin
________________________________
From: Philipp Stanner <phasta at mailbox.org>
Sent: Wednesday, July 16, 2025 16:33
To: cao, lin <lin.cao at amd.com>; dri-devel at lists.freedesktop.org <dri-devel at lists.freedesktop.org>
Cc: Yin, ZhenGuo (Chris) <ZhenGuo.Yin at amd.com>; Deng, Emily <Emily.Deng at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; phasta at kernel.org <phasta at kernel.org>; dakr at kernel.org <dakr at kernel.org>; matthew.brost at intel.com <matthew.brost at intel.com>
Subject: Re: [PATCH] drm/sched: Remove optimization that causes hang when killing dependent jobs

On Tue, 2025-07-15 at 21:50 +0800, Lin.Cao wrote:
> When application A submits jobs and application B submits a job with
> a
> dependency on A's fence, the normal flow wakes up the scheduler after
> processing each job. However, the optimization in
> drm_sched_entity_add_dependency_cb() uses a callback that only clears
> dependencies without waking up the scheduler.
>
> When application A is killed before its jobs can run, the callback
> gets
> triggered but only clears the dependency without waking up the
> scheduler,
> causing the scheduler to enter sleep state and application B to hang.
>
> Remove the optimization by deleting drm_sched_entity_clear_dep() and
> its
> usage, ensuring the scheduler is always woken up when dependencies
> are
> cleared.
>
> Signed-off-by: Lin.Cao <lincao12 at amd.com>

This is, still, a bug fix, so it needs Fixes: and Cc: stable :)

Could also include a Suggested-by: Christian

P.

> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 21 ++-------------------
>  1 file changed, 2 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index e671aa241720..ac678de7fe5e 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct
> drm_sched_entity *entity)
>  }
>  EXPORT_SYMBOL(drm_sched_entity_destroy);
>
> -/* drm_sched_entity_clear_dep - callback to clear the entities
> dependency */
> -static void drm_sched_entity_clear_dep(struct dma_fence *f,
> -                                    struct dma_fence_cb *cb)
> -{
> -     struct drm_sched_entity *entity =
> -             container_of(cb, struct drm_sched_entity, cb);
> -
> -     entity->dependency = NULL;
> -     dma_fence_put(f);
> -}
> -
>  /*
>   * drm_sched_entity_wakeup - callback to clear the entity's
> dependency and
>   * wake up the scheduler
> @@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct
> dma_fence *f,
>       struct drm_sched_entity *entity =
>               container_of(cb, struct drm_sched_entity, cb);
>
> -     drm_sched_entity_clear_dep(f, cb);
> +     entity->dependency = NULL;
> +     dma_fence_put(f);
>       drm_sched_wakeup(entity->rq->sched);
>  }
>
> @@ -429,13 +419,6 @@ static bool
> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>               fence = dma_fence_get(&s_fence->scheduled);
>               dma_fence_put(entity->dependency);
>               entity->dependency = fence;
> -             if (!dma_fence_add_callback(fence, &entity->cb,
> -
> drm_sched_entity_clear_dep))
> -                     return true;
> -
> -             /* Ignore it when it is already scheduled */
> -             dma_fence_put(fence);
> -             return false;
>       }
>
>       if (!dma_fence_add_callback(entity->dependency, &entity->cb,

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20250716/130385aa/attachment-0001.htm>


More information about the dri-devel mailing list