[PATCH] drm/sched: Remove optimization that causes hang when killing dependent jobs
Christian König
christian.koenig at amd.com
Wed Jul 16 09:53:21 UTC 2025
On 16.07.25 11:43, cao, lin wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>
> Hi Philipp,
>
> Thank you for the review. I found that this optimization was introduced 9 years ago in commit 777dbd458c89d4ca74a659f85ffb5bc817f29a35 ("drm/amdgpu: drop a dummy wakeup scheduler").
>
> Given that the codebase has undergone significant changes over these 9 years. May I ask if I still need to include the Fixes: tag?
Most likely not, a CC: stable tag should be sufficient.
Regards,
Christian.
>
> Thanks,
> Lin
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> *From:* Philipp Stanner <phasta at mailbox.org>
> *Sent:* Wednesday, July 16, 2025 16:33
> *To:* cao, lin <lin.cao at amd.com>; dri-devel at lists.freedesktop.org <dri-devel at lists.freedesktop.org>
> *Cc:* Yin, ZhenGuo (Chris) <ZhenGuo.Yin at amd.com>; Deng, Emily <Emily.Deng at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; phasta at kernel.org <phasta at kernel.org>; dakr at kernel.org <dakr at kernel.org>; matthew.brost at intel.com <matthew.brost at intel.com>
> *Subject:* Re: [PATCH] drm/sched: Remove optimization that causes hang when killing dependent jobs
>
> On Tue, 2025-07-15 at 21:50 +0800, Lin.Cao wrote:
>> When application A submits jobs and application B submits a job with
>> a
>> dependency on A's fence, the normal flow wakes up the scheduler after
>> processing each job. However, the optimization in
>> drm_sched_entity_add_dependency_cb() uses a callback that only clears
>> dependencies without waking up the scheduler.
>>
>> When application A is killed before its jobs can run, the callback
>> gets
>> triggered but only clears the dependency without waking up the
>> scheduler,
>> causing the scheduler to enter sleep state and application B to hang.
>>
>> Remove the optimization by deleting drm_sched_entity_clear_dep() and
>> its
>> usage, ensuring the scheduler is always woken up when dependencies
>> are
>> cleared.
>>
>> Signed-off-by: Lin.Cao <lincao12 at amd.com>
>
> This is, still, a bug fix, so it needs Fixes: and Cc: stable :)
>
> Could also include a Suggested-by: Christian
>
> P.
>
>> ---
>> drivers/gpu/drm/scheduler/sched_entity.c | 21 ++-------------------
>> 1 file changed, 2 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>> b/drivers/gpu/drm/scheduler/sched_entity.c
>> index e671aa241720..ac678de7fe5e 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct
>> drm_sched_entity *entity)
>> }
>> EXPORT_SYMBOL(drm_sched_entity_destroy);
>>
>> -/* drm_sched_entity_clear_dep - callback to clear the entities
>> dependency */
>> -static void drm_sched_entity_clear_dep(struct dma_fence *f,
>> - struct dma_fence_cb *cb)
>> -{
>> - struct drm_sched_entity *entity =
>> - container_of(cb, struct drm_sched_entity, cb);
>> -
>> - entity->dependency = NULL;
>> - dma_fence_put(f);
>> -}
>> -
>> /*
>> * drm_sched_entity_wakeup - callback to clear the entity's
>> dependency and
>> * wake up the scheduler
>> @@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct
>> dma_fence *f,
>> struct drm_sched_entity *entity =
>> container_of(cb, struct drm_sched_entity, cb);
>>
>> - drm_sched_entity_clear_dep(f, cb);
>> + entity->dependency = NULL;
>> + dma_fence_put(f);
>> drm_sched_wakeup(entity->rq->sched);
>> }
>>
>> @@ -429,13 +419,6 @@ static bool
>> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>> fence = dma_fence_get(&s_fence->scheduled);
>> dma_fence_put(entity->dependency);
>> entity->dependency = fence;
>> - if (!dma_fence_add_callback(fence, &entity->cb,
>> -
>> drm_sched_entity_clear_dep))
>> - return true;
>> -
>> - /* Ignore it when it is already scheduled */
>> - dma_fence_put(fence);
>> - return false;
>> }
>>
>> if (!dma_fence_add_callback(entity->dependency, &entity->cb,
>
More information about the dri-devel
mailing list