[PATCH] drm/scheduler: signal scheduled fence when kill job

Tvrtko Ursulin tursulin at ursulin.net
Mon Jun 2 13:01:20 UTC 2025


On 15/05/2025 13:39, Christian König wrote:
> On 5/15/25 11:05, Philipp Stanner wrote:
>> On Thu, 2025-05-15 at 10:48 +0200, Christian König wrote:
>>> Explicitly adding the scheduler maintainers.
>>>
>>> On 5/15/25 04:07, Lin.Cao wrote:
>>>> Previously we only signaled finished fence which may cause some
>>>> submission's dependency cannot be cleared the cause benchmark hang.
>>>> Signal both scheduled fence and finished fence could fix this
>>>> issue.
>>
>> Code seems legit to me; but be so kind and also pimp up the commit
>> message a bit, Christian. It's not very clear what the bug is and why
>> setting the parent to NULL solves it. Or is the issue simply that the
>> fence might be dropped unsignaled, being a bug by definition? Needs to
>> be written down.
> 
> The latter: we simply forgot to signal the scheduled fence when an application was killed.
> 
>> Grammar is also a bit too broken.
>>
>> And running the unit tests before pushing is probably also a good idea.
> 
> And maybe even writing a new unit test for that.

Gentle reminder that a test would be needed as per 099b79f94366 ("drm/doc: Document KUnit expectations").

Regards,

Tvrtko


>>>>
>>>> Signed-off-by: Lin.Cao <lincao12 at amd.com>
>>
>> Acked-by: Philipp Stanner <phasta at kernel.org>
>>
>>>
>>> Reviewed-by: Christian König <christian.koenig at amd.com>
>>>
>>> Danilo & Philipp can we quickly get an rb for that? I'm volunteering
>>> to push it to drm-misc-fixes and add the necessary stable tags since
>>> this is a fix for a rather ugly bug.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>> ---
>>>>   drivers/gpu/drm/scheduler/sched_entity.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index bd39db7bb240..e671aa241720 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -176,6 +176,7 @@ static void drm_sched_entity_kill_jobs_work(struct work_struct *wrk)
>>>>   {
>>>>   	struct drm_sched_job *job = container_of(wrk, typeof(*job), work);
>>>>   
>>>> +	drm_sched_fence_scheduled(job->s_fence, NULL);
>>>>   	drm_sched_fence_finished(job->s_fence, -ESRCH);
>>>>   	WARN_ON(job->s_fence->parent);
>>>>   	job->sched->ops->free_job(job);
>>>
>>
> 



More information about the dri-devel mailing list