[PATCH] drm/scheduler: signal scheduled fence when kill job

Tvrtko Ursulin tursulin at ursulin.net
Thu May 15 10:22:09 UTC 2025


On 15/05/2025 10:05, Philipp Stanner wrote:
> On Thu, 2025-05-15 at 10:48 +0200, Christian König wrote:
>> Explicitly adding the scheduler maintainers.
>>
>> On 5/15/25 04:07, Lin.Cao wrote:
>>> Previously we only signaled finished fence which may cause some
>>> submission's dependency cannot be cleared the cause benchmark hang.
>>> Signal both scheduled fence and finished fence could fix this
>>> issue.
> 
> Code seems legit to me; but be so kind and also pimp up the commit
> message a bit, Christian. It's not very clear what the bug is and why
> setting the parent to NULL solves it. Or is the issue simply that the
> fence might be dropped unsignaled, being a bug by definition? Needs to
> be written down.
> 
> Grammar is also a bit too broken.
> 
> And running the unit tests before pushing is probably also a good idea.

I believe we even have DRM rules stating that unit test coverage should
be added when fixing issues in a component which already has unit tests. ;)

"""
KUnit Coverage Rules
~~~~~~~~~~~~~~~~~~~~

KUnit support is gradually added to the DRM framework and helpers. There's no
general requirement for the framework and helpers to have KUnit tests at the
moment. However, patches that are affecting a function or helper already
covered by KUnit tests must provide tests if the change calls for one.
"""

So a new variant similar to drm_sched_basic_entity_cleanup() would be
very welcome.
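
To make the failure mode from the commit message concrete, here is a minimal
userspace sketch in plain C. It is not kernel code and not the requested KUnit
test: every model_* name is hypothetical, and it assumes the dependent
submission waits on the killed job's scheduled fence rather than its finished
fence, which is how I read "dependency cannot be cleared".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are real kernel types. */
struct model_fence {
	bool signaled;
};

struct model_sched_fence {
	struct model_fence scheduled; /* signaled when the job is picked to run */
	struct model_fence finished;  /* signaled when the job completes or is killed */
};

struct model_job {
	struct model_sched_fence s_fence;
	const struct model_fence *dependency; /* fence this job waits on, or NULL */
};

/* A job may run once it has no dependency or its dependency has signaled. */
static bool model_job_runnable(const struct model_job *job)
{
	assert(job != NULL);
	return job->dependency == NULL || job->dependency->signaled;
}

/*
 * Model of the kill path. signal_scheduled == false mimics the behaviour
 * before the patch (only the finished fence is signaled); true mimics the
 * patched behaviour (scheduled first, then finished).
 */
static void model_kill_job(struct model_job *job, bool signal_scheduled)
{
	assert(job != NULL);
	if (signal_scheduled)
		job->s_fence.scheduled.signaled = true;
	job->s_fence.finished.signaled = true;
}

/* Assumption: the dependent job waits on the killed job's scheduled fence. */
static bool dependent_runnable_after_kill(bool signal_scheduled)
{
	struct model_job killed = { 0 };
	struct model_job dependent = { 0 };

	dependent.dependency = &killed.s_fence.scheduled;
	model_kill_job(&killed, signal_scheduled);

	return model_job_runnable(&dependent);
}
```

In this model dependent_runnable_after_kill(false) returns false, i.e. the
dependent job stays blocked forever, while dependent_runnable_after_kill(true)
returns true. A real test would of course need to exercise the scheduler's
own entity kill path instead.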

Regards,

Tvrtko

> 
>>>
>>> Signed-off-by: Lin.Cao <lincao12 at amd.com>
> 
> Acked-by: Philipp Stanner <phasta at kernel.org>
> 
>>
>> Reviewed-by: Christian König <christian.koenig at amd.com>
>>
>> Danilo & Philipp can we quickly get an rb for that? I'm volunteering
>> to push it to drm-misc-fixes and add the necessary stable tags since
>> this is a fix for a rather ugly bug.
>>
>> Regards,
>> Christian.
>>
>>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_entity.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index bd39db7bb240..e671aa241720 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -176,6 +176,7 @@ static void drm_sched_entity_kill_jobs_work(struct work_struct *wrk)
>>>   {
>>>   	struct drm_sched_job *job = container_of(wrk, typeof(*job), work);
>>>   
>>> +	drm_sched_fence_scheduled(job->s_fence, NULL);
>>>   	drm_sched_fence_finished(job->s_fence, -ESRCH);
>>>   	WARN_ON(job->s_fence->parent);
>>>   	job->sched->ops->free_job(job);
>>
> 


