[PATCH] drm/sched: Avoid double re-lock on the job free path

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Fri Jul 18 09:35:03 UTC 2025


On 18/07/2025 10:31, Philipp Stanner wrote:
> On Fri, 2025-07-18 at 08:13 +0100, Tvrtko Ursulin wrote:
>>
>> On 16/07/2025 21:44, Maíra Canal wrote:
>>> Hi Tvrtko,
>>>
>>> On 16/07/25 11:46, Tvrtko Ursulin wrote:
>>>>
>>>> On 16/07/2025 15:30, Maíra Canal wrote:
>>>>> Hi Tvrtko,
>>>>>
>>>>> On 16/07/25 10:49, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 16/07/2025 14:31, Maíra Canal wrote:
>>>>>>> Hi Tvrtko,
>>>>>>>
>>>>>>> On 16/07/25 05:51, Tvrtko Ursulin wrote:
>>>>>>>> Currently the job free work item will lock sched->job_list_lock a
>>>>>>>> first time to see if there are any jobs, free a single job, and then
>>>>>>>> lock again to decide whether to re-queue itself if there are more
>>>>>>>> finished jobs.
>>>>>>>>
>>>>>>>> Since drm_sched_get_finished_job() already looks at the second job
>>>>>>>> in the queue, we can simply add the signaled check and have it return
>>>>>>>> the presence of more jobs to be freed to the caller. That way the
>>>>>>>> work item does not have to lock the list again and repeat the
>>>>>>>> signaled check.
>>>>>>>>
>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>>>>> Cc: Danilo Krummrich <dakr at kernel.org>
>>>>>>>> Cc: Maíra Canal <mcanal at igalia.com>
>>>>>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>>>>>> Cc: Philipp Stanner <phasta at kernel.org>
>>>>>>>> ---
>>>>>>>> v2:
>>>>>>>>    * Improve commit text and kerneldoc. (Philipp)
>>>>>>>>    * Rename run free work helper. (Philipp)
>>>>>>>>
>>>>>>>> v3:
>>>>>>>>    * Rebase on top of Maira's changes.
>>>>>>>> ---
>>>>>>>>    drivers/gpu/drm/scheduler/sched_main.c | 53 ++++++++++----------------
>>>>>>>>    1 file changed, 21 insertions(+), 32 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> index e2cda28a1af4..5a550fd76bf0 100644
>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> @@ -349,34 +349,13 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>>>>>>    }
>>>>>>>>    /**
>>>>>>>> - * __drm_sched_run_free_queue - enqueue free-job work
>>>>>>>> - * @sched: scheduler instance
>>>>>>>> - */
>>>>>>>> -static void __drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
>>>>>>>> -{
>>>>>>>> -    if (!READ_ONCE(sched->pause_submit))
>>>>>>>> -        queue_work(sched->submit_wq, &sched->work_free_job);
>>>>>>>> -}
>>>>>>>> -
>>>>>>>> -/**
>>>>>>>> - * drm_sched_run_free_queue - enqueue free-job work if ready
>>>>>>>> + * drm_sched_run_free_queue - enqueue free-job work
>>>>>>>>     * @sched: scheduler instance
>>>>>>>>     */
>>>>>>>>    static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
>>>>>>>>    {
>>>>>>>> -    struct drm_sched_job *job;
>>>>>>>> -
>>>>>>>> -    job = list_first_entry_or_null(&sched->pending_list,
>>>>>>>> -                       struct drm_sched_job, list);
>>>>>>>> -    if (job && dma_fence_is_signaled(&job->s_fence->finished))
>>>>>>>> -        __drm_sched_run_free_queue(sched);
>>>>>>>
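[Editor's note: the single-lock pattern the patch describes can be sketched in plain C. This is a hypothetical, simplified user-space model, not the kernel code: `struct job`, `struct sched`, `get_finished_job()`, and `free_job_work()` are illustrative stand-ins for the scheduler's pending list, drm_sched_get_finished_job(), and the free-job work item, with locking and fences elided. The point it shows is that peeking at the second queue entry while the list is already held lets the worker learn "more jobs are ready" without a second lock/unlock cycle.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model of a pending job; "signaled" stands in for
 * dma_fence_is_signaled() on the job's finished fence. */
struct job {
	bool signaled;
	struct job *next;
};

struct sched {
	struct job *pending;	/* head of pending list; lock elided here */
	int requeues;		/* counts free-work re-queues */
};

/* Pop the first job if it has signaled, and report via *have_more whether
 * the next job has also signaled, so the caller can decide to re-queue
 * itself without taking the list lock a second time. */
static struct job *get_finished_job(struct sched *s, bool *have_more)
{
	struct job *job = s->pending;

	*have_more = false;
	if (!job || !job->signaled)
		return NULL;

	s->pending = job->next;
	if (s->pending && s->pending->signaled)
		*have_more = true;

	return job;
}

/* Free-work item: one pass over the locked section per invocation,
 * instead of locking once to fetch and once more to re-check. */
static void free_job_work(struct sched *s)
{
	bool have_more;
	struct job *job = get_finished_job(s, &have_more);

	if (job) {
		/* the real code would free the job here */
		if (have_more)
			s->requeues++;	/* stands in for queue_work() */
	}
}
```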
>>>>>>> I believe we'd still need this chunk for DRM_GPU_SCHED_STAT_NO_HANG
>>>>>>> (check the comment in drm_sched_job_reinsert_on_false_timeout()). How
>>>>>>
>>>>>> You mean the "is there a signaled job in the list" check is needed
>>>>>> for drm_sched_job_reinsert_on_false_timeout()? Hmm, why? The worst
>>>>>> case is a false positive wakeup on the free worker, no?
>>>>>
>>>>> Correct me if I'm mistaken, but we would also have a false positive
>>>>> wake-up on the run_job worker, which I believe could be problematic in
>>>>> the cases where we skipped the reset because the job is still running.
>>>>
>>>> The run_job worker exits when it sees no free credits, so I don't
>>>> think there is a problem. What am I missing?
>>>>
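[Editor's note: the "no free credits" argument above can be modelled in a few lines. Again a hypothetical sketch, not the kernel code: `can_queue()` stands in for drm_sched_can_queue() and the credit bookkeeping is reduced to a single counter. It shows why a false positive wakeup of the run_job worker is harmless: with no credits available the worker is a no-op.]

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the scheduler's credit accounting. */
struct run_sched {
	int credits;	/* free hardware credits */
	int jobs_run;	/* jobs actually started */
};

/* Stand-in for drm_sched_can_queue(): only run when credits are free. */
static bool can_queue(const struct run_sched *s)
{
	return s->credits > 0;
}

/* run_job work item: a wakeup that finds no credits exits immediately,
 * so a spurious wakeup cannot start a job that should not run. */
static void run_job_work(struct run_sched *s)
{
	if (!can_queue(s))
		return;

	s->credits--;
	s->jobs_run++;
}
```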
>>>
>>> I was the one missing the code in `drm_sched_can_queue()`. Sorry for the
>>> misleading comments. This is:
>>>
>>> Reviewed-by: Maíra Canal <mcanal at igalia.com>
>>
>> No worries, and thanks!
>>
>> Philipp - are you okay with this version? V2 was done to address your
>> feedback so that should be good now.
> 
> Was just giving it another spin when you wrote. (a [PATCH v3] would've
> been neat for identification, though – I almost pulled the wrong patch
> from the archive *wink*)

Oops, my bad.

> LGTM, improves things, can be merged.
> 
> However, we had to merge Lin Cao's bug fix [1] recently. That one is
> now in drm-misc-fixes, and your patch should go to drm-misc-next. This
> would cause a conflict once the two branches meet.
> 
> So I suggest that we wait with this non-urgent patch until drm-misc-
> fixes / Linus's -rc gets merged into drm-misc-next, and then we apply
> it. Should be next week or the week after AFAIK.
> 
> Unless somebody has a better idea, of course?

Lin's patch touches only sched_entity.c and mine only sched_main.c, i.e. 
no conflict AFAICT?

Regards,

Tvrtko

> 
> Remind me in case I forget.
> 
> 
> P.
> 
> [1] https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/15f77764e90a713ee3916ca424757688e4f565b9
> 
> 
>>
>> Regards,
>>
>> Tvrtko
>>
> 


