[PATCH] drm/sched: Avoid double re-lock on the job free path

Maíra Canal mcanal at igalia.com
Wed Jul 16 20:44:12 UTC 2025


Hi Tvrtko,

On 16/07/25 11:46, Tvrtko Ursulin wrote:
> 
> On 16/07/2025 15:30, Maíra Canal wrote:
>> Hi Tvrtko,
>>
>> On 16/07/25 10:49, Tvrtko Ursulin wrote:
>>>
>>> On 16/07/2025 14:31, Maíra Canal wrote:
>>>> Hi Tvrtko,
>>>>
>>>> On 16/07/25 05:51, Tvrtko Ursulin wrote:
>>>>> Currently the job free work item will lock sched->job_list_lock a
>>>>> first time to see if there are any jobs, free a single job, and
>>>>> then lock again to decide whether to re-queue itself if there are
>>>>> more finished jobs.
>>>>>
>>>>> Since drm_sched_get_finished_job() already looks at the second job
>>>>> in the queue, we can simply add the signaled check and have it
>>>>> return the presence of more jobs to be freed to the caller. That
>>>>> way the work item does not have to lock the list again and repeat
>>>>> the signaled check.
>>>>>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>> Cc: Danilo Krummrich <dakr at kernel.org>
>>>>> Cc: Maíra Canal <mcanal at igalia.com>
>>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>>> Cc: Philipp Stanner <phasta at kernel.org>
>>>>> ---
>>>>> v2:
>>>>>   * Improve commit text and kerneldoc. (Philipp)
>>>>>   * Rename run free work helper. (Philipp)
>>>>>
>>>>> v3:
>>>>>   * Rebase on top of Maira's changes.
>>>>> ---
>>>>>   drivers/gpu/drm/scheduler/sched_main.c | 53 +++++++++++----------------
>>>>>   1 file changed, 21 insertions(+), 32 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index e2cda28a1af4..5a550fd76bf0 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -349,34 +349,13 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>>>   }
>>>>>   /**
>>>>> - * __drm_sched_run_free_queue - enqueue free-job work
>>>>> - * @sched: scheduler instance
>>>>> - */
>>>>> -static void __drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
>>>>> -{
>>>>> -    if (!READ_ONCE(sched->pause_submit))
>>>>> -        queue_work(sched->submit_wq, &sched->work_free_job);
>>>>> -}
>>>>> -
>>>>> -/**
>>>>> - * drm_sched_run_free_queue - enqueue free-job work if ready
>>>>> + * drm_sched_run_free_queue - enqueue free-job work
>>>>>    * @sched: scheduler instance
>>>>>    */
>>>>>   static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
>>>>>   {
>>>>> -    struct drm_sched_job *job;
>>>>> -
>>>>> -    job = list_first_entry_or_null(&sched->pending_list,
>>>>> -                       struct drm_sched_job, list);
>>>>> -    if (job && dma_fence_is_signaled(&job->s_fence->finished))
>>>>> -        __drm_sched_run_free_queue(sched);
>>>>
>>>> I believe we'd still need this chunk for DRM_GPU_SCHED_STAT_NO_HANG
>>>> (check the comment in drm_sched_job_reinsert_on_false_timeout()). How
>>>
>>> You mean the "is there a signaled job in the list" check is needed
>>> for drm_sched_job_reinsert_on_false_timeout()? Hmm, why? Worst case
>>> is a false positive wakeup on the free worker, no?
>>
>> Correct me if I'm mistaken, but we would also have a false positive
>> wake-up on the run_job worker, which I believe could be problematic
>> in the cases where we skipped the reset because the job is still
>> running.
> 
> Run job worker exits when it sees no free credits so I don't think there 
> is a problem. What am I missing?
> 

I was the one missing the code in `drm_sched_can_queue()`. Sorry for the
misleading comments. This is:

Reviewed-by: Maíra Canal <mcanal at igalia.com>

Best Regards,
- Maíra

More information about the dri-devel mailing list