[RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Wed May 14 08:57:04 UTC 2025
On 12/05/2025 13:53, Philipp Stanner wrote:
> On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
>> Reduce to one spin_unlock for a hopefully slightly clearer flow in the
>> function. It may appear that there is a behavioural change, with
>> drm_sched_start_timeout_unlocked() now not being called if there were
>> initially no jobs on the pending list and some then appeared after the
>> unlock. However, if the code relied on the TDR handler restarting
>> itself, it would fail to do that if the job arrived on the pending
>> list after the check.
>>
>> Also fix one stale comment while touching the function.
>
> Same here, that's a good candidate for a separate patch / series.
It conflicts with the in-progress work from Maíra (fixing memory leaks
on false timeouts), so I will keep this one on the back-burner until her
work lands.
Regards,
Tvrtko
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Danilo Krummrich <dakr at kernel.org>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Philipp Stanner <phasta at kernel.org>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 37 +++++++++++++-------------
>>   1 file changed, 18 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index a45b02fd2af3..a26cc11c8ade 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -516,38 +516,37 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>
>>  static void drm_sched_job_timedout(struct work_struct *work)
>>  {
>> -        struct drm_gpu_scheduler *sched;
>> +        struct drm_gpu_scheduler *sched =
>> +                container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>> +        enum drm_gpu_sched_stat status;
>>          struct drm_sched_job *job;
>> -        enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
>> -
>> -        sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>
>>          /* Protects against concurrent deletion in drm_sched_get_finished_job */
>>          spin_lock(&sched->job_list_lock);
>>          job = list_first_entry_or_null(&sched->pending_list,
>>                                         struct drm_sched_job, list);
>> -
>>          if (job) {
>>                  /*
>>                   * Remove the bad job so it cannot be freed by concurrent
>> -                 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>> -                 * is parked at which point it's safe.
>> +                 * drm_sched_get_finished_job. It will be reinserted back after
>> +                 * scheduler worker is stopped at which point it's safe.
>>                   */
>>                  list_del_init(&job->list);
>> -                spin_unlock(&sched->job_list_lock);
>> +        }
>> +        spin_unlock(&sched->job_list_lock);
>>
>> -                status = job->sched->ops->timedout_job(job);
>> +        if (!job)
>> +                return;
>>
>> -                /*
>> -                 * Guilty job did complete and hence needs to be manually removed
>> -                 * See drm_sched_stop doc.
>> -                 */
>> -                if (sched->free_guilty) {
>> -                        job->sched->ops->free_job(job);
>> -                        sched->free_guilty = false;
>> -                }
>> -        } else {
>> -                spin_unlock(&sched->job_list_lock);
>> +        status = job->sched->ops->timedout_job(job);
>> +
>> +        /*
>> +         * Guilty job did complete and hence needs to be manually removed. See
>> +         * documentation for drm_sched_stop.
>> +         */
>> +        if (sched->free_guilty) {
>> +                job->sched->ops->free_job(job);
>> +                sched->free_guilty = false;
>>          }
>>
>>          if (status != DRM_GPU_SCHED_STAT_ENODEV)
>
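For reference, a minimal sketch of how drm_sched_job_timedout() reads with
the hunk above applied. This is reconstructed from the quoted diff rather
than taken from the tree, and the final re-arm call is an assumption based
on the commit message, since the quoted hunk is truncated right after the
status check:

static void drm_sched_job_timedout(struct work_struct *work)
{
        struct drm_gpu_scheduler *sched =
                container_of(work, struct drm_gpu_scheduler, work_tdr.work);
        enum drm_gpu_sched_stat status;
        struct drm_sched_job *job;

        /* Single critical section: pick the first pending job and unlink it. */
        spin_lock(&sched->job_list_lock);
        job = list_first_entry_or_null(&sched->pending_list,
                                       struct drm_sched_job, list);
        if (job)
                list_del_init(&job->list); /* reinserted once the worker is stopped */
        spin_unlock(&sched->job_list_lock);

        /* One unlock above; nothing more to do if no job was pending. */
        if (!job)
                return;

        status = job->sched->ops->timedout_job(job);

        /* Guilty job completed meanwhile and needs to be freed manually. */
        if (sched->free_guilty) {
                job->sched->ops->free_job(job);
                sched->free_guilty = false;
        }

        if (status != DRM_GPU_SCHED_STAT_ENODEV)
                drm_sched_start_timeout_unlocked(sched); /* assumed: past the end of the quoted hunk */
}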