[PATCH 1/8] drm/sched: Allow drivers to skip the reset and keep on running
Maíra Canal
mcanal at igalia.com
Sat May 24 13:33:48 UTC 2025
Hi Philipp,
Sorry, I was OoO for a couple of weeks.
On 13/05/25 04:26, Philipp Stanner wrote:
> On Sat, 2025-05-03 at 17:59 -0300, Maíra Canal wrote:
>> When the DRM scheduler times out, it's possible that the GPU isn't hung;
>> instead, a job may still be running, and there may be no valid reason to
>> reset the hardware. This can occur in two situations:
>>
>> 1. The GPU exposes some mechanism that ensures the GPU is still making
>>    progress. By checking this mechanism, we can safely skip the reset,
>>    rearm the timeout, and allow the job to continue running until
>>    completion. This is the case for v3d and Etnaviv.
>> 2. TDR has fired before the IRQ that signals the fence. Consequently,
>>    the job actually finishes, but it triggers a timeout before signaling
>>    the completion fence.
>>
>> These two scenarios are problematic because we remove the job from the
>> `sched->pending_list` before calling `sched->ops->timedout_job()`. This
>> means that when the job finally signals completion (e.g. in the IRQ
>> handler), the scheduler won't call `sched->ops->free_job()`. As a result,
>> the job and its resources won't be freed, leading to a memory leak.
>
> We have discussed this and discovered another, related issue. See
> below.
>
>>
>> To resolve this issue, we create a new `drm_gpu_sched_stat` that allows a
>> driver to skip the reset. This new status will indicate that the job
>> should be reinserted into the pending list, and the driver will still
>> signal its completion.
>>
>> Signed-off-by: Maíra Canal <mcanal at igalia.com>
>> ---
>>  drivers/gpu/drm/scheduler/sched_main.c | 14 ++++++++++++++
>>  include/drm/gpu_scheduler.h            |  2 ++
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 829579c41c6b5d8b2abce5ad373c7017469b7680..68ca827d77e32187a034309f881135dbc639a9b4 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -568,6 +568,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
>
> So, the fundamental design problem we have is that the scheduler
> assumes that when a timeout occurs, the GPU is completely hung. Your
> patch addresses another aspect of that very problem.
>
> But if the GPU is not hung, it can signal the hardware fence at any
> moment. So that's racy.
Unfortunately, this already happens, which would be the second point of
the list in the commit message.
>
> It could, theoretically, lead to backend_ops.timedout_job() being
> called with a signaled job, i.e., a job that is not really timed out.
>
> Would you say this is *the same* issue you're describing, or a separate
> one? It seems to me that it's a separate one.
It isn't the issue that I'm describing, in the sense that the scheduler
itself won't do anything to address it. However, several drivers already
handle this situation internally by checking the result of
`dma_fence_is_signaled()` and bailing out of the timeout handler if the
job has signaled. We would provide a proper status code for those
drivers and avoid memory leaks.
>
> Anyways. What I propose is that we wait until your series here has been
> merged. Once that's done, we should document that drivers should expect
> that backend_ops.timedout_job() can get called with a job that has not
> actually timed out, and tell the scheduler about it through
> DRM_GPU_SCHED_STAT_NOT_HANGING. Then the scheduler reverts the
> timeout's actions, as you propose here.
>
>
>>  		job->sched->ops->free_job(job);
>>  		sched->free_guilty = false;
>>  	}
>> +
>> +	/*
>> +	 * If the driver indicated that the GPU is still running and wants
>> +	 * to skip the reset, reinsert the job back into the pending list
>> +	 * and rearm the timeout.
>> +	 */
>> +	if (status == DRM_GPU_SCHED_STAT_RUNNING) {
>> +		spin_lock(&sched->job_list_lock);
>> +		list_add(&job->list, &sched->pending_list);
>> +		spin_unlock(&sched->job_list_lock);
>> +	}
>
> btw, if you go for Matt's requeue work item approach, it'll be better
> to write a helper function with a clear name for all that.
>
> drm_sched_job_reinsert_on_false_timeout() maybe.
Thanks for the review, I'll send v2 soon.
Best Regards,
- Maíra
>
>
> P.
>
>
>>  	} else {
>>  		spin_unlock(&sched->job_list_lock);
>>  	}