[PATCH 1/8] drm/sched: Allow drivers to skip the reset and keep on running

Maíra Canal mcanal at igalia.com
Wed May 7 12:33:31 UTC 2025


Hi Tvrtko,

Thanks for the review!

On 06/05/25 08:32, Tvrtko Ursulin wrote:
> 
> On 03/05/2025 21:59, Maíra Canal wrote:
>> When the DRM scheduler times out, it's possible that the GPU isn't hung;
>> instead, a job may still be running, and there may be no valid reason to
>> reset the hardware. This can occur in two situations:
>>
>>    1. The GPU exposes some mechanism that ensures the GPU is still making
>>       progress. By checking this mechanism, we can safely skip the reset,
>>       rearm the timeout, and allow the job to continue running until
>>       completion. This is the case for v3d and Etnaviv.
>>    2. TDR has fired before the IRQ that signals the fence. Consequently,
>>       the job actually finishes, but it triggers a timeout before
>>       signaling the completion fence.
>>
>> These two scenarios are problematic because we remove the job from the
>> `sched->pending_list` before calling `sched->ops->timedout_job()`. This
>> means that when the job finally signals completion (e.g. in the IRQ
>> handler), the scheduler won't call `sched->ops->free_job()`. As a result,
>> the job and its resources won't be freed, leading to a memory leak.
>>
>> To resolve this issue, we create a new `drm_gpu_sched_stat` that allows a
>> driver to skip the reset. This new status will indicate that the job
>> should be reinserted into the pending list, and the driver will still
>> signal its completion.
> 

[...]

>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index 1a7e377d4cbb4fc12ed93c548b236970217945e8..fe9043b6d43141bee831b5fc16b927202a507d51 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -389,11 +389,13 @@ struct drm_sched_job {
>>    * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
>>    * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded.
>>    * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available anymore.
>> + * @DRM_GPU_SCHED_STAT_RUNNING: GPU is still running, so skip the reset.
> 
> s/GPU/job/ ?
> 
>>    */
>>   enum drm_gpu_sched_stat {
>>       DRM_GPU_SCHED_STAT_NONE,
>>       DRM_GPU_SCHED_STAT_NOMINAL,
>>       DRM_GPU_SCHED_STAT_ENODEV,
>> +    DRM_GPU_SCHED_STAT_RUNNING,
> 
> I am wondering if we could make it more obvious what is the difference 
> between "nominal" and "running" and from whose point of view should 
> those statuses be considered.
> So far we have "nominal" which means scheduler/hardware is working fine
> but the job may or may not have been cancelled. With "running" we kind
> of split it into two sub-statuses and it would be nice for that to be
> intuitively visible from the naming. But I struggle to suggest an
> elegant name while preserving nominal as is.

I was thinking: how about changing DRM_GPU_SCHED_STAT_NOMINAL to
DRM_GPU_SCHED_STAT_RESET (the hardware is fine, but we reset it)?

Then, when we skip the reset, we would have DRM_GPU_SCHED_STAT_NOMINAL
(which means the hardware is fine and we didn't reset it).

I'm open to other suggestions.

> 
> Thinking out loud here - perhaps that is pointing towards an alternative 
> that instead of a new status, a new helper to re-insert the single job 
> (like drm_sched_resubmit_job(sched, job)) would fit better? Although it 
> would be more churn.
> 

Although your solution might be more elegant, I'm worried that such a
function could be used improperly by new users (e.g. being called in
contexts other than `timedout_job()`).

I'd prefer to have a new status as it'll be used solely for
`timedout_job()` (making it harder for users to use it inappropriately).
With the addition of Matthew's feedback (calling
`drm_sched_run_free_queue()` after adding the job to the pending list),
I think it makes even more sense to keep it inside the timeout function.

I hope others can chime in and give their opinions about your idea.

Best Regards,
- Maíra

> Regards,
> 
> Tvrtko
> 
>>   };
>>   /**
>>
> 
