[PATCH v2 6/8] drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Philipp Stanner
phasta at mailbox.org
Mon Jun 2 07:28:19 UTC 2025
On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
> Etnaviv can skip a hardware reset in two situations:
>
> 1. TDR has fired before the free-job worker and the timeout is
> spurious.
> 2. The GPU is still making progress on the front-end and we can
> give
> the job a chance to complete.
>
> Instead of relying on the scheduler internals, use the
> DRM_GPU_SCHED_STAT_NO_HANG status to skip the reset and re-arm the
In the four patches adjusting the drivers, I rather recommend to write:
"Instead of manipulating scheduler internals, inform the scheduler that
this job did not actually time out and no reset was performed through
the new status code DRM_GPU_SCHED_STAT_NO_HANG."
> timer.
>
> Signed-off-by: Maíra Canal <mcanal at igalia.com>
> ---
> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index
> 7146069a98492f5fab2a49d96e2054f649e1fe3d..46f5391e84a12232b247886cf13
> 11f8e09f42f04 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -40,11 +40,11 @@ static enum drm_gpu_sched_stat
> etnaviv_sched_timedout_job(struct drm_sched_job
> int change;
>
> /*
> - * If the GPU managed to complete this jobs fence, the
> timout is
> - * spurious. Bail out.
> + * If the GPU managed to complete this jobs fence, the
> timeout has
> + * fired before free-job worker. The timeout is spurious, so
> bail out.
> */
> if (dma_fence_is_signaled(submit->out_fence))
> - goto out_no_timeout;
> + return DRM_GPU_SCHED_STAT_NO_HANG;
>
> /*
> * If the GPU is still making forward progress on the front-
> end (which
> @@ -70,7 +70,7 @@ static enum drm_gpu_sched_stat
> etnaviv_sched_timedout_job(struct drm_sched_job
> gpu->hangcheck_dma_addr = dma_addr;
> gpu->hangcheck_primid = primid;
> gpu->hangcheck_fence = gpu->completed_fence;
> - goto out_no_timeout;
> + return DRM_GPU_SCHED_STAT_NO_HANG;
> }
>
> /* block scheduler */
> @@ -86,10 +86,7 @@ static enum drm_gpu_sched_stat
> etnaviv_sched_timedout_job(struct drm_sched_job
> drm_sched_resubmit_jobs(&gpu->sched);
>
> drm_sched_start(&gpu->sched, 0);
> - return DRM_GPU_SCHED_STAT_RESET;
>
> -out_no_timeout:
> - list_add(&sched_job->list, &sched_job->sched->pending_list);
Here you actually remove the manipulation of the scheduler internals,
but you didn't in v3d. Just to point that out.
And BTW I'm just seeing that the pending_list gets manipulated here
with the scheduler's workqueues running and no locks being hold.
Oh man :(
That is most certainly a bug, and I recommend that the etnaviv
maintainers at least add the appropriate lock here and backport that
since it can race any time.
But thx for working on that, Maíra. Good that we can remove the stuff
this way.
Thinking about it, this patch even fixes a bug. So could contain a
Fixes: tag. But I'm not sure if it's worth it to mark the entire series
for Stable. Opinions?
P.
> return DRM_GPU_SCHED_STAT_RESET;
> }
>
>
More information about the dri-devel
mailing list