[PATCH v2 6/8] drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Maíra Canal
mcanal at igalia.com
Mon Jun 2 11:36:24 UTC 2025
Hi Philipp,
On 02/06/25 04:28, Philipp Stanner wrote:
> On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:
[...]
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index
>> 7146069a98492f5fab2a49d96e2054f649e1fe3d..46f5391e84a12232b247886cf13
>> 11f8e09f42f04 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -40,11 +40,11 @@ static enum drm_gpu_sched_stat
>> etnaviv_sched_timedout_job(struct drm_sched_job
>> int change;
>>
>> /*
>> - * If the GPU managed to complete this jobs fence, the
>> timout is
>> - * spurious. Bail out.
>> + * If the GPU managed to complete this jobs fence, the
>> timeout has
>> + * fired before free-job worker. The timeout is spurious, so
>> bail out.
>> */
>> if (dma_fence_is_signaled(submit->out_fence))
>> - goto out_no_timeout;
>> + return DRM_GPU_SCHED_STAT_NO_HANG;
>>
>> /*
>> * If the GPU is still making forward progress on the front-
>> end (which
>> @@ -70,7 +70,7 @@ static enum drm_gpu_sched_stat
>> etnaviv_sched_timedout_job(struct drm_sched_job
>> gpu->hangcheck_dma_addr = dma_addr;
>> gpu->hangcheck_primid = primid;
>> gpu->hangcheck_fence = gpu->completed_fence;
>> - goto out_no_timeout;
>> + return DRM_GPU_SCHED_STAT_NO_HANG;
>> }
>>
>> /* block scheduler */
>> @@ -86,10 +86,7 @@ static enum drm_gpu_sched_stat
>> etnaviv_sched_timedout_job(struct drm_sched_job
>> drm_sched_resubmit_jobs(&gpu->sched);
>>
>> drm_sched_start(&gpu->sched, 0);
>> - return DRM_GPU_SCHED_STAT_RESET;
>>
>> -out_no_timeout:
>> - list_add(&sched_job->list, &sched_job->sched->pending_list);
>
> Here you actually remove the manipulation of the scheduler internals,
> but you didn't in v3d. Just to point that out.
>
>
> And BTW I'm just seeing that the pending_list gets manipulated here
> with the scheduler's workqueues running and no locks being hold.
>
> Oh man :(
>
> That is most certainly a bug, and I recommend that the etnaviv
> maintainers at least add the appropriate lock here and backport that
> since it can race any time.
>
>
> But thx for working on that, Maíra. Good that we can remove the stuff
> this way.
>
> Thinking about it, this patch even fixes a bug. So could contain a
> Fixes: tag. But I'm not sure if it's worth it to mark the entire series
> for Stable. Opinions?
I believe the best way to fix this bug would be doing something similar
to what I did to v3d: send a temporary fix adding the locks, which will
be backported to stable. I'll send a fix to Etnaviv today.
Thanks for the review, Phillip!
Best Regards,
- Maíra
>
>
> P.
>
>
>> return DRM_GPU_SCHED_STAT_RESET;
>> }
>>
>>
>
More information about the Intel-xe
mailing list