[PATCH v2] drm/v3d: Add job to pending list if the reset was skipped
Maíra Canal
mcanal at igalia.com
Fri May 2 19:42:57 UTC 2025
Hi,
On 30/04/25 17:51, Maíra Canal wrote:
> When a CL/CSD job times out, we check if the GPU has made any progress
> since the last timeout. If so, instead of resetting the hardware, we skip
> the reset and let the timer get rearmed. This gives long-running jobs a
> chance to complete.
>
> However, when `timedout_job()` is called, the job in question is removed
> from the pending list, which means it won't be automatically freed through
> `free_job()`. Consequently, when we skip the reset and keep the job
> running, the job won't be freed when it finally completes.
>
> This situation leads to a memory leak, as exposed in [1] and [2].
>
> Similarly to commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when
> GPU is still active"), this patch ensures the job is put back on the
> pending list when extending the timeout.
>
> Cc: stable at vger.kernel.org # 6.0
> Reported-by: Daivik Bhatia <dtgs1208 at gmail.com>
> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227 [1]
> Closes: https://github.com/raspberrypi/linux/issues/6817 [2]
> Signed-off-by: Maíra Canal <mcanal at igalia.com>
> Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
> ---
>
> Hi,
>
> While we typically strive to avoid exposing the scheduler's internals
> within the drivers, I'm proposing this fix as an interim solution. I'm aware
> that a comprehensive fix will need some adjustments on the DRM sched side,
> and I plan to address that soon.
>
> However, it would be hard to justify the backport of such patches to the
> stable branches and this issue is affecting users at the moment.
> Therefore, I'd like to push this patch to drm-misc-fixes in order to
> address this leak as soon as possible, while working on a more generic
> solution.
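For readers following the quoted commit message, here is a minimal user-space sketch of the mechanism it describes. It is a toy model, not the v3d or DRM scheduler code: every name in it (toy_job, toy_sched, toy_timedout_job and so on) is hypothetical, and the real scheduler uses its own list and locking primitives. It only illustrates why a job taken off the pending list is never seen by the free path unless it is put back when the reset is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names below are hypothetical, not kernel APIs. */
struct toy_job {
	struct toy_job *next;	/* link in the pending list */
	bool done;		/* set once the "hardware" finishes the job */
	bool freed;		/* set by the free_job() stand-in */
};

struct toy_sched {
	struct toy_job *pending;	/* singly linked pending list */
};

static void toy_push_pending(struct toy_sched *s, struct toy_job *j)
{
	j->next = s->pending;
	s->pending = j;
}

/* Stand-in for the timeout path: the job is unlinked from the
 * pending list before the driver's timeout callback runs. */
static struct toy_job *toy_begin_timeout(struct toy_sched *s)
{
	struct toy_job *j = s->pending;

	if (j) {
		s->pending = j->next;
		j->next = NULL;
	}
	return j;
}

/* Stand-in for the driver's timeout callback. If the GPU made
 * progress, the reset is skipped; with requeue set, the job is put
 * back on the pending list, which is the idea behind the fix. */
static void toy_timedout_job(struct toy_sched *s, struct toy_job *j,
			     bool made_progress, bool requeue)
{
	if (made_progress) {
		if (requeue)
			toy_push_pending(s, j);
		return;	/* reset skipped, job keeps running */
	}
	/* The reset path is omitted from this sketch. */
}

/* Stand-in for the free path: only finished jobs that are still on
 * the pending list can be found and freed. Returns how many were freed. */
static int toy_free_finished(struct toy_sched *s)
{
	struct toy_job **pp = &s->pending;
	int freed = 0;

	while (*pp) {
		struct toy_job *j = *pp;

		if (j->done) {
			*pp = j->next;
			j->next = NULL;
			j->freed = true;
			freed++;
		} else {
			pp = &j->next;
		}
	}
	return freed;
}

/* One job times out, the reset is skipped, and the job completes later.
 * Returns the number of jobs the free path managed to release. */
int toy_run_scenario(bool requeue)
{
	struct toy_sched s = { .pending = NULL };
	struct toy_job job = { 0 };
	struct toy_job *timed_out;

	toy_push_pending(&s, &job);
	timed_out = toy_begin_timeout(&s);
	assert(timed_out == &job);

	toy_timedout_job(&s, timed_out, true, requeue);
	job.done = true;	/* the long-running job finally completes */

	return toy_free_finished(&s);
}
```

In this model, toy_run_scenario(false) frees nothing, which corresponds to the leak described above, while toy_run_scenario(true) frees the one job because it was back on the pending list when it completed.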
Applied to misc/kernel.git (drm-misc-fixes).
Best Regards,
- Maíra