[PATCH] drm/v3d: Add job to pending list if the reset was skipped
Iago Toral
itoral at igalia.com
Tue Apr 29 11:00:36 UTC 2025
Never mind, somehow I missed you're already doing that in this patch.
Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
On Mon, 2025-04-28 at 07:45 +0200, Iago Toral wrote:
> Hi Maira,
>
> Looks good to me, but don't we need to do the same in
> v3d_csd_job_timedout?
>
> Iago
>
> On Sun, 2025-04-27 at 17:28 -0300, Maíra Canal wrote:
> > When a CL/CSD job times out, we check if the GPU has made any progress
> > since the last timeout. If so, instead of resetting the hardware, we skip
> > the reset and let the timer get rearmed. This gives long-running jobs a
> > chance to complete.
> >
> > However, when `timedout_job()` is called, the job in question is removed
> > from the pending list, which means it won't be automatically freed through
> > `free_job()`. Consequently, when we skip the reset and keep the job
> > running, the job won't be freed when it finally completes.
> >
> > This situation leads to a memory leak, as exposed in [1].
> >
> > Similarly to commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when
> > GPU is still active"), this patch ensures the job is put back on the
> > pending list when extending the timeout.
> >
> > Cc: stable at vger.kernel.org # 6.0
> > Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227 [1]
> > Reported-by: Daivik Bhatia <dtgs1208 at gmail.com>
> > Signed-off-by: Maíra Canal <mcanal at igalia.com>
> > ---
> > drivers/gpu/drm/v3d/v3d_sched.c | 18 +++++++++++-------
> > 1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> > index b3be08b0ca91..a98dcf9d75b1 100644
> > --- a/drivers/gpu/drm/v3d/v3d_sched.c
> > +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> > @@ -734,11 +734,6 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
> >  	return DRM_GPU_SCHED_STAT_NOMINAL;
> >  }
> >  
> > -/* If the current address or return address have changed, then the GPU
> > - * has probably made progress and we should delay the reset. This
> > - * could fail if the GPU got in an infinite loop in the CL, but that
> > - * is pretty unlikely outside of an i-g-t testcase.
> > - */
> >  static enum drm_gpu_sched_stat
> >  v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
> >  		    u32 *timedout_ctca, u32 *timedout_ctra)
> > @@ -748,9 +743,16 @@ v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
> >  	u32 ctca = V3D_CORE_READ(0, V3D_CLE_CTNCA(q));
> >  	u32 ctra = V3D_CORE_READ(0, V3D_CLE_CTNRA(q));
> >  
> > +	/* If the current address or return address have changed, then the GPU
> > +	 * has probably made progress and we should delay the reset. This
> > +	 * could fail if the GPU got in an infinite loop in the CL, but that
> > +	 * is pretty unlikely outside of an i-g-t testcase.
> > +	 */
> >  	if (*timedout_ctca != ctca || *timedout_ctra != ctra) {
> >  		*timedout_ctca = ctca;
> >  		*timedout_ctra = ctra;
> > +
> > +		list_add(&sched_job->list, &sched_job->sched->pending_list);
> >  		return DRM_GPU_SCHED_STAT_NOMINAL;
> >  	}
> >  
> > @@ -790,11 +792,13 @@ v3d_csd_job_timedout(struct drm_sched_job *sched_job)
> >  	struct v3d_dev *v3d = job->base.v3d;
> >  	u32 batches = V3D_CORE_READ(0, V3D_CSD_CURRENT_CFG4(v3d->ver));
> >  
> > -	/* If we've made progress, skip reset and let the timer get
> > -	 * rearmed.
> > +	/* If we've made progress, skip reset, add the job to the pending
> > +	 * list, and let the timer get rearmed.
> >  	 */
> >  	if (job->timedout_batches != batches) {
> >  		job->timedout_batches = batches;
> > +
> > +		list_add(&sched_job->list, &sched_job->sched->pending_list);
> >  		return DRM_GPU_SCHED_STAT_NOMINAL;
> >  	}
> >  
>
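For context on why the job has to be re-added at all: the scheduler core takes
the timed-out job off pending_list before it calls the driver's timedout_job()
hook, so a job whose reset is skipped is no longer tracked anywhere. Below is a
simplified sketch of that scheduler-side path, paraphrased from
drivers/gpu/drm/scheduler/sched_main.c (names match the upstream scheduler, but
status handling and error paths are abridged; this is not the verbatim code):

#include <linux/list.h>
#include <linux/spinlock.h>
#include <drm/gpu_scheduler.h>

static void sched_timeout_sketch(struct work_struct *work)
{
        struct drm_gpu_scheduler *sched =
                container_of(work, struct drm_gpu_scheduler, work_tdr.work);
        struct drm_sched_job *job;
        enum drm_gpu_sched_stat status;

        spin_lock(&sched->job_list_lock);
        /* The oldest job on pending_list is the one that timed out. */
        job = list_first_entry_or_null(&sched->pending_list,
                                       struct drm_sched_job, list);
        if (job) {
                /* The job is removed here so a concurrent free_job() cannot
                 * touch it. If the driver then skips the reset and simply
                 * returns NOMINAL, nothing puts it back on the list. */
                list_del_init(&job->list);
                spin_unlock(&sched->job_list_lock);

                status = job->sched->ops->timedout_job(job);
                /* status handling elided in this sketch */
        } else {
                spin_unlock(&sched->job_list_lock);
        }
}

As I understand it, in the normal reset path the bad job is reinserted into
pending_list by drm_sched_stop() before the scheduler is restarted, which is
why only the skip-reset branches in v3d_cl_job_timedout() and
v3d_csd_job_timedout() need the explicit list_add() that this patch adds.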