[PATCH v2] drm/msm: Update global fault counter when faulty process has already ended

Sun Aug 17 14:43:45 UTC 2025

The patch is in msm-fixes, but I was out last week, so I haven't had a
chance to send a PR yet.

thx

BR,
-R

On Fri, Aug 15, 2025 at 1:10 PM Maíra Canal <mcanal at igalia.com> wrote:
>
> Hi,
>
> Gentle ping on this patch.
>
> Best Regards,
> - Maíra
>
> On 7/20/25 18:42, Maíra Canal wrote:
> > The global fault counter is no longer used since commit 12578c075f89
> > ("drm/msm/gpu: Skip retired submits in recover worker"). However, it's
> > still needed, as we need to handle cases where a GPU fault occurs after
> > the faulting process has already ended.
> >
> > Hence, increment the global fault counter when the submitting process
> > has already ended. This way, the number of faults returned by
> > MSM_PARAM_FAULTS will stay consistent.
> >
> > While here, s/unusuable/unusable.
> >
> > Fixes: 12578c075f89 ("drm/msm/gpu: Skip retired submits in recover worker")
> > Signed-off-by: Maíra Canal <mcanal at igalia.com>
> > ---
> >
> > v1 -> v2: https://lore.kernel.org/dri-devel/20250714230813.46279-1-mcanal@igalia.com/T/
> >
> > * Don't delete the global fault counter; instead, increment it when we get
> >       a fault after the faulting process has ended (Rob Clark)
> > * Rewrite the commit message based on the changes.
> >
> >   drivers/gpu/drm/msm/msm_gpu.c | 11 ++++++++---
> >   1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > index c317b25a8162..416d47185ef0 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > @@ -465,6 +465,7 @@ static void recover_worker(struct kthread_work *work)
> >       struct msm_gem_submit *submit;
> >       struct msm_ringbuffer *cur_ring = gpu->funcs->active_ring(gpu);
> >       char *comm = NULL, *cmd = NULL;
> > +     struct task_struct *task;
> >       int i;
> >
> >       mutex_lock(&gpu->lock);
> > @@ -482,16 +483,20 @@ static void recover_worker(struct kthread_work *work)
> >
> >       /* Increment the fault counts */
> >       submit->queue->faults++;
> > -     if (submit->vm) {
> > +
> > +     task = get_pid_task(submit->pid, PIDTYPE_PID);
> > +     if (!task)
> > +             gpu->global_faults++;
> > +     else {
> >               struct msm_gem_vm *vm = to_msm_vm(submit->vm);
> >
> >               vm->faults++;
> >
> >               /*
> >                * If userspace has opted-in to VM_BIND (and therefore userspace
> > -              * management of the VM), faults mark the VM as unusuable.  This
> > +              * management of the VM), faults mark the VM as unusable. This
> >                * matches vulkan expectations (vulkan is the main target for
> > -              * VM_BIND)
> > +              * VM_BIND).
> >                */
> >               if (!vm->managed)
> >                       msm_gem_vm_unusable(submit->vm);
>