[PATCH] nouveau/gsp: add a 50ms delay between fbsr and driver unload rpcs

David Airlie airlied at redhat.com
Thu Jul 3 21:56:30 UTC 2025


On Fri, Jul 4, 2025 at 7:46 AM Danilo Krummrich <dakr at kernel.org> wrote:
>
> On 7/3/25 1:27 AM, Dave Airlie wrote:
> > From: Dave Airlie <airlied at redhat.com>
> >
> > This fixes a bunch of command hangs after runtime suspend/resume.
> >
> > This fixes a regression caused by code movement in the commit below,
> > the commit seems to just change timings enough to cause this to happen
> > now, and adding the sleep seems to avoid it.
> >
> > I've spent some time trying to root cause it to no great avail,
> > it seems like a bug on the firmware side, but it could be a bug
> > in our rpc handling that I can't find.
> >
> > Either way, we should land the workaround to fix the problem,
> > while we continue to work out the root cause.
>
> I think we should add a TODO above the msleep(); what do you think would be a
> good comment here?

TODO: debug the gsp firmware or the rpc handling to find out why this
is happening and why it's Turing specific.

Don't really have a lot to go on.

Dave.
>
> I can add it when applying the patch if you want.
>
> > Signed-off-by: Dave Airlie <airlied at redhat.com>
> > Cc: Ben Skeggs <bskeggs at nvidia.com>
> > Cc: Danilo Krummrich <dakr at kernel.org>
> > Fixes: 21b039715ce9 ("drm/nouveau/gsp: add hals for fbsr.suspend/resume()")
> > ---
> >   drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
> > index baf42339f93e..ff362a6d9f5c 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
> > @@ -1744,6 +1744,9 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
> >                       nvkm_gsp_sg_free(gsp->subdev.device, &gsp->sr.sgt);
> >                       return ret;
> >               }
> > +
> > +             /* without this Turing ends up resetting all channels after resume. */
> > +             msleep(50);
> >       }
> >
> >       ret = r535_gsp_rpc_unloading_guest_driver(gsp, suspend);
>
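
For illustration only, the ordering discussed in this thread (fbsr suspend, then a 50ms delay, then the driver unload rpc, with the suggested TODO folded into the comment) can be sketched in userspace C. This is not the nouveau code: every fake_* name is a hypothetical stand-in, the stubs only record the order in which they ran, fake_msleep() records the requested delay rather than sleeping, and the exact placement and wording of the TODO in the final patch is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace sketch only. Every name below is a hypothetical stand-in,
 * not the nouveau/nvkm API: the stubs just record the order of events.
 */
enum fake_step {
	FAKE_FBSR_SUSPEND = 1,
	FAKE_DELAY,
	FAKE_UNLOAD,
};

struct fake_gsp {
	enum fake_step log[4];	/* steps in the order they ran */
	size_t nr;		/* number of valid entries in log[] */
	unsigned int delay_ms;	/* last delay requested, in milliseconds */
};

static void fake_record(struct fake_gsp *gsp, enum fake_step step)
{
	if (gsp->nr < sizeof(gsp->log) / sizeof(gsp->log[0]))
		gsp->log[gsp->nr++] = step;
}

/* Stand-in for the fbsr suspend rpc; always succeeds in this sketch. */
static int fake_fbsr_suspend(struct fake_gsp *gsp)
{
	fake_record(gsp, FAKE_FBSR_SUSPEND);
	return 0;
}

/* Stand-in for msleep(); records the request instead of sleeping. */
static void fake_msleep(struct fake_gsp *gsp, unsigned int ms)
{
	gsp->delay_ms = ms;
	fake_record(gsp, FAKE_DELAY);
}

/* Stand-in for the driver unload rpc; always succeeds in this sketch. */
static int fake_unloading_guest_driver(struct fake_gsp *gsp, bool suspend)
{
	(void)suspend;
	fake_record(gsp, FAKE_UNLOAD);
	return 0;
}

static int fake_gsp_fini(struct fake_gsp *gsp, bool suspend)
{
	int ret;

	if (suspend) {
		ret = fake_fbsr_suspend(gsp);
		if (ret)
			return ret;

		/*
		 * Without this Turing ends up resetting all channels
		 * after resume.
		 *
		 * TODO: debug the gsp firmware or the rpc handling to find
		 * out why this is happening and why it's Turing specific.
		 */
		fake_msleep(gsp, 50);
	}

	return fake_unloading_guest_driver(gsp, suspend);
}
```

Calling fake_gsp_fini() with suspend set to true records the three steps in order; with suspend set to false only the unload step runs and no delay is requested, which mirrors the msleep() sitting inside the suspend branch of the hunk.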