[PATCH] nouveau/gsp: add a 50ms delay between fbsr and driver unload rpcs

Danilo Krummrich dakr at kernel.org
Thu Jul 3 22:22:42 UTC 2025


On Thu, Jul 03, 2025 at 09:27:07AM +1000, Dave Airlie wrote:
> From: Dave Airlie <airlied at redhat.com>
> 
> This fixes a bunch of command hangs after runtime suspend/resume.
> 
> This fixes a regression caused by code movement in the commit below,
> the commit seems to just change timings enough to cause this to happen
> now, and adding the sleep seems to avoid it.
> 
> I've spent some time trying to root cause it to no great avail,
> it seems like a bug on the firmware side, but it could be a bug
> in our rpc handling that I can't find.
> 
> Either way, we should land the workaround to fix the problem,
> while we continue to work out the root cause.
> 
> Signed-off-by: Dave Airlie <airlied at redhat.com>
> Cc: Ben Skeggs <bskeggs at nvidia.com>
> Cc: Danilo Krummrich <dakr at kernel.org>
> Fixes: 21b039715ce9 ("drm/nouveau/gsp: add hals for fbsr.suspend/resume()")

Applied to drm-misc-fixes with the following diff.

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
index ff362a6d9f5c..23f80e167705 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
@@ -1745,7 +1745,11 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
                        return ret;
                }

-               /* without this Turing ends up resetting all channels after resume. */
+               /*
+                * TODO: Debug the GSP firmware / RPC handling to find out why
+                * without this Turing (but none of the other architectures)
+                * ends up resetting all channels after resume.
+                */
                msleep(50);
        }

I also corrected the commit hash in the 'Fixes' tag, which was missing its
leading 'c':

Fixes: c21b039715ce ("drm/nouveau/gsp: add hals for fbsr.suspend/resume()")

