[PATCH] nouveau/gsp: add a 50ms delay between fbsr and driver unload rpcs

Dave Airlie airlied at gmail.com
Wed Jul 2 23:27:07 UTC 2025


From: Dave Airlie <airlied at redhat.com>

This fixes a bunch of command hangs after runtime suspend/resume.

This fixes a regression caused by code movement in the commit below,
the commit seems to just change timings enough to cause this to happen
now, and adding the sleep seems to avoid it.

I've spent some time trying to root cause it to no great avail,
it seems like a bug on the firmware side, but it could be a bug
in our rpc handling that I can't find.

Either way, we should land the workaround to fix the problem,
while we continue to work out the root cause.

Signed-off-by: Dave Airlie <airlied at redhat.com>
Cc: Ben Skeggs <bskeggs at nvidia.com>
Cc: Danilo Krummrich <dakr at kernel.org>
Fixes: 21b039715ce9 ("drm/nouveau/gsp: add hals for fbsr.suspend/resume()")
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
index baf42339f93e..ff362a6d9f5c 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
@@ -1744,6 +1744,9 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
 			nvkm_gsp_sg_free(gsp->subdev.device, &gsp->sr.sgt);
 			return ret;
 		}
+
+		/* without this Turing ends up resetting all channels after resume. */
+		msleep(50);
 	}
 
 	ret = r535_gsp_rpc_unloading_guest_driver(gsp, suspend);
-- 
2.49.0



More information about the dri-devel mailing list