[PATCH 0/5] NVKM GSP RPC message handling policy
Zhi Wang
zhiw at nvidia.com
Fri Feb 7 17:58:01 UTC 2025
Ben reported that the patch [1] breaks suspend/resume.
After digging for a while, I noticed that this problem already existed
before that patch was introduced. It was not exposed because
r535_gsp_rpc_push() doesn't respect the caller's requirement when handling
a large RPC command: it won't wait for the reply even if the caller
requires it. (Small RPCs are fine.)
With that patch series applied, r535_gsp_rpc_push() really waits for the
reply and receives the entire GSP message, which the large vGPU RPC
command requires.
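The mismatch can be illustrated with a simplified userspace model. This is a sketch, not the nouveau code: MAX_RPC_PAYLOAD, struct push_result, push_old() and push_new() are hypothetical names, and the chunk count only approximates how a large RPC is assumed to be split across several queue messages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-message payload limit, for illustration only. */
#define MAX_RPC_PAYLOAD 4096u

struct push_result {
	unsigned int chunks;	/* number of queue messages used */
	bool waited;		/* did the push block for the reply? */
};

/* Round the payload up to whole messages; an empty RPC still takes one. */
static unsigned int count_chunks(size_t payload)
{
	if (payload == 0)
		return 1;
	return (unsigned int)((payload + MAX_RPC_PAYLOAD - 1) / MAX_RPC_PAYLOAD);
}

/* Model of the old behaviour: a large RPC never waits for its reply. */
static struct push_result push_old(size_t payload, bool wait)
{
	struct push_result r = { count_chunks(payload), wait };

	if (r.chunks > 1)
		r.waited = false;	/* caller's request silently ignored */
	return r;
}

/* Model of the behaviour after [1]: the caller's request holds for any size. */
static struct push_result push_new(size_t payload, bool wait)
{
	struct push_result r = { count_chunks(payload), wait };

	return r;
}
```

In the model, a 100-byte RPC honours the wait in both versions, while a 10000-byte RPC only waits with push_new(). That is why a latent reply-handling problem on a large command only surfaced once [1] landed.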
There are currently two GSP RPC message handling policies:
- a. don't care: discard the message before returning to the caller.
- b. receive the entire message: wait for and receive the entire message
  before returning to the caller.
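A rough sketch of what the two policies mean for the caller follows. The enum and rpc_handle_reply() are hypothetical names, not the identifiers used in the series, and the received GSP message is simulated by a plain buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the two existing policies (a and b above). */
enum rpc_reply_policy {
	RPC_REPLY_DONT_CARE,	/* a. the reply is dropped */
	RPC_REPLY_RECV,		/* b. the whole reply goes back to the caller */
};

/*
 * `reply` stands in for the message received from GSP. With policy a the
 * caller gets NULL and nothing is handed back; with policy b the caller
 * gets a copy of the entire message and must free() it.
 */
static void *rpc_handle_reply(enum rpc_reply_policy policy,
			      const void *reply, size_t len)
{
	void *buf;

	if (policy == RPC_REPLY_DONT_CARE)
		return NULL;

	buf = malloc(len);
	if (!buf)
		return NULL;
	memcpy(buf, reply, len);
	return buf;
}
```

The point of the model is the contract: under policy b the caller relies on a complete message being available, so the driver has to wait for one.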
On the suspend/resume path there is a large GSP command,
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, whose reply is only a GSP RPC message
header telling the driver that the request has been handled. The driver's
policy for it is to receive the entire message, which ends in a timeout
and an error when r535_gsp_rpc_push() tries to receive the message. That
breaks the suspend/resume path.
This series factors out the current GSP RPC message handling policies,
introduces a new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, and adds
kernel-doc documentation describing the policies.
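The failure and the fix can be modelled in userspace. This letter does not spell out the semantics of the new NVKM_GSP_RPC_REPLY_POLL policy, so the sketch assumes it means "wait until GSP posts the reply header, then return without expecting a payload", which is what a header-only ALLOC_MEMORY reply would need. RPC_HDR_SIZE, struct fake_msgq and the other names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical names; only the POLL idea comes from the series. */
enum rpc_reply_policy { RPC_REPLY_DONT_CARE, RPC_REPLY_RECV, RPC_REPLY_POLL };

#define RPC_HDR_SIZE 32u	/* assumed reply header size, illustration only */

/* Simulated message queue: bytes GSP has actually posted for the reply. */
struct fake_msgq {
	size_t posted;
};

/* Poll a bounded number of times for `need` bytes; -ETIMEDOUT if they never arrive. */
static int wait_for_bytes(const struct fake_msgq *q, size_t need, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (q->posted >= need)
			return 0;
		/* a real driver would sleep or delay between polls */
	}
	return -ETIMEDOUT;
}

static int rpc_handle_reply(const struct fake_msgq *q,
			    enum rpc_reply_policy policy, size_t expected_payload)
{
	switch (policy) {
	case RPC_REPLY_DONT_CARE:
		return 0;	/* no waiting at all */
	case RPC_REPLY_RECV:
		/* needs header + payload: never satisfied by a header-only reply */
		return wait_for_bytes(q, RPC_HDR_SIZE + expected_payload, 1000);
	case RPC_REPLY_POLL:
		/* only waits for GSP to acknowledge with a header */
		return wait_for_bytes(q, RPC_HDR_SIZE, 1000);
	}
	return -EINVAL;
}
```

With a queue holding only a header, RECV times out while POLL returns success; once the full payload is posted, RECV succeeds as well.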
With this patchset applied, the problem no longer reproduces and
suspend/resume works on my L40.
[1] https://lore.kernel.org/nouveau/7eb31f1f-fc3a-4fb5-86cf-4bd011d68ff1@nvidia.com/T/#t
Zhi Wang (5):
drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
drm/nouveau/nvkm: factor out the current RPC command reply policies
drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL
drm/nouveau/nvkm: use the new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY
drm/nouveau/nvkm: introduce a kernel doc for GSP message handling
Documentation/gpu/nouveau.rst | 3 +
.../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 34 ++++++--
.../gpu/drm/nouveau/nvkm/subdev/bar/r535.c | 2 +-
.../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 80 +++++++++++--------
.../drm/nouveau/nvkm/subdev/instmem/r535.c | 2 +-
5 files changed, 78 insertions(+), 43 deletions(-)
--
2.43.5