[PATCH 0/5] NVKM GSP RPC message handling policy

Ben Skeggs bskeggs at nvidia.com
Thu Feb 20 20:48:04 UTC 2025


On 8/2/25 03:58, Zhi Wang wrote:

> Ben reported that the patch [1] breaks suspend/resume.
>
> After digging for a while, I noticed that this problem had been there
> before introducing that patch, but not exposed because r535_gsp_rpc_push()
> doesn't respect the caller's requirement when handling a large RPC
> command: it won't wait for the reply even if the caller requires it.
> (Small RPCs are fine.)
>
> After that patch series is introduced, r535_gsp_rpc_push() really waits
> for the reply and receives the entire GSP message, which is required
> by the large vGPU RPC command.
>
> There are currently two GSP RPC message handling policies:
>
> - a. don't care: discard the message before returning to the caller.
> - b. receive the entire message: wait for and receive the entire message
>    before returning to the caller.
>
> On the path of suspend/resume, there is a large GSP command
> NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, which returns only a GSP RPC message
> header to tell the driver that the request is handled. The policy in the
> driver is to receive the entire message, which ends up with a timeout
> and error when r535_gsp_rpc_push() tries to receive the message. That
> breaks the suspend/resume path.
>
> This series factors out the current GSP RPC message handling policy and
> introduces a new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY and a
> kernel doc to illustrate the policies.
>
> With this patchset, the problem can't be reproduced and suspend/resume
> works on my L40.

This seems to fix the issue here on top of current drm-misc-next.

Tested-by: Ben Skeggs <bskeggs at nvidia.com>

>
> [1] https://lore.kernel.org/nouveau/7eb31f1f-fc3a-4fb5-86cf-4bd011d68ff1@nvidia.com/T/#t
>
> Zhi Wang (5):
>    drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
>    drm/nouveau/nvkm: factor out the current RPC command reply policies
>    drm/nouveau/nvkm: introduce new GSP reply policy
>      NVKM_GSP_RPC_REPLY_POLL
>    drm/nouveau/nvkm: use the new policy for
>      NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY
>    drm/nouveau/nvkm: introduce a kernel doc for GSP message handling
>
>   Documentation/gpu/nouveau.rst                 |  3 +
>   .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 34 ++++++--
>   .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
>   .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 80 +++++++++++--------
>   .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
>   5 files changed, 78 insertions(+), 43 deletions(-)
>
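
For readers of the archive, the difference between the two existing
policies and the new one in the cover letter above can be sketched as a
small toy model. This is only an illustration under stated assumptions,
not the code from the series: every name below (toy_reply_policy,
toy_reply, toy_handle_reply) is hypothetical, the only policy name taken
from the series is NVKM_GSP_RPC_REPLY_POLL, and "wait" is modelled as a
simple check of whether the header or payload ever arrives.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; this is not the nouveau API. */
enum toy_reply_policy {
	TOY_REPLY_NOWAIT, /* (a) don't care: discard the reply */
	TOY_REPLY_RECV,   /* (b) wait for and receive the entire message */
	TOY_REPLY_POLL,   /* new policy: only wait until the header shows up */
};

/* Simulated reply: which parts of the message ever arrive. */
struct toy_reply {
	bool header_arrived;
	bool payload_arrived;
};

/* Returns 0 on success, or a negative errno modelling a timeout. */
static int toy_handle_reply(enum toy_reply_policy policy,
			    const struct toy_reply *reply)
{
	assert(reply != NULL);

	switch (policy) {
	case TOY_REPLY_NOWAIT:
		/* Nothing is waited for, so nothing can time out. */
		return 0;
	case TOY_REPLY_POLL:
		/* The header alone confirms the request was handled. */
		return reply->header_arrived ? 0 : -ETIMEDOUT;
	case TOY_REPLY_RECV:
		/* A header-only reply never completes under this policy. */
		if (!reply->header_arrived || !reply->payload_arrived)
			return -ETIMEDOUT;
		return 0;
	}
	return -EINVAL;
}
```

In this model, a header-only reply (as described for
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY) times out under the "receive the
entire message" policy but succeeds under the polling one, which
corresponds to the suspend/resume failure and its fix as described above.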

