[PATCH] drm/amdgpu: Fix SDMA engine resume issue under SRIOV

Bokun Zhang Bokun.Zhang at amd.com
Thu Oct 6 18:08:38 UTC 2022


- Under SRIOV, the SDMA engine is shared between VFs. Therefore,
  we do not stop SDMA during hw_fini. This is not an issue
  with normal driver loading and unloading.

- However, when we suspend the SDMA engine and then resume it,
  the issue shows up. Because the DRM entity still exists,
  something could attempt to use that SDMA engine to clear or
  move memory before the engine is initialized.

- Therefore, we call sdma_v5_2_enable(false) (after disabling
  context switching) during hw_fini and, if we are under SRIOV,
  call sdma_v5_2_enable(true) afterwards so that other VFs can
  keep using SDMA. This way, the DRM entity of the SDMA engine is
  emptied and the engine follows the normal resume code path.

Signed-off-by: Bokun Zhang <Bokun.Zhang at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index f136fec7b4f4..3eaf1a573e73 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -1357,12 +1357,19 @@ static int sdma_v5_2_hw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-	if (amdgpu_sriov_vf(adev))
-		return 0;
-
+	/*
+	 * Under SRIOV, the VF cannot unilaterally stop the SDMA engine.
+	 * However, we still need to clean up the DRM entity.
+	 * Therefore, we re-enable SDMA afterwards.
+	 */
 	sdma_v5_2_ctx_switch_enable(adev, false);
 	sdma_v5_2_enable(adev, false);
 
+	if (amdgpu_sriov_vf(adev)) {
+		sdma_v5_2_enable(adev, true);
+		sdma_v5_2_ctx_switch_enable(adev, true);
+	}
+
 	return 0;
 }
 
-- 
2.34.1



More information about the amd-gfx mailing list