[PATCH] amdgpu/sriov Stop data exchange for whole GPU reset

Liu, Monk Monk.Liu at amd.com
Tue Jan 12 05:55:43 UTC 2021


Reviewed-by: Monk Liu <monk.liu at amd.com>

Thanks 

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

-----Original Message-----
From: Zhang, Jack (Jian) <Jack.Zhang1 at amd.com> 
Sent: Tuesday, January 12, 2021 11:20 AM
To: Zhang, Hawking <Hawking.Zhang at amd.com>; amd-gfx at lists.freedesktop.org; Liu, Monk <Monk.Liu at amd.com>; Chen, JingWen <JingWen.Chen2 at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Deng, Emily <Emily.Deng at amd.com>
Subject: RE: [PATCH] amdgpu/sriov Stop data exchange for whole GPU reset

Ping...

-----Original Message-----
From: Zhang, Jack (Jian)
Sent: Friday, January 8, 2021 11:07 AM
To: amd-gfx at lists.freedesktop.org; Liu, Monk <Monk.Liu at amd.com>; Chen, JingWen <JingWen.Chen2 at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Deng, Emily <Emily.Deng at amd.com>
Subject: RE: [PATCH] amdgpu/sriov Stop data exchange for whole GPU reset

Ping

-----Original Message-----
From: Jack Zhang <Jack.Zhang1 at amd.com>
Sent: Thursday, January 7, 2021 6:47 PM
To: amd-gfx at lists.freedesktop.org
Cc: Zhang, Jack (Jian) <Jack.Zhang1 at amd.com>; Chen, JingWen <JingWen.Chen2 at amd.com>
Subject: [PATCH] amdgpu/sriov Stop data exchange for whole GPU reset

[Why]
When the host triggers a whole GPU reset, the guest keeps waiting until the host finishes the reset. However, a work queue in the guest keeps exchanging data between VF and PF, and that exchange needs to access the frame buffer. During the whole GPU reset the frame buffer is not accessible, which causes a call trace in the guest.
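
For reference, the exchange in question is a self-rescheduling delayed work item that periodically writes VF data into a structure living in frame-buffer memory. A minimal sketch of that pattern follows; write_vf2pf_data() is a placeholder name for the real helper in amdgpu_virt.c, not the literal driver code:

/* Simplified sketch of the periodic vf2pf exchange work item.
 * write_vf2pf_data() stands in for the real helper; the point is
 * that every run touches frame-buffer memory, which is unreachable
 * while the host performs the whole GPU reset. */
static void vf2pf_work_item(struct work_struct *work)
{
	struct amdgpu_virt *virt =
		container_of(work, struct amdgpu_virt, vf2pf_work.work);
	struct amdgpu_device *adev =
		container_of(virt, struct amdgpu_device, virt);

	write_vf2pf_data(adev);	/* writes into the FB-resident shared page */

	/* re-arm itself until amdgpu_virt_fini_data_exchange() cancels it */
	schedule_delayed_work(&virt->vf2pf_work,
		msecs_to_jiffies(adev->virt.vf2pf_update_interval_ms));
}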

[How]
After the VF receives the reset notification from the PF, stop the vf2pf data exchange before handling the reset.
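
In code terms, the FLR work handler should tear down the exchange before it flags the reset as in progress. Roughly, as an ordering sketch of what the hunks below add (the surrounding handler is abbreviated):

	if (!down_read_trylock(&adev->reset_sem))
		return;

	/* stop the vf2pf work first so it can no longer touch the
	 * frame buffer while the host resets the whole GPU */
	amdgpu_virt_fini_data_exchange(adev);

	atomic_set(&adev->in_gpu_reset, 1);

	/* ... then poll the mailbox until the host signals FLR completion ... */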

Signed-off-by: Jingwen Chen <Jingwen.Chen2 at amd.com>
Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c    | 1 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c    | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 83ca5cbffe2c..3e212862cf5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -571,6 +571,7 @@ void amdgpu_virt_fini_data_exchange(struct amdgpu_device *adev)
 		DRM_INFO("clean up the vf2pf work item\n");
 		flush_delayed_work(&adev->virt.vf2pf_work);
 		cancel_delayed_work_sync(&adev->virt.vf2pf_work);
+		adev->virt.vf2pf_update_interval_ms = 0;
 	}
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 7767ccca526b..3ee481557fc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -255,6 +255,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	if (!down_read_trylock(&adev->reset_sem))
 		return;
 
+	amdgpu_virt_fini_data_exchange(adev);
 	atomic_set(&adev->in_gpu_reset, 1);
 
 	do {
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index dd5c1e6ce009..48e588d3c409 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -276,6 +276,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	if (!down_read_trylock(&adev->reset_sem))
 		return;
 
+	amdgpu_virt_fini_data_exchange(adev);
 	atomic_set(&adev->in_gpu_reset, 1);
 
 	do {
--
2.25.1


