[PATCH] amdgpu/sriov Stop data exchange for wholegpu reset
Paul Menzel
pmenzel at molgen.mpg.de
Tue Jan 12 09:16:59 UTC 2021
Dear Jack,
Thank you for your patch.
Please add a colon after amdgpu/sriov in the commit message summary.
On 07.01.21 at 11:46, Jack Zhang wrote:
> [Why]
> When host trigger a whole gpu reset, guest will keep
*hosts trigger* or *host triggers*
> waiting till host finish reset. But there's a work
finishes
> queue in guest exchanging data between vf&pf which need
needs
> to access frame buffer. During whole gpu reset, frame
> buffer is not accessable, and this causes the call trace.
accessible (a spell checker should have caught that)
Could you please paste part of the trace into the commit message, so that
users running into this problem can find the commit easily?
> [How]
> After vf get reset notification from pf, stop data exchange.
How can this be reproduced and tested?
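To make the question concrete, here is a rough userspace model of the ordering I understand the patch to establish. It is only a sketch, not kernel code: every `fake_*` name, the `run_scenario()` helper and the 2000 ms interval are hypothetical stand-ins. The `!= 0` guard is my assumption, since the quoted hunk only shows the closing braces. The model also uses the interval as the "work is armed" flag, whereas the driver relies on cancel_delayed_work_sync() for that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few amdgpu_device fields that matter here. */
struct fake_dev {
	unsigned int vf2pf_update_interval_ms;	/* 0: no periodic work armed */
	bool fb_accessible;			/* false while the host resets */
	bool in_gpu_reset;
	int bad_fb_accesses;			/* accesses that would have faulted */
};

/* Models one run of the periodic VF-to-PF work item. */
static void fake_vf2pf_work(struct fake_dev *dev)
{
	if (dev->vf2pf_update_interval_ms == 0)
		return;			/* exchange stopped, frame buffer untouched */
	if (!dev->fb_accessible)
		dev->bad_fb_accesses++;	/* the driver would produce the call trace */
}

/* Models amdgpu_virt_fini_data_exchange() with the patch applied. */
static void fake_fini_data_exchange(struct fake_dev *dev)
{
	if (dev->vf2pf_update_interval_ms != 0) {	/* assumed guard */
		/* the kernel flushes and cancels the delayed work here */
		dev->vf2pf_update_interval_ms = 0;
	}
}

/* Models the FLR work handler; stop_exchange selects the patched ordering. */
static int fake_flr_work(struct fake_dev *dev, bool stop_exchange)
{
	if (stop_exchange)
		fake_fini_data_exchange(dev);
	assert(!stop_exchange || dev->vf2pf_update_interval_ms == 0);

	dev->in_gpu_reset = true;
	dev->fb_accessible = false;	/* host is resetting the whole GPU */

	fake_vf2pf_work(dev);		/* timer fires while the reset is in flight */

	dev->fb_accessible = true;
	dev->in_gpu_reset = false;
	return dev->bad_fb_accesses;
}

/* Runs one reset with a freshly armed exchange (interval is a made-up value). */
static int run_scenario(bool stop_exchange)
{
	struct fake_dev dev = {
		.vf2pf_update_interval_ms = 2000,
		.fb_accessible = true,
	};

	return fake_flr_work(&dev, stop_exchange);
}
```

In this model run_scenario(false) reports one bad frame buffer access and run_scenario(true) reports none. A description of how to trigger the equivalent situation on real hardware would be a useful addition to the commit message.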
Kind regards,
Paul
> Signed-off-by: Jingwen Chen <Jingwen.Chen2 at amd.com>
> Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 1 +
> 3 files changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 83ca5cbffe2c..3e212862cf5d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -571,6 +571,7 @@ void amdgpu_virt_fini_data_exchange(struct amdgpu_device *adev)
>  		DRM_INFO("clean up the vf2pf work item\n");
>  		flush_delayed_work(&adev->virt.vf2pf_work);
>  		cancel_delayed_work_sync(&adev->virt.vf2pf_work);
> +		adev->virt.vf2pf_update_interval_ms = 0;
>  	}
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> index 7767ccca526b..3ee481557fc9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> @@ -255,6 +255,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
>  	if (!down_read_trylock(&adev->reset_sem))
>  		return;
>
> +	amdgpu_virt_fini_data_exchange(adev);
>  	atomic_set(&adev->in_gpu_reset, 1);
>
>  	do {
> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
> index dd5c1e6ce009..48e588d3c409 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
> @@ -276,6 +276,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
>  	if (!down_read_trylock(&adev->reset_sem))
>  		return;
>
> +	amdgpu_virt_fini_data_exchange(adev);
>  	atomic_set(&adev->in_gpu_reset, 1);
>
>  	do {
>