[PATCH] drm/amdgpu: Fix SDMA engine resume issue under SRIOV

Liu, Monk Monk.Liu at amd.com
Wed Oct 12 01:43:59 UTC 2022


[AMD Official Use Only - General]

Hi Bokun

Can you elaborate on why the SDMA engine is re-enabled during suspend?
Why does the VF need the SDMA engine alive there?

> -
> +       /*
> +        * Under SRIOV, the VF cannot single-mindedly stop SDMA engine
> +        * However, we still need to clean up the DRM entity
> +        * Therefore, we will re-enable SDMA afterwards.
> +        */

Thanks 
-------------------------------------------------------------------
Monk Liu | Cloud GPU & Virtualization Solution | AMD
-------------------------------------------------------------------

-----Original Message-----
From: Zhang, Bokun <Bokun.Zhang at amd.com> 
Sent: October 8, 2022 5:38
To: Alex Deucher <alexdeucher at gmail.com>
Cc: Liu, Monk <Monk.Liu at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Deng, Emily <Emily.Deng at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org; Jiang, Jerry (SW) <Jerry.Jiang at amd.com>
Subject: RE: [PATCH] drm/amdgpu: Fix SDMA engine resume issue under SRIOV

Tested-by: Bokun, Zhang <Bokun.Zhang at amd.com>

This patch is better since it extracts the unset code and only executes it in the SRIOV routine.
I have tested it with multi-VF.

Thanks!


-----Original Message-----
From: Alex Deucher <alexdeucher at gmail.com> 
Sent: Thursday, October 6, 2022 3:56 PM
To: Zhang, Bokun <Bokun.Zhang at amd.com>
Cc: Liu, Monk <Monk.Liu at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Deng, Emily <Emily.Deng at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix SDMA engine resume issue under SRIOV

On Thu, Oct 6, 2022 at 2:11 PM Zhang, Bokun <Bokun.Zhang at amd.com> wrote:
>
> Hey guys,
>     Please help review this patch for the suspend and resume issue.
>     I have tested it in a multi-VF environment, and I think it is OK.

Seems a little hacky, but I think that's the least intrusive for stable.  How about the attached patches?

Alex


>
> Thanks!
>
> -----Original Message-----
> From: Bokun Zhang <Bokun.Zhang at amd.com>
> Sent: Thursday, October 6, 2022 2:09 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Zhang, Bokun <Bokun.Zhang at amd.com>
> Subject: [PATCH] drm/amdgpu: Fix SDMA engine resume issue under SRIOV
>
> - Under SRIOV, the SDMA engine is shared between VFs. Therefore,
>   we do not stop SDMA during hw_fini. This is not an issue
>   with normal driver loading and unloading.
>
> - However, when we put the SDMA engine to suspend state and resume
>   it, the issue starts to show up. Something could attempt to use
>   that SDMA engine to clear or move memory before the engine is
>   initialized since the DRM entity is still there.
>
> - Therefore, we will call sdma_v5_2_enable(false) during hw_fini,
>   and if we are under SRIOV, we will call sdma_v5_2_enable(true)
>   afterwards to allow other VFs to use SDMA. This way, the DRM
>   entity of SDMA engine is emptied and it will follow the flow
>   of resume code path.
>
> Signed-off-by: Bokun Zhang <Bokun.Zhang at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> index f136fec7b4f4..3eaf1a573e73 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> @@ -1357,12 +1357,19 @@ static int sdma_v5_2_hw_fini(void *handle)
>  {
>         struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> -       if (amdgpu_sriov_vf(adev))
> -               return 0;
> -
> +       /*
> +        * Under SRIOV, the VF cannot single-mindedly stop SDMA engine
> +        * However, we still need to clean up the DRM entity
> +        * Therefore, we will re-enable SDMA afterwards.
> +        */
>         sdma_v5_2_ctx_switch_enable(adev, false);
>         sdma_v5_2_enable(adev, false);
>
> +       if (amdgpu_sriov_vf(adev)) {
> +               sdma_v5_2_enable(adev, true);
> +               sdma_v5_2_ctx_switch_enable(adev, true);
> +       }
> +
>         return 0;
>  }
>
> --
> 2.34.1
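
For readers following the thread without a kernel tree at hand, the control flow of the patched hw_fini can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct mock_dev` and every `mock_*` function are hypothetical stand-ins invented here, not amdgpu code, and the booleans merely model the effect of `sdma_v5_2_enable()` and `sdma_v5_2_ctx_switch_enable()` rather than touching any hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_device; not the kernel type. */
struct mock_dev {
	bool is_sriov_vf;        /* models the result of amdgpu_sriov_vf(adev) */
	bool sdma_enabled;       /* models the effect of sdma_v5_2_enable() */
	bool ctx_switch_enabled; /* models sdma_v5_2_ctx_switch_enable() */
	int  halt_count;         /* number of enabled -> disabled transitions */
};

/* Hypothetical model of sdma_v5_2_enable(): records each halt. */
static void mock_sdma_enable(struct mock_dev *d, bool enable)
{
	if (!enable && d->sdma_enabled)
		d->halt_count++;
	d->sdma_enabled = enable;
}

/* Hypothetical model of sdma_v5_2_ctx_switch_enable(). */
static void mock_ctx_switch_enable(struct mock_dev *d, bool enable)
{
	d->ctx_switch_enabled = enable;
}

/* Mirrors the control flow of the patched sdma_v5_2_hw_fini(). */
static int mock_hw_fini(struct mock_dev *d)
{
	/* Always go through the disable path, VF or not. */
	mock_ctx_switch_enable(d, false);
	mock_sdma_enable(d, false);

	/* Under SRIOV the engine is shared, so turn it back on. */
	if (d->is_sriov_vf) {
		mock_sdma_enable(d, true);
		mock_ctx_switch_enable(d, true);
	}

	return 0;
}

/* Small self-check: a VF ends up re-enabled, bare metal stays halted. */
static void mock_self_check(void)
{
	struct mock_dev vf = { true, true, true, 0 };
	struct mock_dev bm = { false, true, true, 0 };

	assert(mock_hw_fini(&vf) == 0);
	assert(vf.sdma_enabled && vf.ctx_switch_enabled);

	assert(mock_hw_fini(&bm) == 0);
	assert(!bm.sdma_enabled && !bm.ctx_switch_enabled);
}
```

In this model both cases pass through exactly one halt; only the VF case ends with the engine running again. Whether that momentary halt is acceptable on hardware shared with other VFs is the question raised at the top of the thread, and the sketch does not answer it.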
