[PATCH] drm/amdgpu: limit GDS clearing workaround in cold boot sequence

Mon Feb 10 09:21:55 UTC 2020

On 10.02.20 at 09:33, Guchun Chen wrote:
> The GDS clear workaround causes a gfx failure in the suspend/resume case:
>
> [   98.679559] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <gfx_v9_0> failed -110
> [   98.679561] PM: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
> [   98.679562] PM: Device 0000:03:00.0 failed to resume async: error -110
>
> This workaround is specific to the HW bug behind the GDS ECC error
> seen on cold boot, so bypass it in the suspend/resume case after
> booting up.

Hmm, why doesn't this also apply to suspend/resume?

I mean, the hardware is usually turned off during suspend, which is
equivalent to a cold boot?

Christian.

>
> Signed-off-by: Guchun Chen <guchun.chen at amd.com>
> Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index de59defa91eb..33f282ff245f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -4320,9 +4320,12 @@ static int gfx_v9_0_ecc_late_init(void *handle)
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   	int r;
>   
> -	r = gfx_v9_0_do_edc_gds_workarounds(adev);
> -	if (r)
> -		return r;
> +	/* limit gds clearing operation in cold boot sequence */
> +	if (!adev->in_suspend) {
> +		r = gfx_v9_0_do_edc_gds_workarounds(adev);
> +		if (r)
> +			return r;
> +	}
>   
>   	/* requires IBs so do in late init after IB pool is initialized */
>   	r = gfx_v9_0_do_edc_gpr_workarounds(adev);
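
For readers following the control flow, the guard added by the hunk above can be modelled outside the kernel. This is only a minimal sketch under stated assumptions: struct model_device and the model_* functions are hypothetical stand-ins for struct amdgpu_device, gfx_v9_0_do_edc_gds_workarounds(), gfx_v9_0_do_edc_gpr_workarounds() and gfx_v9_0_ecc_late_init(). The stubs only count calls and touch no hardware, and the remainder of the real late-init function (not shown in the hunk) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device: only the flag the patch
 * tests, plus counters so the effect of the guard can be observed. */
struct model_device {
	bool in_suspend;      /* true while a suspend/resume cycle is in progress */
	int gds_clear_calls;  /* how often the modelled GDS workaround ran */
	int gpr_init_calls;   /* how often the modelled GPR workaround ran */
};

/* Stub for the GDS clearing workaround; always succeeds in this model. */
static int model_gds_workaround(struct model_device *dev)
{
	dev->gds_clear_calls++;
	return 0;
}

/* Stub for the GPR workaround, which the hunk leaves unconditional. */
static int model_gpr_workaround(struct model_device *dev)
{
	dev->gpr_init_calls++;
	return 0;
}

/* Models the patched late-init path: the GDS workaround is skipped when
 * in_suspend is set, the GPR workaround runs either way. */
static int model_ecc_late_init(struct model_device *dev)
{
	int r;

	assert(dev != NULL);

	if (!dev->in_suspend) {
		r = model_gds_workaround(dev);
		if (r)
			return r;
	}

	return model_gpr_workaround(dev);
}
```

Calling model_ecc_late_init() on a device with in_suspend cleared runs both stubs once; with in_suspend set, only the GPR stub runs. The sketch shows what the guard does, not whether skipping the workaround on resume is correct for the hardware, which is the question raised in the review.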