[PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled

Christian König ckoenig.leichtzumerken at gmail.com
Fri Jun 1 09:13:49 UTC 2018


On 01.06.2018 at 08:41, Huang Rui wrote:
> We now defer the execution of the gfx/compute IB tests. However, by the time
> they run, gfx has already entered a "mid state" of gfxoff.
>
> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (2:1 bits)
> 0 = GFXOFF.
> 1 = Transition out of GFXOFF state.
> 2 = Not in GFXOFF.
> 3 = Transition into GFXOFF.
>
> If we hit a mid state (1 or 3), the doorbell write interrupt cannot wake gfx
> up successfully. The field value is 1 when we issue the IB test, which
> causes the hang. This is the root cause of the issue we encountered.
>
> Meanwhile, we cannot set GFX clockgating once gfx is already in the "off"
> state. So we should move gfx powergating and gfxoff enabling to the end of
> initialization, after the IB test and clockgating.

Hmm, that still looks like only a half-baked solution:

1. What prevents this bug from happening during "normal" IB submission 
from userspace?

2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we 
are not in any transition phase instead?
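
A minimal sketch of the polling idea in point 2, assuming only the bit layout
quoted above (PWR_GFXOFF_STATUS in bits 2:1 of PWR_MISC_CNTL_STATUS). All
helper names, the read callback, and the poll bound are hypothetical and not
part of amdgpu; a real driver would read the register through its own
accessors and delay between polls.

```c
#include <stdbool.h>
#include <stdint.h>

/* PWR_GFXOFF_STATUS occupies bits 2:1 of PWR_MISC_CNTL_STATUS. */
#define PWR_GFXOFF_STATUS_SHIFT 1
#define PWR_GFXOFF_STATUS_MASK  (0x3u << PWR_GFXOFF_STATUS_SHIFT)

/* Field encoding as listed in the commit message (names are hypothetical). */
enum gfxoff_status {
	GFXOFF_STATUS_OFF      = 0, /* GFXOFF */
	GFXOFF_STATUS_LEAVING  = 1, /* transition out of GFXOFF */
	GFXOFF_STATUS_ON       = 2, /* not in GFXOFF */
	GFXOFF_STATUS_ENTERING = 3, /* transition into GFXOFF */
};

/* Extract the 2-bit status field from a raw register value. */
static enum gfxoff_status gfxoff_status_from_reg(uint32_t reg)
{
	return (enum gfxoff_status)((reg & PWR_GFXOFF_STATUS_MASK) >>
				    PWR_GFXOFF_STATUS_SHIFT);
}

/* True for the two "mid states" (1 or 3) in which a doorbell cannot wake gfx. */
static bool gfxoff_in_transition(uint32_t reg)
{
	enum gfxoff_status s = gfxoff_status_from_reg(reg);

	return s == GFXOFF_STATUS_LEAVING || s == GFXOFF_STATUS_ENTERING;
}

/* Hypothetical stand-in for the driver's register read. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until the status is stable (0 or 2).
 * Returns 0 once stable, -1 if still in transition after max_polls reads.
 */
static int gfxoff_wait_stable(read_reg_fn read_reg, void *ctx,
			      unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (!gfxoff_in_transition(read_reg(ctx)))
			return 0;
		/* A real driver would sleep or delay briefly here. */
	}
	return -1;
}

/* Test double: replays a fixed sequence, repeating the last value. */
struct fake_reg {
	const uint32_t *values;
	unsigned int count;
	unsigned int pos;
};

static uint32_t fake_reg_read(void *ctx)
{
	struct fake_reg *f = ctx;
	uint32_t v = f->values[f->pos];

	if (f->pos + 1 < f->count)
		f->pos++;
	return v;
}
```

Calling something like gfxoff_wait_stable() before ringing the doorbell would
cover userspace submissions as well as the IB test, at the cost of a bounded
busy-wait on every wake-up path.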

Regards,
Christian.

>
> Signed-off-by: Huang Rui <ray.huang at amd.com>
> Cc: Hawking Zhang <Hawking.Zhang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c        | 10 ++++++++++
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c             |  5 -----
>   drivers/gpu/drm/amd/powerplay/amd_powerplay.c     |  2 +-
>   drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c |  4 ++--
>   4 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f509d32..e1c8806 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1723,6 +1723,16 @@ static int amdgpu_device_ip_late_set_cg_state(struct amdgpu_device *adev)
>   			}
>   		}
>   	}
> +
> +	if (adev->powerplay.pp_feature & PP_GFXOFF_MASK) {
> +		amdgpu_device_ip_set_powergating_state(adev,
> +						       AMD_IP_BLOCK_TYPE_GFX,
> +						       AMD_CG_STATE_GATE);
> +		amdgpu_device_ip_set_powergating_state(adev,
> +						       AMD_IP_BLOCK_TYPE_SMC,
> +						       AMD_CG_STATE_GATE);
> +	}
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 2c5e2a4..31ecc86 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3358,11 +3358,6 @@ static int gfx_v9_0_late_init(void *handle)
>   	if (r)
>   		return r;
>   
> -	r = amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_GFX,
> -						   AMD_PG_STATE_GATE);
> -	if (r)
> -		return r;
> -
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> index b493369..d0e6e2d 100644
> --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> @@ -245,7 +245,7 @@ static int pp_set_powergating_state(void *handle,
>   	}
>   
>   	if (hwmgr->hwmgr_func->enable_per_cu_power_gating == NULL) {
> -		pr_info("%s was not implemented.\n", __func__);
> +		pr_debug("%s was not implemented.\n", __func__);
>   		return 0;
>   	}
>   
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
> index 7712eb6..b72d089 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
> @@ -284,7 +284,7 @@ static int smu10_disable_gfx_off(struct pp_hwmgr *hwmgr)
>   
>   static int smu10_disable_dpm_tasks(struct pp_hwmgr *hwmgr)
>   {
> -	return smu10_disable_gfx_off(hwmgr);
> +	return 0;
>   }
>   
>   static int smu10_enable_gfx_off(struct pp_hwmgr *hwmgr)
> @@ -299,7 +299,7 @@ static int smu10_enable_gfx_off(struct pp_hwmgr *hwmgr)
>   
>   static int smu10_enable_dpm_tasks(struct pp_hwmgr *hwmgr)
>   {
> -	return smu10_enable_gfx_off(hwmgr);
> +	return 0;
>   }
>   
>   static int smu10_gfx_off_control(struct pp_hwmgr *hwmgr, bool enable)


