[PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled
Christian König
ckoenig.leichtzumerken at gmail.com
Fri Jun 1 09:13:49 UTC 2018
On 01.06.2018 at 08:41, Huang Rui wrote:
> The execution of the gfx/compute ib tests is deferred. However, by the time
> they run, the gfx has already entered a "mid state" of gfxoff.
>
> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (2:1 bits)
> 0 = GFXOFF.
> 1 = Transition out of GFXOFF state.
> 2 = Not in GFXOFF.
> 3 = Transition into GFXOFF.
>
> If we hit a mid state (1 or 3), the doorbell write interrupt cannot wake the
> gfx back up successfully. The field value is 1 when we issue the ib test at
> that point, so we get the hang. This is the root cause of the issue we hit.
>
> Meanwhile, we cannot set clockgating of GFX once gfx is already in the "off"
> state. So we should move the gfx powergating and gfxoff enabling to the end
> of initialization, after the ib test and clockgating.
Mhm, that still looks like only a half-baked solution:
1. What prevents this bug from happening during "normal" IB submission
from userspace?
2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
are not in any transition phase instead?
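To illustrate point 2, here is a minimal standalone sketch of such a poll. Only the field position (bits 2:1) and the four state values come from the commit message; all helper names, the reader callback, the fake register, and the retry limit are hypothetical and not existing amdgpu code.

```c
#include <stdbool.h>
#include <stdint.h>

/* PWR_GFXOFF_STATUS sits in bits 2:1 of PWR_MISC_CNTL_STATUS (per the commit message). */
#define PWR_GFXOFF_STATUS_SHIFT 1
#define PWR_GFXOFF_STATUS_MASK  (0x3u << PWR_GFXOFF_STATUS_SHIFT)

/* State values as listed in the commit message; enum names are hypothetical. */
enum gfxoff_status {
	GFXOFF_STATUS_OFF      = 0, /* GFXOFF */
	GFXOFF_STATUS_LEAVING  = 1, /* transition out of GFXOFF */
	GFXOFF_STATUS_ON       = 2, /* not in GFXOFF */
	GFXOFF_STATUS_ENTERING = 3, /* transition into GFXOFF */
};

/* Hypothetical accessor type: returns the raw register value. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/* Extract the 2-bit status field from the raw register value. */
static enum gfxoff_status gfxoff_status_decode(uint32_t reg)
{
	return (enum gfxoff_status)((reg & PWR_GFXOFF_STATUS_MASK) >>
				    PWR_GFXOFF_STATUS_SHIFT);
}

/* True for the two "mid states" (1 and 3). */
static bool gfxoff_in_transition(enum gfxoff_status s)
{
	return s == GFXOFF_STATUS_LEAVING || s == GFXOFF_STATUS_ENTERING;
}

/*
 * Poll until the field reports a stable state (0 or 2).
 * Returns 0 on success, -1 if max_polls reads all showed a transition.
 * A real driver would insert a delay between reads.
 */
static int gfxoff_wait_stable(read_reg_fn read_reg, void *ctx,
			      unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (!gfxoff_in_transition(gfxoff_status_decode(read_reg(ctx))))
			return 0;
	}
	return -1;
}

/* Fake register for demonstration: replays a fixed sequence of raw values. */
struct fake_reg {
	const uint32_t *values;
	unsigned int count; /* must be at least 1 */
	unsigned int pos;   /* number of reads performed so far */
};

static uint32_t fake_reg_read(void *ctx)
{
	struct fake_reg *r = ctx;
	/* Once the sequence is exhausted, keep returning the last value. */
	unsigned int idx = r->pos < r->count ? r->pos : r->count - 1;

	r->pos++;
	return r->values[idx];
}
```

Something along these lines would have to run before every doorbell write, not only before the IB tests, which is what question 1 is getting at.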
Regards,
Christian.
>
> Signed-off-by: Huang Rui <ray.huang at amd.com>
> Cc: Hawking Zhang <Hawking.Zhang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++++
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 5 -----
> drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 2 +-
> drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c | 4 ++--
> 4 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f509d32..e1c8806 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1723,6 +1723,16 @@ static int amdgpu_device_ip_late_set_cg_state(struct amdgpu_device *adev)
> }
> }
> }
> +
> + if (adev->powerplay.pp_feature & PP_GFXOFF_MASK) {
> + amdgpu_device_ip_set_powergating_state(adev,
> + AMD_IP_BLOCK_TYPE_GFX,
> + AMD_CG_STATE_GATE);
> + amdgpu_device_ip_set_powergating_state(adev,
> + AMD_IP_BLOCK_TYPE_SMC,
> + AMD_CG_STATE_GATE);
> + }
> +
> return 0;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 2c5e2a4..31ecc86 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3358,11 +3358,6 @@ static int gfx_v9_0_late_init(void *handle)
> if (r)
> return r;
>
> - r = amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_GFX,
> - AMD_PG_STATE_GATE);
> - if (r)
> - return r;
> -
> return 0;
> }
>
> diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> index b493369..d0e6e2d 100644
> --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> @@ -245,7 +245,7 @@ static int pp_set_powergating_state(void *handle,
> }
>
> if (hwmgr->hwmgr_func->enable_per_cu_power_gating == NULL) {
> - pr_info("%s was not implemented.\n", __func__);
> + pr_debug("%s was not implemented.\n", __func__);
> return 0;
> }
>
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
> index 7712eb6..b72d089 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
> @@ -284,7 +284,7 @@ static int smu10_disable_gfx_off(struct pp_hwmgr *hwmgr)
>
> static int smu10_disable_dpm_tasks(struct pp_hwmgr *hwmgr)
> {
> - return smu10_disable_gfx_off(hwmgr);
> + return 0;
> }
>
> static int smu10_enable_gfx_off(struct pp_hwmgr *hwmgr)
> @@ -299,7 +299,7 @@ static int smu10_enable_gfx_off(struct pp_hwmgr *hwmgr)
>
> static int smu10_enable_dpm_tasks(struct pp_hwmgr *hwmgr)
> {
> - return smu10_enable_gfx_off(hwmgr);
> + return 0;
> }
>
> static int smu10_gfx_off_control(struct pp_hwmgr *hwmgr, bool enable)