[PATCH] drm/amdgpu: Use driver mode reset for data poison handling

Zhang, Hawking Hawking.Zhang at amd.com
Tue Apr 16 05:51:53 UTC 2024


[AMD Official Use Only - General]

Please ignore this one; I will send out a new one.

-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1 at amd.com>
Sent: Tuesday, April 16, 2024 01:08
To: Zhang, Hawking <Hawking.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: RE: [PATCH] drm/amdgpu: Use driver mode reset for data poison handling

Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>

> -----Original Message-----
> From: Hawking Zhang <Hawking.Zhang at amd.com>
> Sent: Tuesday, April 16, 2024 12:34 PM
> To: amd-gfx at lists.freedesktop.org; Zhou1, Tao <Tao.Zhou1 at amd.com>
> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>
> Subject: [PATCH] drm/amdgpu: Use driver mode reset for data poison
> handling
>
> Mode-2 reset is the only reliable method to recover GC/SDMA after
> poison is consumed. MMHUB requires mode-1 reset.
>
> Signed-off-by: Hawking Zhang <Hawking.Zhang at amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index c368c70df3f4a..b6caf6eda8a0c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -163,17 +163,13 @@ static void
> event_interrupt_poison_consumption_v9(struct kfd_node *dev,
>       case SOC15_IH_CLIENTID_SE2SH:
>       case SOC15_IH_CLIENTID_SE3SH:
>       case SOC15_IH_CLIENTID_UTCL2:
> -             ret = kfd_dqm_evict_pasid(dev->dqm, pasid);
>               block = AMDGPU_RAS_BLOCK__GFX;
> -             if (ret)
> -                     reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> +             reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
>               break;
>       case SOC15_IH_CLIENTID_VMC:
>       case SOC15_IH_CLIENTID_VMC1:
> -             ret = kfd_dqm_evict_pasid(dev->dqm, pasid);
>               block = AMDGPU_RAS_BLOCK__MMHUB;
> -             if (ret)
> -                     reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> +             reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
>               break;
>       case SOC15_IH_CLIENTID_SDMA0:
>       case SOC15_IH_CLIENTID_SDMA1:
> --
> 2.17.1
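
For readers following the thread, the policy the patch moves to can be sketched as a standalone C function: the reset mode depends only on which client reported the poison consumption, and no longer on whether queue eviction failed. This is a simplified illustration, not the driver code. Every sketch_* name below is a hypothetical stand-in for the SOC15_IH_CLIENTID_*, AMDGPU_RAS_BLOCK__* and AMDGPU_RAS_GPU_RESET_* identifiers in the diff, only a subset of the client IDs is modeled, and the SDMA cases are left out because the hunk does not show their bodies.

```c
#include <assert.h>

/* Hypothetical stand-ins for a few SOC15_IH_CLIENTID_* values. */
enum sketch_client_id {
	SKETCH_CLIENT_SE2SH,
	SKETCH_CLIENT_SE3SH,
	SKETCH_CLIENT_UTCL2,
	SKETCH_CLIENT_VMC,
	SKETCH_CLIENT_VMC1,
	SKETCH_CLIENT_OTHER,	/* anything not modeled in this sketch */
};

/* Hypothetical stand-ins for AMDGPU_RAS_BLOCK__GFX / __MMHUB. */
enum sketch_block {
	SKETCH_BLOCK_NONE,
	SKETCH_BLOCK_GFX,
	SKETCH_BLOCK_MMHUB,
};

/* Hypothetical stand-ins for the MODE1 / MODE2 reset flags. */
enum sketch_reset {
	SKETCH_RESET_NONE,
	SKETCH_RESET_MODE1,
	SKETCH_RESET_MODE2,
};

struct sketch_policy {
	enum sketch_block block;
	enum sketch_reset reset;
};

/*
 * Map the reporting client to a RAS block and a reset mode.
 * The reset is chosen unconditionally per client, mirroring the
 * "+" lines of the patch; there is no eviction result to check.
 */
static struct sketch_policy sketch_poison_policy(enum sketch_client_id id)
{
	struct sketch_policy p = { SKETCH_BLOCK_NONE, SKETCH_RESET_NONE };

	switch (id) {
	case SKETCH_CLIENT_SE2SH:
	case SKETCH_CLIENT_SE3SH:
	case SKETCH_CLIENT_UTCL2:
		p.block = SKETCH_BLOCK_GFX;
		p.reset = SKETCH_RESET_MODE2;
		break;
	case SKETCH_CLIENT_VMC:
	case SKETCH_CLIENT_VMC1:
		p.block = SKETCH_BLOCK_MMHUB;
		p.reset = SKETCH_RESET_MODE1;
		break;
	default:
		/* Not covered by this sketch; leave both fields unset. */
		break;
	}
	return p;
}
```

Before the patch, the reset value was assigned only when kfd_dqm_evict_pasid() returned an error; in the sketch that condition is simply absent, which is the behavioral change the commit message describes.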



