[PATCH v2 1/3] drm/amdkfd: Check int source id for utcl2 poison event

Zhou1, Tao Tao.Zhou1 at amd.com
Tue Aug 20 06:11:29 UTC 2024


[AMD Official Use Only - AMD Internal Distribution Only]

The series is:

Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>

> -----Original Message-----
> From: Hawking Zhang <Hawking.Zhang at amd.com>
> Sent: Tuesday, August 20, 2024 2:05 PM
> To: amd-gfx at lists.freedesktop.org; Zhou1, Tao <Tao.Zhou1 at amd.com>; Yang,
> Stanley <Stanley.Yang at amd.com>
> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Fan, Shikang
> <Shikang.Fan at amd.com>
> Subject: [PATCH v2 1/3] drm/amdkfd: Check int source id for utcl2 poison event
>
> Traditional utcl2 fault_status polling does not work in an SRIOV environment:
> reads of the fault status register issued from the guest side are dropped by
> hardware.
>
> The driver should instead check the utcl2 interrupt source id to identify a
> utcl2 poison event. The source id is 1 when poisoned data interrupts are
> signaled.
>
> v2: drop the unused local variable (Tao)
>
> Signed-off-by: Hawking Zhang <Hawking.Zhang at amd.com>
> ---
>  .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c    | 18 +-----------------
>  drivers/gpu/drm/amd/amdkfd/soc15_int.h         |  1 +
>  2 files changed, 2 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index a9c3580be8c9..fecdbbab9894 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -431,25 +431,9 @@ static void event_interrupt_wq_v9(struct kfd_node *dev,
>                  client_id == SOC15_IH_CLIENTID_UTCL2) {
>               struct kfd_vm_fault_info info = {0};
>               uint16_t ring_id = SOC15_RING_ID_FROM_IH_ENTRY(ih_ring_entry);
> -             uint32_t node_id = SOC15_NODEID_FROM_IH_ENTRY(ih_ring_entry);
> -             uint32_t vmid_type = SOC15_VMID_TYPE_FROM_IH_ENTRY(ih_ring_entry);
> -             int hub_inst = 0;
>               struct kfd_hsa_memory_exception_data exception_data;
>
> -             /* gfxhub */
> -             if (!vmid_type && dev->adev->gfx.funcs->ih_node_to_logical_xcc) {
> -                     hub_inst = dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
> -                             node_id);
> -                     if (hub_inst < 0)
> -                             hub_inst = 0;
> -             }
> -
> -             /* mmhub */
> -             if (vmid_type && client_id == SOC15_IH_CLIENTID_VMC)
> -                     hub_inst = node_id / 4;
> -
> -             if (amdgpu_amdkfd_ras_query_utcl2_poison_status(dev->adev,
> -                                     hub_inst, vmid_type)) {
> +             if (source_id == SOC15_INTSRC_VMC_UTCL2_POISON) {
>                       event_interrupt_poison_consumption_v9(dev, pasid, client_id);
>                       return;
>               }
> diff --git a/drivers/gpu/drm/amd/amdkfd/soc15_int.h b/drivers/gpu/drm/amd/amdkfd/soc15_int.h
> index 10138676f27f..e5c0205f2618 100644
> --- a/drivers/gpu/drm/amd/amdkfd/soc15_int.h
> +++ b/drivers/gpu/drm/amd/amdkfd/soc15_int.h
> @@ -29,6 +29,7 @@
>  #define SOC15_INTSRC_CP_BAD_OPCODE   183
>  #define SOC15_INTSRC_SQ_INTERRUPT_MSG        239
>  #define SOC15_INTSRC_VMC_FAULT               0
> +#define SOC15_INTSRC_VMC_UTCL2_POISON        1
>  #define SOC15_INTSRC_SDMA_TRAP               224
>  #define SOC15_INTSRC_SDMA_ECC                220
>  #define SOC21_INTSRC_SDMA_TRAP               49
> --
> 2.17.1
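
For readers skimming the thread, the decision the patch moves to can be sketched outside the kernel as a pure function of the IH entry's client id and source id, with no register read involved. This is only an illustration under assumptions: the two source id values are taken from the soc15_int.h hunk above, while the demo_* names, the client id values, and the action enum are hypothetical and do not exist in the driver.

```c
#include <stdint.h>

/* Source ids as defined in the soc15_int.h hunk of the patch. */
#define SOC15_INTSRC_VMC_FAULT        0
#define SOC15_INTSRC_VMC_UTCL2_POISON 1

/* Hypothetical client ids; the real values live in the kernel headers. */
enum demo_client_id {
	DEMO_CLIENTID_VMC = 1,
	DEMO_CLIENTID_UTCL2,
	DEMO_CLIENTID_OTHER,
};

/* Hypothetical outcome of classifying one interrupt entry. */
enum demo_action {
	DEMO_IGNORE,             /* not a VM-related client */
	DEMO_VM_FAULT,           /* regular VM fault handling */
	DEMO_POISON_CONSUMPTION, /* utcl2 poison event */
};

/*
 * Classify an interrupt purely from fields carried in the IH entry.
 * Nothing is polled, so the result does not depend on whether the
 * guest is allowed to read the fault status register.
 */
static enum demo_action demo_classify(enum demo_client_id client_id,
				      uint16_t source_id)
{
	if (client_id != DEMO_CLIENTID_VMC && client_id != DEMO_CLIENTID_UTCL2)
		return DEMO_IGNORE;

	if (source_id == SOC15_INTSRC_VMC_UTCL2_POISON)
		return DEMO_POISON_CONSUMPTION;

	return DEMO_VM_FAULT;
}
```

With these definitions, demo_classify(DEMO_CLIENTID_UTCL2, SOC15_INTSRC_VMC_UTCL2_POISON) yields DEMO_POISON_CONSUMPTION, while the same client with SOC15_INTSRC_VMC_FAULT falls through to DEMO_VM_FAULT.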


