[PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling

Zhou1, Tao Tao.Zhou1 at amd.com
Tue Feb 20 06:23:06 UTC 2024


[AMD Official Use Only - General]

> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> Sent: Monday, February 19, 2024 8:40 PM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling
>
>
>
> On 2/19/2024 1:45 PM, Tao Zhou wrote:
> > Let kfd interrupt handler process it.
> >
> > Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > index 773725a92cf1..70defc394b7b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > @@ -552,7 +552,7 @@ static int gmc_v9_0_process_interrupt(struct
> > amdgpu_device *adev,  {
> >     bool retry_fault = !!(entry->src_data[1] & 0x80);
> >     bool write_fault = !!(entry->src_data[1] & 0x20);
> > -   uint32_t status = 0, cid = 0, rw = 0;
> > +   uint32_t status = 0, cid = 0, rw = 0, fed = 0;
> >     struct amdgpu_task_info task_info;
> >     struct amdgpu_vmhub *hub;
> >     const char *mmhub_cid;
> > @@ -663,6 +663,14 @@ static int gmc_v9_0_process_interrupt(struct
> amdgpu_device *adev,
> >     status = RREG32(hub->vm_l2_pro_fault_status);
> >     cid = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, CID);
> >     rw = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, RW);
> > +   fed = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS,
> FED);
> > +
> > +   /* for gfx fed error, kfd will handle it, return directly */
> > +   if (fed && amdgpu_ras_is_poison_mode_supported(adev) &&
> > +       amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2) &&
> > +       !strcmp(hub_name, "gfxhub0"))
> > +           return 1;
>
> amdgpu_irq_dispatch() gives the impression that return value of 1 is treated as
> handled, hence won't be passed to kfd. The commit description says it is intended
> to pass to kfd for handling.

[Tao] good catch, it should return 0 here, will update it in v2, thanks.

>
> Also, FED status check may be moved up so that it's not misunderstood as a
> regular page fault with the extra prints coming to dmesg log.
> Otherwise, poison status also needs to be added to dmesg.

[Tao] there is poison consumption dmesg log in kfd interrupt handler, no neeed to add extra print here.
My intention is to skip " WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1)", moving up the check will make the change a little bit more and I think the page fault log is acceptable.

>
> Thanks,
> Lijo
>
> > +
> >     WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);  #ifdef
> > HAVE_STRUCT_XARRAY
> >     amdgpu_vm_update_fault_cache(adev, entry->pasid, addr, status,
> vmhub);


More information about the amd-gfx mailing list