[PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling
Zhou1, Tao
Tao.Zhou1 at amd.com
Tue Feb 20 06:23:06 UTC 2024
> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> Sent: Monday, February 19, 2024 8:40 PM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling
>
>
>
> On 2/19/2024 1:45 PM, Tao Zhou wrote:
> > Let kfd interrupt handler process it.
> >
> > Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > index 773725a92cf1..70defc394b7b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > @@ -552,7 +552,7 @@ static int gmc_v9_0_process_interrupt(struct
> > amdgpu_device *adev, {
> > bool retry_fault = !!(entry->src_data[1] & 0x80);
> > bool write_fault = !!(entry->src_data[1] & 0x20);
> > - uint32_t status = 0, cid = 0, rw = 0;
> > + uint32_t status = 0, cid = 0, rw = 0, fed = 0;
> > struct amdgpu_task_info task_info;
> > struct amdgpu_vmhub *hub;
> > const char *mmhub_cid;
> > @@ -663,6 +663,14 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
> > status = RREG32(hub->vm_l2_pro_fault_status);
> > cid = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, CID);
> > rw = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, RW);
> > + fed = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, FED);
> > +
> > + /* for gfx fed error, kfd will handle it, return directly */
> > + if (fed && amdgpu_ras_is_poison_mode_supported(adev) &&
> > + amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2) &&
> > + !strcmp(hub_name, "gfxhub0"))
> > + return 1;
>
> amdgpu_irq_dispatch() gives the impression that return value of 1 is treated as
> handled, hence won't be passed to kfd. The commit description says it is intended
> to pass to kfd for handling.
[Tao] Good catch. It should return 0 here; I will update it in v2, thanks.
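
For reference, a minimal sketch of the return-value convention discussed above, assuming the dispatcher treats a positive return as "handled" and forwards anything not handled to a fallback (kfd-like) path. It is a simplified model, not the driver code: dispatch_irq, struct dispatch_result and the two handlers are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt source's process callback. */
typedef int (*irq_process_fn)(void *ctx);

/* Hypothetical summary of what the dispatcher did with one entry. */
struct dispatch_result {
	bool handled;   /* the callback returned a positive value */
	bool forwarded; /* the entry was passed on to the fallback path */
	bool error;     /* the callback returned a negative value */
};

/*
 * Simplified model of the convention: r < 0 is an error, r > 0 means
 * the entry is fully handled, and anything not handled is forwarded.
 */
static struct dispatch_result dispatch_irq(irq_process_fn process, void *ctx)
{
	struct dispatch_result res = { false, false, false };
	int r = process ? process(ctx) : 0;

	if (r < 0)
		res.error = true;
	else if (r > 0)
		res.handled = true;

	if (!res.handled)
		res.forwarded = true;

	return res;
}

/* A handler that claims the interrupt, so it is not forwarded. */
static int handler_returns_one(void *ctx)
{
	(void)ctx;
	return 1;
}

/* A handler that leaves the interrupt for the fallback path. */
static int handler_returns_zero(void *ctx)
{
	(void)ctx;
	return 0;
}
```

Under this model, returning 1 from the page fault handler stops the entry from reaching the fallback path, while returning 0 lets it through, which is what the commit description intends.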
>
> Also, FED status check may be moved up so that it's not misunderstood as a
> regular page fault with the extra prints coming to dmesg log.
> Otherwise, poison status also needs to be added to dmesg.
[Tao] There is already a poison consumption dmesg log in the kfd interrupt handler, so there is no need to add an extra print here.
My intention is to skip "WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1)". Moving the check up would make the change somewhat larger, and I think the page fault log is acceptable.
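
A minimal sketch of that ordering, assuming a simulated hub instead of real register access: the FED field is extracted from the fault status, and the early return happens before the fault-clear write. The mask and shift values, struct fake_hub, get_field and process_fault are all hypothetical and are not the real register layout or driver functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define FAULT_STATUS_FED_SHIFT 29u
#define FAULT_STATUS_FED_MASK  (1u << FAULT_STATUS_FED_SHIFT)

/* Generic mask-and-shift field extraction. */
static uint32_t get_field(uint32_t reg, uint32_t mask, uint32_t shift)
{
	return (reg & mask) >> shift;
}

/* Hypothetical stand-in for a VM hub; no real hardware is touched. */
struct fake_hub {
	uint32_t fault_status; /* simulated fault status value */
	bool fault_cleared;    /* set when the simulated clear write happens */
};

/*
 * Returns 0 on both paths so a dispatcher that forwards unhandled
 * entries would still pass this one on. On a FED fault with poison
 * mode supported, the clear write is skipped.
 */
static int process_fault(struct fake_hub *hub, bool poison_supported)
{
	uint32_t fed = get_field(hub->fault_status,
				 FAULT_STATUS_FED_MASK,
				 FAULT_STATUS_FED_SHIFT);

	if (fed && poison_supported)
		return 0; /* leave the fault state for the other handler */

	hub->fault_cleared = true; /* models the fault-clear register write */
	return 0;
}
```

The sketch leaves out the IP version and hub name checks from the patch; it only shows why the check has to sit before the clear write.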
>
> Thanks,
> Lijo
>
> > +
> > WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
> > #ifdef HAVE_STRUCT_XARRAY
> > amdgpu_vm_update_fault_cache(adev, entry->pasid, addr, status, vmhub);