I got an IOMMU IO page fault. What to do now?
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Oct 25 11:23:36 UTC 2021
Hi Paul,
Not sure how the IOMMU hands out addresses, but the printed ones look
suspicious to me. It looks like we are using an invalid address, such as
-1 or something similar.
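One quick way to check that suspicion against the log is to read each faulting address as a signed offset and see how far it sits below "-1". The sketch below is only an illustration. It assumes a 40-bit address width, inferred solely from the ten hex digits in the printed addresses. The helper names `as_signed` and `fault_offsets` are hypothetical and are not part of amdgpu, the AMD-Vi driver, or any existing tool.

```python
import re

# Assumption: the printed IOVAs are 40 bits wide (ten hex digits in the log).
ADDR_BITS = 40

# Matches both event formats seen in the report, for example:
#   "amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=... address=0x... flags=...]"
#   "AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=... address=0x... flags=...]"
# [^\]] also matches newlines, so events wrapped across lines are still found.
FAULT_RE = re.compile(r"IO_PAGE_FAULT[^\]]*?address=0x([0-9a-fA-F]+)")


def as_signed(addr: int, bits: int = ADDR_BITS) -> int:
    """Interpret addr as a two's-complement integer of the given width."""
    if addr >= 1 << (bits - 1):
        return addr - (1 << bits)
    return addr


def fault_offsets(log: str) -> list[int]:
    """Return every IO_PAGE_FAULT address in a dmesg dump as a signed offset."""
    return [as_signed(int(m.group(1), 16)) for m in FAULT_RE.finditer(log)]


if __name__ == "__main__":
    sample = """\
[6318399.745242] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff0c0 flags=0x0020]
[6318399.947479] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffff90c0 flags=0x0020]
"""
    for off in fault_offsets(sample):
        print(f"{off} ({off:#x})")
```

For the two sample lines it prints `-3904 (-0xf40)` and `-28480 (-0x6f40)`. All twenty addresses in the report fall between 0xffffff90c0 and 0xfffffffec0, so under this reading every fault lands within about 28 KiB below the top of a 40-bit space. That is consistent with the "-1 or similar" hunch, but it is only a heuristic: it does not rule out the IOMMU legitimately handing out high addresses, and it says nothing about why such addresses were used.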
Can you try that on an up-to-date kernel as well? Ideally the
bleeding-edge amd-staging-drm-next branch from Alex's repository.
Regards,
Christian.
On 25.10.21 at 12:25, Paul Menzel wrote:
> Dear Linux folks,
>
>
> On a Dell OptiPlex 5055, Linux 5.10.24 logged the IOMMU messages
> below. (The GPU hang in amdgpu issue #1762 [1] might be related.)
>
> $ lspci -nn -s 05:00.0
> 05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611] (rev 87)
> $ dmesg
> […]
> [6318399.745242] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff0c0 flags=0x0020]
> [6318399.757283] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff7c0 flags=0x0020]
> [6318399.769154] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffe0c0 flags=0x0020]
> [6318399.780913] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffec0 flags=0x0020]
> [6318399.792734] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffe5c0 flags=0x0020]
> [6318399.804309] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffd0c0 flags=0x0020]
> [6318399.816091] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffecc0 flags=0x0020]
> [6318399.827407] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffd3c0 flags=0x0020]
> [6318399.838708] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffc0c0 flags=0x0020]
> [6318399.850029] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffdac0 flags=0x0020]
> [6318399.861311] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffc1c0 flags=0x0020]
> [6318399.872044] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffc8c0 flags=0x0020]
> [6318399.882797] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffb0c0 flags=0x0020]
> [6318399.893655] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffcfc0 flags=0x0020]
> [6318399.904445] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffb6c0 flags=0x0020]
> [6318399.915222] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffa0c0 flags=0x0020]
> [6318399.925931] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffbdc0 flags=0x0020]
> [6318399.936691] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffa4c0 flags=0x0020]
> [6318399.947479] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffff90c0 flags=0x0020]
> [6318399.958270] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffabc0 flags=0x0020]
>
> As this is not reproducible, how should I go about debugging it? (The
> system has been rebooted in the meantime.) What options should be
> enabled so that the required information is logged next time, or what
> commands should I run while the system is still in that state, so that
> the bug (driver, userspace, …) can be pinpointed and fixed?
>
>
> Kind regards,
>
> Paul
>
>
> [1]: https://gitlab.freedesktop.org/drm/amd/-/issues/1762
> "Oland [Radeon HD 8570 / R7 240/340 OEM]: GPU hang"
More information about the amd-gfx mailing list