I got an IOMMU IO page fault. What to do now?

Christian König ckoenig.leichtzumerken at gmail.com
Mon Oct 25 11:23:36 UTC 2021


Hi Paul,

I'm not sure how the IOMMU hands out addresses, but the printed ones look
suspicious to me. It looks as if we are using an invalid address, such as
-1 or something similar.
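
To illustrate what "like -1" would mean here, a minimal Python sketch. It
takes a few of the fault addresses from the quoted log and reinterprets them
as signed numbers. The 40-bit width is an assumption based only on the
printed values being ten hex digits wide, and the helper names are made up
for this example.

```python
import re

# A few lines copied from the quoted dmesg output below.
LOG = """\
amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff0c0 flags=0x0020]
amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff7c0 flags=0x0020]
AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffff90c0 flags=0x0020]
"""

ADDR_BITS = 40  # assumption: the addresses are treated as 40-bit values


def fault_addresses(text):
    """Extract every address=0x... field from the log text as an integer."""
    return [int(m, 16) for m in re.findall(r"address=(0x[0-9a-fA-F]+)", text)]


def as_signed(addr, bits=ADDR_BITS):
    """Interpret addr as a two's-complement number of the given bit width."""
    return addr - (1 << bits) if addr >> (bits - 1) else addr


for addr in fault_addresses(LOG):
    print(f"{addr:#x} -> {as_signed(addr)}")
```

Under that assumption, all three addresses come out as small negative
numbers, i.e. they sit just below the top of the 40-bit range. That would
fit the picture of a small offset applied to -1 or a similar invalid value,
but it does not prove it.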

Can you try that on an up-to-date kernel as well? Ideally the bleeding-edge
amd-staging-drm-next branch from Alex's repository.

Regards,
Christian.

On 25.10.21 at 12:25, Paul Menzel wrote:
> Dear Linux folks,
>
>
> On a Dell OptiPlex 5055, Linux 5.10.24 logged the IOMMU messages 
> below. (GPU hang in amdgpu issue #1762 [1] might be related.)
>
>     $ lspci -nn -s 05:00.0
>     05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611] (rev 87)
>     $ dmesg
>     […]
>     [6318399.745242] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff0c0 flags=0x0020]
>     [6318399.757283] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffff7c0 flags=0x0020]
>     [6318399.769154] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffe0c0 flags=0x0020]
>     [6318399.780913] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffec0 flags=0x0020]
>     [6318399.792734] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffe5c0 flags=0x0020]
>     [6318399.804309] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffd0c0 flags=0x0020]
>     [6318399.816091] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffecc0 flags=0x0020]
>     [6318399.827407] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffd3c0 flags=0x0020]
>     [6318399.838708] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffc0c0 flags=0x0020]
>     [6318399.850029] amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xffffffdac0 flags=0x0020]
>     [6318399.861311] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffc1c0 flags=0x0020]
>     [6318399.872044] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffc8c0 flags=0x0020]
>     [6318399.882797] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffb0c0 flags=0x0020]
>     [6318399.893655] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffcfc0 flags=0x0020]
>     [6318399.904445] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffb6c0 flags=0x0020]
>     [6318399.915222] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffa0c0 flags=0x0020]
>     [6318399.925931] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffbdc0 flags=0x0020]
>     [6318399.936691] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffa4c0 flags=0x0020]
>     [6318399.947479] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffff90c0 flags=0x0020]
>     [6318399.958270] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x000c address=0xffffffabc0 flags=0x0020]
>
> As this is not reproducible, how would one go about debugging it? (The
> system was rebooted in the meantime.) Which options should be enabled so
> that the required information is logged next time, or which commands
> should I run while the system is still in that state, so that the bug
> (driver, userspace, …) can be pinpointed and fixed?
>
>
> Kind regards,
>
> Paul
>
>
> [1]: https://gitlab.freedesktop.org/drm/amd/-/issues/1762
>      "Oland [Radeon HD 8570 / R7 240/340 OEM]: GPU hang"



More information about the amd-gfx mailing list