[Bug 213391] New: AMDGPU retries page fault with some specific processes amdgpu: [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered

bugzilla-daemon at bugzilla.kernel.org
Thu Jun 10 11:16:11 UTC 2021


https://bugzilla.kernel.org/show_bug.cgi?id=213391

            Bug ID: 213391
           Summary: AMDGPU retries page fault with some specific processes
                    amdgpu: [gfxhub0] retry page fault until *ERROR* ring
                    gfx timeout, but soft recovered
           Product: Drivers
           Version: 2.5
    Kernel Version: Linux 5.12.9-arch-1-1
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: low
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri at kernel-bugs.osdl.org
          Reporter: samy at lahfa.xyz
        Regression: No

Hi,

I recently updated from mainstream kernel 5.11.16 to 5.12.9 and have run into
this issue. I also updated the Mesa driver from mesa-git
(21.1.0_devel.137307.f8e5f945b8f-1) to mesa-git
(21.2.0_devel.140633.c04f20e7e01-1).

Current kernel parameters: /vmlinuz-linux zfs=zroot/ROOT/default rw loglevel=3
quiet radeon.si_support=0 amdgpu.si_support=1 radeon.cik_support=0
amdgpu.cik_support=1

My computer is a ThinkPad T495 laptop (AMD Ryzen 7 3700 Pro with an RX Vega 10
iGPU, 16 GB DDR4 3200 MHz). An important detail is that the BIOS can reserve
up to 2 GB of DDR4 RAM as iGPU VRAM. I currently have 1 GB (1024 MB) set in
the BIOS for the iGPU, and I suspect the page fault retries could be linked to
this in some way.

I think this has a higher chance of happening when RAM is under heavy load and
the system is also swapping heavily (I have 12.3 GB of swap on an NVMe PCIe
3.0 drive).

At present, I cannot reproduce this issue consistently. However, it has been
happening with web browsers: mostly with Qutebrowser, and once with Chromium.
The Chromium occurrence crashed the X11 server and the computer froze
completely, but the kernel was still responsive to SysRq keys, so I could get
out of that situation safely.

I'll be uploading the logs of both crashes I have encountered, along with
lspci output and other log files that could be useful.
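
For anyone collecting similar logs, a small filter along these lines could
count the relevant lines in a saved kernel log. This is only a sketch: it
assumes the two message fragments quoted in the bug summary ("[gfxhub0] retry
page fault" and "*ERROR* ring gfx timeout") appear verbatim in the log, and the
function name count_amdgpu_events is hypothetical.

```python
import re

# Fragments taken from the bug summary; the rest of each log line's
# format is an assumption and is deliberately not matched.
PATTERNS = {
    "retry_page_fault": re.compile(r"amdgpu.*\[gfxhub0\] retry page fault"),
    "ring_timeout": re.compile(r"\*ERROR\* ring gfx timeout"),
}


def count_amdgpu_events(lines):
    """Return how many lines match each pattern in PATTERNS."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

It accepts any iterable of lines, for example an open file object for a log
saved from dmesg, and returns a dictionary with one count per pattern.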

Kind regards,

Lahfa Samy
