[Bug 111528] Using Fan-Control causes mmhub-pagefault and unresponsive system on Navi

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Sep 1 08:08:50 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=111528

            Bug ID: 111528
           Summary: Using Fan-Control causes mmhub-pagefault and
                    unresponsive system on Navi
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: not set
         Component: DRM/AMDgpu
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: saldorin at web.de

I first thought my issue was related to
https://bugs.freedesktop.org/show_bug.cgi?id=111481 , but it seems it is a
different one.

When using any kind of fan-control software (I tried corectrl and
radeon-profile), after a while I get a strange stuttering, as if the whole
OS halts for a few seconds, then continues for a few seconds. The halted
periods grow while the usable periods shrink quickly, to the point of a
seemingly unresponsive system.
It's not just the GUI that halts but the whole system: I had rsync
running one time, and the HDD is audible enough to hear that it was only active
during the seconds the GUI was responsive.

It doesn't happen at regular intervals (it seems to take anywhere between 30
and 120 minutes), and I haven't yet identified a direct cause, but in
journalctl the same messages seem to appear every time it begins:

kernel: amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xfd6000
kernel: amdgpu: [powerplay] Failed to send message 0xf, response 0xfffffffb, param 0xfd6000
kernel: amdgpu 0000:0f:00.0: [mmhub] VMC page fault (src_id:0 ring:169 vmid:0 pasid:0)
kernel: amdgpu 0000:0f:00.0:   at page 0x0000600000fd6000 from 18
kernel: amdgpu 0000:0f:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00041152

After that there are a lot of these:

kernel: amdgpu: [powerplay] Failed to send message 0x40, response 0xffffffc2 param 0x2
kernel: amdgpu: [powerplay] Failed to send message 0xe, response 0xffffffc2, param 0x80

with some other amdgpu errors sprinkled in until shutdown/hard reset.
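
To make it easier to check whether a given boot shows the same sequence, the kernel log can be scanned for the two message types quoted above. This is only a rough sketch: the regular expressions are derived solely from the excerpts in this report, the function name scan_kernel_log is made up for illustration, and other kernel versions may format these messages differently.

```python
import re

# Patterns modelled only on the log lines quoted in this report (assumption:
# other kernels may word these messages differently). The comma before
# "param" is optional because one quoted line lacks it.
SMU_FAIL = re.compile(
    r"\[powerplay\] Failed to send message (0x[0-9a-f]+), "
    r"response (0x[0-9a-f]+),? param (0x[0-9a-f]+)"
)
PAGE_FAULT = re.compile(r"\[mmhub\] VMC page fault")


def scan_kernel_log(lines):
    """Collect powerplay send failures and count mmhub page faults.

    `lines` is any iterable of unwrapped log lines. Returns a tuple
    (failures, faults): `failures` is a list of (message, response, param)
    hex strings in log order, `faults` is the number of page-fault lines.
    """
    failures = []
    faults = 0
    for line in lines:
        match = SMU_FAIL.search(line)
        if match:
            failures.append(match.groups())
        elif PAGE_FAULT.search(line):
            faults += 1
    return failures, faults
```

One way to use it would be to pass sys.stdin to scan_kernel_log and pipe in the kernel messages of the affected boot, for example the output of "journalctl -k -b -1" for the previous boot.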

It doesn't occur without fan-control software running, so I'm pretty certain
it is somehow related to that.

System: 
Powercolor 5700xt Red Devil
3800x on X570 Taichi
Manjaro KDE
Manjaro 5.3rc6.d0826.ga55aa89-1
mesa-git 1:19.3.0_devel.114849.0142dcb990e-1
llvm-libs-git 10.0.0_r325376.70e158e09e9-1
And if it matters: firmware from
https://aur.archlinux.org/packages/linux-firmware-agd5f-radeon-navi10/
v2019.08.26.14.36-1
