[Bug 100666] amdgpu coolers never stopping on Linux

bugzilla-daemon at freedesktop.org
Wed Feb 28 03:07:11 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=100666

--- Comment #13 from Alex Deucher <alexdeucher at gmail.com> ---
(In reply to Luke McKee from comment #11)
> In this case it was on topic. The link explains how to use fancontrol script
> from lm_sensors to work around fan control issues. I saw on another ticket
> when I first posted here that dc=1 fixed the fancontrol issues. Finally I
> got dc=1 working and still it doesn't resolve the dpm fancontrol issues on
> my platform.

dc and powerplay are largely independent, so it's unlikely that one will
affect the other.

> 
> https://github.com/kobalicek/amdtweak
> as root
> # ./amdtweak  --card 0 --verbose --extract-bios /tmp/amdbios.bin
> fails. The sysfs shows that the powerplay tables are not proper too.
> 

I'm not familiar with that tool or how it goes about fetching the vbios.  The
driver uses several mechanisms to fetch it, depending on the platform.  It's
possible that the tool does something weird to fetch the vbios, and it's also
possible that it incorrectly interprets some of the vbios tables.

> [ 4969.713277] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000c3fff window]
> [ 4969.713283] caller pci_map_rom+0x66/0xf0 mapping multiple BARs
> [ 4969.713289] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

This last message is from the PCI subsystem and is harmless.  If the driver
were not able to load the vbios, the driver itself would fail to load.

> 
> If it can't read its powerplay table because it can't read the BIOS, maybe
> that's why there are all these problems.

The driver is able to load the vbios image just fine.  If it wasn't able to, or
if there was a major problem with one of the tables, the driver would fail to
load.

> 
> 
>  (In reply to Alex Deucher from comment #9)
> > 
> > Please stop posting this on every bug report.
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=100666#c0
> Also, when the users earlier on this ticket grepped their dmesg, they
> wouldn't have seen any powerplay messages, because they grepped for radeon
> instead of amdgpu.
> 
> [   10.124232] amdgpu: [powerplay] 
>                 failed to send message 309 ret is 254 
> [   10.124248] amdgpu: [powerplay] 
>                 failed to send pre message 14e ret is 254 
> 

There are lots of reasons an SMU message might fail.  Just because you see an
SMU message failure does not mean you are seeing the same issue as someone
else.  It's like a GPU hang: there are lots of potential root causes.



More information about the dri-devel mailing list