[Mesa-dev] [Bug 111080] Random crash on amdgpu due to temperature misreporting
bugzilla-daemon at freedesktop.org
Sun Jul 7 15:53:57 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=111080
Bug ID: 111080
Summary: Random crash on amdgpu due to temperature
misreporting
Product: Mesa
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: major
Priority: medium
Component: Mesa core
Assignee: mesa-dev at lists.freedesktop.org
Reporter: timitch_1 at yahoo.com
QA Contact: mesa-dev at lists.freedesktop.org
Created attachment 144716
--> https://bugs.freedesktop.org/attachment.cgi?id=144716&action=edit
amdgpu_pm_info information from start of game to crash
Hi,
I have been experiencing random crashes in Dota 2 for the past 2 years.
I have changed everything in the computer: 6900K -> Threadripper, Corsair
memory -> G.Skill, Radeon Frontier -> Radeon VII, Ubuntu 16.04 -> 16.10 ->
17.04 -> 17.10 -> 18.04 -> 18.10 -> 19.04, along with every Mesa version in
between. I am currently on
"OpenGL renderer string: AMD Radeon VII (VEGA20, DRM 3.32.0, 5.2.0-rc7+, LLVM
9.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.2.0-devel -
padoka PPA
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
"
All of these configurations show the same random crash.
I finally got a lead on the problem when I saw the GPU reporting unrealistic
values, e.g. the clock jumping into the 10,000 MHz range. Around the time of
the crash, the temperature in the logs goes from 62 C to 500 C and back to
62 C within two seconds. I suspect this causes the GPU to apply its thermal
protection and freeze; if the reading were true, it would also violate some
laws of physics.
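One way to make the "jumps and comes straight back" observation checkable is to flag any reading that differs sharply from both of its neighbours while the neighbours agree with each other. This is only a minimal sketch: the function name `find_spikes` and the 50-unit default threshold are illustrative assumptions, not anything taken from the driver or firmware.

```python
def find_spikes(samples, max_step=50):
    """Return indices of readings that jump away from both neighbours.

    A reading counts as a spike when it differs from the previous and the
    next reading by more than max_step, while those two neighbours agree
    with each other to within max_step.
    """
    spikes = []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if (abs(cur - prev) > max_step
                and abs(cur - nxt) > max_step
                and abs(nxt - prev) <= max_step):
            spikes.append(i)
    return spikes


# Temperatures in degrees C, as logged around one of the crashes.
print(find_spikes([70, 511, 69]))
```

A sustained rise (real heating) is not flagged by this check, because the neighbours would then no longer agree with each other.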
Most other tools I use to test the graphics card, for example Unigine, report
correct values within the supported range defined for the card, which is
"
#OD_VDDC_CURVE:
#0: 808Mhz 704mV
#1: 1304Mhz 777mV
#2: 1801Mhz 1054mV
#OD_RANGE:
#SCLK: 808Mhz 2200Mhz
#MCLK: 351Mhz 1200Mhz
"
Attached is an example generated with
"watch -t -n1 'cat /sys/kernel/debug/dri/1/amdgpu_pm_info | grep -A 9 "GFX Clocks" | tee -a /home/mitch/tmp/gpulog.txt'"
Example grep Temp
"
GPU Temperature: 70 C
GPU Temperature: 511 C
GPU Temperature: 69 C
"
grep \(SCLK
"
1924 MHz (SCLK)
5422 MHz (SCLK)
1999 MHz (SCLK)
"
I realize the issue might be somewhere other than the Mesa driver, but I would
like to know where it could be and whether anybody else has seen this kind of
behaviour.
Thank you very much for any help
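
For anyone who wants to scan a similar log, the readings quoted above can be checked mechanically against the OD_RANGE limits. The following is a minimal sketch, assuming log lines shaped like the grep output in this report ("GPU Temperature: 70 C", "1924 MHz (SCLK)"). The function name `find_out_of_range` and the 0 to 125 C plausibility window are illustrative assumptions, not limits taken from the driver.

```python
import re

# SCLK limits from the OD_RANGE block quoted in this report.
SCLK_MIN_MHZ, SCLK_MAX_MHZ = 808, 2200
# Assumed plausibility window for the temperature; not a driver limit.
TEMP_MIN_C, TEMP_MAX_C = 0, 125

TEMP_RE = re.compile(r"GPU Temperature:\s*(\d+)\s*C")
SCLK_RE = re.compile(r"(\d+)\s*MHz\s*\(SCLK\)")


def find_out_of_range(lines):
    """Return (line_number, kind, value) for every implausible reading."""
    bad = []
    for number, line in enumerate(lines, start=1):
        temp = TEMP_RE.search(line)
        if temp and not TEMP_MIN_C <= int(temp.group(1)) <= TEMP_MAX_C:
            bad.append((number, "temperature", int(temp.group(1))))
        sclk = SCLK_RE.search(line)
        if sclk and not SCLK_MIN_MHZ <= int(sclk.group(1)) <= SCLK_MAX_MHZ:
            bad.append((number, "sclk", int(sclk.group(1))))
    return bad


sample = [
    "GPU Temperature: 70 C",
    "GPU Temperature: 511 C",
    "GPU Temperature: 69 C",
    "1924 MHz (SCLK)",
    "5422 MHz (SCLK)",
    "1999 MHz (SCLK)",
]
for entry in find_out_of_range(sample):
    print(entry)
```

On the six sample lines this flags only the 511 C and 5422 MHz readings. To scan a real log, pass an open file object instead of the list, for example the gpulog.txt file written by the tee command above.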