[Bug 110674] Crashes / Resets From AMDGPU / Radeon VII

bugzilla-daemon at freedesktop.org
Sat Jun 15 16:58:59 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #37 from Tom B <tom at r.je> ---
Kernel 5.1.9 makes this bug even worse. It now crashes as soon as the display
server is started.

Running sensors now gives errors:


ERROR: Can't get value of subfeature fan1_input: I/O error
ERROR: Can't get value of subfeature power1_average: I/O error
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +37.0°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +34.8°C  (high = +70.0°C)
Tctl:         +61.8°C  

amdgpu-pci-4400
Adapter: PCI adapter
vddgfx:       +0.74 V  
fan1:             N/A  (min =    0 RPM, max = 3850 RPM)
temp1:        +39.0°C  (crit = +118.0°C, hyst = -273.1°C)
power1:           N/A  (cap = 250.00 W)

k10temp-pci-00cb
Adapter: PCI adapter
Tdie:         +33.2°C  (high = +70.0°C)
Tctl:         +60.2°C  



I can't even see the wattage now. 
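
The same failure can be checked without lm-sensors by reading the hwmon attributes directly. A minimal Python sketch, assuming the standard hwmon sysfs layout under /sys/class/hwmon; the helper names read_hwmon_attr and dump_amdgpu_hwmon are hypothetical, and a failed read (such as the I/O error above) is reported as None instead of raising:

```python
import glob
import os


def read_hwmon_attr(path):
    """Return the integer value of a hwmon attribute, or None if the read fails.

    A failing read (for example EIO, which sensors reports as "I/O error")
    is returned as None so the caller can treat it as N/A.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def dump_amdgpu_hwmon(root="/sys/class/hwmon"):
    """Collect fan and power readings for every hwmon device named 'amdgpu'."""
    results = {}
    for dev in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(dev, "name")) as f:
                name = f.read().strip()
        except OSError:
            continue  # device without a readable name file; skip it
        if name != "amdgpu":
            continue
        results[dev] = {
            # fan1_input is in RPM, power1_average is in microwatts
            attr: read_hwmon_attr(os.path.join(dev, attr))
            for attr in ("fan1_input", "power1_average")
        }
    return results
```

Calling dump_amdgpu_hwmon() on an affected system would be expected to show None for both attributes, matching the N/A values in the sensors output.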

# cat /sys/kernel/debug/dri/0/amdgpu_pm_info

Clock Gating Flags Mask: 0x860200
        Graphics Medium Grain Clock Gating: Off
        Graphics Medium Grain memory Light Sleep: Off
        Graphics Coarse Grain Clock Gating: Off
        Graphics Coarse Grain memory Light Sleep: Off
        Graphics Coarse Grain Tree Shader Clock Gating: Off
        Graphics Coarse Grain Tree Shader Light Sleep: Off
        Graphics Command Processor Light Sleep: Off
        Graphics Run List Controller Light Sleep: Off
        Graphics 3D Coarse Grain Clock Gating: Off
        Graphics 3D Coarse Grain memory Light Sleep: Off
        Memory Controller Light Sleep: Off
        Memory Controller Medium Grain Clock Gating: On
        System Direct Memory Access Light Sleep: Off
        System Direct Memory Access Medium Grain Clock Gating: Off
        Bus Interface Medium Grain Clock Gating: Off
        Bus Interface Light Sleep: Off
        Unified Video Decoder Medium Grain Clock Gating: Off
        Video Compression Engine Medium Grain Clock Gating: Off
        Host Data Path Light Sleep: Off
        Host Data Path Medium Grain Clock Gating: Off
        Digital Right Management Medium Grain Clock Gating: Off
        Digital Right Management Light Sleep: On
        Rom Medium Grain Clock Gating: On
        Data Fabric Medium Grain Clock Gating: On

GFX Clocks and Power:
        1373 MHz (PSTATE_SCLK)
        1001 MHz (PSTATE_MCLK)
        737 mV (VDDGFX)

GPU Temperature: 39 C

UVD: Disabled

VCE: Disabled


No current clocks or wattage, only the PSTATE values!
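
To compare dumps like this between kernel versions, the On/Off list can be reduced to the set of enabled features. A minimal Python sketch that works only on the text format shown above; the function name parse_clock_gating is hypothetical, and reading the debugfs file itself normally requires root:

```python
def parse_clock_gating(text):
    """Map each clock gating feature name to True (On) or False (Off).

    Lines that do not end in ': On' or ': Off' (the flags mask, clocks,
    temperature, UVD/VCE state) are ignored.
    """
    flags = {}
    for line in text.splitlines():
        name, sep, state = line.strip().rpartition(": ")
        if sep and state in ("On", "Off"):
            flags[name] = state == "On"
    return flags


# Three lines copied from the dump above, plus the mask line, as sample input.
sample = """Clock Gating Flags Mask: 0x860200
        Memory Controller Light Sleep: Off
        Memory Controller Medium Grain Clock Gating: On
        Digital Right Management Light Sleep: On
"""

enabled = [name for name, on in parse_clock_gating(sample).items() if on]
print(enabled)
```

Diffing the enabled lists from a working kernel and from 5.1.9 would show whether the power management state itself changed, or only the reporting.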

I'm guessing commit 34d07ce3d6a120056e4763ae9a3db0d769ab7c63 "fix ring test
failure issue during s3 in vce 3.0 (V2)" is to blame, as dmesg (attached in the
next post) says:


[   20.584937] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=25, emitted seq=27
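
For anyone collecting these messages from several boots, the ring name and sequence numbers can be pulled out of dmesg mechanically. A minimal Python sketch, assuming the message format shown above; parse_ring_timeout is a hypothetical helper, and reading the difference between the two sequence numbers as the count of submitted but unfinished fences is an interpretation, not something the log states:

```python
import re

# Matches the amdgpu job timeout message; \s* tolerates a line wrap
# after the comma, as in mail-wrapped dmesg output.
RING_TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout,\s*signaled seq=(\d+),\s*emitted seq=(\d+)"
)


def parse_ring_timeout(log_text):
    """Return (ring, signaled, emitted) for the first timeout found, else None."""
    match = RING_TIMEOUT_RE.search(log_text)
    if match is None:
        return None
    ring, signaled, emitted = match.groups()
    return ring, int(signaled), int(emitted)


line = (
    "[   20.584937] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,\n"
    "signaled seq=25, emitted seq=27"
)
ring, signaled, emitted = parse_ring_timeout(line)
print(ring, emitted - signaled)
```

Here the gfx ring has emitted two more fences than it has signaled, which is consistent with the GPU hanging shortly after the display server starts submitting work.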

It would be nice to see some acknowledgement from AMD on this.
