[Bug 111763] ring_gfx hangs/freezes on Navi gpus
bugzilla-daemon at freedesktop.org
Tue Nov 5 06:07:33 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=111763
--- Comment #23 from wychuchol <wychuchol7777 at gmail.com> ---
(In reply to wychuchol from comment #19)
> After some time in Witcher 3 GOTY, run through Lutris, the PC restarts on its
> own. I thought something was overheating (I've noticed graphics card memory
> in PSensor sometimes reaching 90, so I thought maybe that's what's happening),
> but I investigated kern.log and this always appeared before that spontaneous
> reset:
>
> Nov 2 22:01:53 pop-os kernel: [ 979.244964] pcieport 0000:00:01.1: AER:
> Corrected error received: 0000:01:00.0
> Nov 2 22:01:53 pop-os kernel: [ 979.244967] nvme 0000:01:00.0: AER: PCIe
> Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> Nov 2 22:01:53 pop-os kernel: [ 979.244968] nvme 0000:01:00.0: AER:
> device [1987:5012] error status/mask=00001000/00006000
> Nov 2 22:01:53 pop-os kernel: [ 979.244968] nvme 0000:01:00.0: AER:
> [12] Timeout
> Nov 2 22:01:53 pop-os kernel: [ 979.262629] Emergency Sync complete
The thing with those AER errors is that they can go on and on, and the reset
happens a few minutes after the last logged error.
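
To check how long the machine keeps running after the last AER message, the kernel uptime stamps in kern.log can be compared. The following is only a rough sketch: the function names are made up for illustration, and it assumes every line comes from a single boot (the stamp restarts from zero after a reset) and that each log entry sits on one line, unlike the wrapped quote above.

```python
import re

# Matches the kernel uptime stamp, e.g. "[  979.244964]".
# Bracketed values without a decimal point, such as "[12]", do not match.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def last_uptime(lines, needle=None):
    """Return the uptime (seconds) of the last stamped line.

    If needle is given, only lines containing it are considered.
    Returns None when no matching line is found.
    """
    last = None
    for line in lines:
        if needle is not None and needle not in line:
            continue
        match = STAMP.search(line)
        if match:
            last = float(match.group(1))
    return last


# Sample lines taken from the log quoted above, joined back onto one line each.
sample = [
    "Nov 2 22:01:53 pop-os kernel: [ 979.244964] pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0",
    "Nov 2 22:01:53 pop-os kernel: [ 979.244968] nvme 0000:01:00.0: AER: [12] Timeout",
    "Nov 2 22:01:53 pop-os kernel: [ 979.262629] Emergency Sync complete",
]

last_aer = last_uptime(sample, needle="AER:")
last_any = last_uptime(sample)
print(f"last AER line at {last_aer} s, last log line at {last_any} s")
print(f"gap: {last_any - last_aer:.6f} s")
```

Run against a full kern.log for one boot, a large gap would support the impression that the reset does not follow the last AER message directly.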
This might be overheating. I found out how to write the sensors readings to a
text log and saw that the memory temperature went up to 96 °C (or rather, it
stayed there for about 1 min 10 s).
Last reading before reset:
amdgpu-pci-2800
Adapter: PCI adapter
vddgfx:       +1.16 V
fan1:        1551 RPM  (min =    0 RPM, max = 3200 RPM)
edge:         +74.0°C  (crit = +118.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
junction:     +88.0°C  (crit = +99.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
mem:          +96.0°C  (crit = +99.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
power1:      162.00 W  (cap = 195.00 W)

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +70.5°C  (high = +70.0°C)
Tctl:         +70.5°C
Now the weird thing is: if this is in fact overheating, why didn't the fan go
beyond 1600 RPM even once? The highest was about 1581 RPM, and I don't have the
silent BIOS switched on (Sapphire Pulse RX 5700 XT, lever facing away from the
video ports).
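
To put numbers on the reading above, a small script can work out how far each temperature is from its critical limit and how hard the fan is running relative to its reported maximum. This is a minimal sketch that assumes the log contains `sensors` text in the same layout as the reading above; the function name and regular expressions are illustrative and not part of lm-sensors.

```python
import re

# "edge: +74.0°C (crit = +118.0°C, ..." -> label, current value, critical limit
TEMP = re.compile(
    r"^(\w+):\s*\+?(-?\d+(?:\.\d+)?)°C\s*\(crit = \+?(-?\d+(?:\.\d+)?)°C"
)
# "fan1: 1551 RPM (min = 0 RPM, max = 3200 RPM)" -> label, current, min, max
FAN = re.compile(
    r"^(\w+):\s*(\d+) RPM\s*\(min =\s*(\d+) RPM, max =\s*(\d+) RPM\)"
)


def summarize(text):
    """Return (headroom, fan_percent) dictionaries for one sensors reading.

    headroom maps a temperature label to (crit - current) in degrees C.
    fan_percent maps a fan label to current RPM as a percentage of max RPM.
    Lines without a crit limit (e.g. Tdie, Tctl) are skipped.
    """
    headroom = {}
    fan_percent = {}
    for raw in text.splitlines():
        line = raw.strip()
        temp = TEMP.match(line)
        if temp:
            headroom[temp.group(1)] = float(temp.group(3)) - float(temp.group(2))
            continue
        fan = FAN.match(line)
        if fan:
            rpm, max_rpm = int(fan.group(2)), int(fan.group(4))
            if max_rpm > 0:
                fan_percent[fan.group(1)] = 100.0 * rpm / max_rpm
    return headroom, fan_percent


# The amdgpu part of the last reading before the reset.
reading = """\
fan1: 1551 RPM (min = 0 RPM, max = 3200 RPM)
edge: +74.0°C (crit = +118.0°C, hyst = -273.1°C)
junction: +88.0°C (crit = +99.0°C, hyst = -273.1°C)
mem: +96.0°C (crit = +99.0°C, hyst = -273.1°C)
"""

headroom, fan_percent = summarize(reading)
for label, degrees in headroom.items():
    print(f"{label}: {degrees:.1f} C below crit")
for label, percent in fan_percent.items():
    print(f"{label}: {percent:.1f}% of max RPM")
```

For this reading, memory is 3 °C below its critical limit while the fan is running at just under half of its reported maximum.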