[Bug 105425] 3D & games produce periodic GPU crashes (Radeon R7 370)

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Apr 29 21:37:02 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=105425

--- Comment #57 from iive at yahoo.com ---
(In reply to MirceaKitsune from comment #56)
> I've performed the netconsole test today. After over an hour of learning how
> it works, I set it up and could confirm that system messages are properly
> received by netcat on the other computer. Unfortunately, as expected, no
> messages get sent at the time of the freeze: Even the netconsole kernel
> module dies immediately.

When the system hangs, is SysRq still operational?
That is, if you have netconsole working and press Alt+SysRq+h, it should print
the SysRq help text and send that text over the network.
If you press Alt+SysRq+b it should reboot immediately (SysRq+r only switches
the keyboard out of raw mode).

I want to confirm that netconsole indeed stops working while SysRq still
works.
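
Before relying on the keys, it may be worth checking that the kernel allows them at all. The following is only a sketch: the helper name `sysrq_allows` is made up for illustration, and it assumes the usual meaning of `/proc/sys/kernel/sysrq` (0 = disabled, 1 = all functions enabled, anything else a bitmask in which 128 covers reboot/poweroff).

```shell
#!/bin/sh
# sysrq_allows VALUE BIT
# Succeeds if a /proc/sys/kernel/sysrq setting of VALUE permits the
# function group identified by BIT (hypothetical helper name).
sysrq_allows() {
    value=$1
    bit=$2
    [ "$value" -eq 1 ] && return 0      # 1 means "all functions enabled"
    [ $(( value & bit )) -ne 0 ]        # otherwise treat it as a bitmask
}

# Report the current setting, if the file is readable on this system.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    if sysrq_allows "$current" 128; then
        echo "sysrq=$current: reboot/poweroff keys are allowed"
    else
        echo "sysrq=$current: reboot/poweroff keys are NOT allowed"
    fi
fi
```

As root, `echo 1 > /proc/sys/kernel/sysrq` enables all functions until the next boot, and `echo h > /proc/sysrq-trigger` prints the help text to the kernel log without touching the keyboard, which is a convenient way to see whether it arrives over netconsole.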

There is another method for capturing panic messages (kdump). It involves
reserving a portion of memory and loading a second kernel there, which is
started in the event of a panic.
There is also a method that stores kernel panics in the non-volatile memory
of the UEFI firmware (pstore). That might be a bit risky.
However, at this point I am not convinced that you are even getting a kernel
panic.
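
Whether either mechanism is already set up can be checked roughly as follows. This is a sketch under assumptions: `has_crashkernel` is a name I made up, the second-kernel method needs a `crashkernel=` memory reservation on the kernel command line, and the sysfs paths are the ones I know from kexec and pstore, so your distribution may differ.

```shell
#!/bin/sh
# has_crashkernel CMDLINE
# Succeeds if the given kernel command line reserves memory for a
# crash kernel (hypothetical helper name).
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r /proc/cmdline ]; then
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel= is set"
    else
        echo "crashkernel= is not set"
    fi
fi

# A value of 1 here means a crash kernel is loaded and would run on panic.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    echo "kexec_crash_loaded: $(cat /sys/kernel/kexec_crash_loaded)"
fi

# Records saved by pstore from earlier crashes appear here when the
# pstore filesystem is mounted.
if [ -d /sys/fs/pstore ]; then
    ls -l /sys/fs/pstore 2>/dev/null || echo "cannot list /sys/fs/pstore (try as root)"
fi
```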


It is very strange that the system hangs without the kernel issuing a panic.
And it is even stranger that the GPU is causing such a hang.

You see, the GPU mostly works on its own, so if the GPU hangs, it should not
affect the CPU operation. The radeon/amdgpu drivers can detect a GPU hang, and
they should complain about it. I've shown you how they do that for me.

This points us toward the hardware again. I do remember that you had some
success with `amdgpu.moverate=4`. So the issue might be around DMA and
PCIe...

For now, try `export R600_DEBUG=nodma`.

This environment variable has kept its name even though it now applies to
much newer drivers than r600. You can list all supported options with
`R600_DEBUG=help glxgears`.

Also, you have overclocked before, so maybe some settings have remained. See
if your BIOS/UEFI setup has something equivalent to "safe defaults"...
