[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Mar 10 09:37:56 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #75 from Allan <allan4229 at gmail.com> ---
Well, after a long time I'm here again to tell what happened:

A very helpful AMD staff member had been following up with me because of the
CPU, and that ended up solving the problems I had with the video card (or so
it seems).


1. Regarding the kernel timing
(In reply to fin4478 from comment #52)
> To prevent random kernel lock ups with Ryzen, fix this with bios, set to
> Typical Current Idle  in the bios Advanced/AMD CBS menu.
> 
> Use latest AMD wip kernel and Oibaf ppa Mesa. Disable display composting and
> vsync with Xfce. Use 300Hz kernel timer.
> 
> Working kernel config file for my system as attachment.

Yes, I tried it a lot, believe me: every possible combination of 300 Hz,
250 Hz, 1000 Hz, your config, and the linux-firmware drivers. I made at least
10 attempts with variations of your config, including a pure one that only
added dm-crypt, which is not enabled in yours.

2. Regarding the PSU profile
As already suggested by fin4478 and requested by AMD, I asked BIOSTAR for a
BIOS that allowed changing it. They sent me a beta version to test.

No luck at all; it made no difference.

3. The madness
Nothing worked. By then the CPU was OK and the motherboard was OK, but the
video card was still hanging sometimes, now even on Windows.

OK, I took a shot in the dark, suspecting some nonsensical incompatibility
with the RAM.

And that was it. Even after it had been sent in under warranty, even after
100+ tests, the RAM was the issue.

It was a Corsair Vengeance kit: 2x4GB DDR4 CL15, 2133MHz SPD (JEDEC), 3000MHz
XMP2.

Even at JEDEC specifications it caused the system to fail.

Even when I loosened the latencies considerably, it still caused failures.

That was what caused the amdgpu driver to fail, along with any heavy
application. Since data passes through system RAM before being sent to VRAM,
it makes sense that the driver/device would end up processing something
unexpected.
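
As an illustration only (this is not one of the tests mentioned above), a
write-and-verify pattern check of the kind that can expose flaky memory could
be sketched in Python. The function name pattern_test and its parameters are
hypothetical, and a user-space script like this only touches whatever pages
the OS hands it, so it is no substitute for a dedicated tool such as
memtest86+.

```python
def pattern_test(size_bytes=16 * 1024 * 1024, passes=2):
    """Fill a buffer with fixed bit patterns, read it back, count bad bytes.

    Hypothetical helper for illustration; returns the number of bytes that
    did not read back as the pattern that was written.
    """
    patterns = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, alternating bits
    errors = 0
    buf = bytearray(size_bytes)
    for _ in range(passes):
        for p in patterns:
            # Overwrite the whole buffer with the current pattern byte.
            buf[:] = bytes([p]) * size_bytes
            # Every byte should still equal the pattern when read back.
            errors += size_bytes - buf.count(p)
    return errors


if __name__ == "__main__":
    bad = pattern_test()
    print("mismatched bytes:", bad)
```

On healthy memory the count should be zero. Intermittent faults like the ones
described here may need many passes, larger buffers, or load on the rest of
the system before anything shows up.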

I warn everyone who uses Corsair memory, especially kits that are not part of
their "Ryzen ready" line. Even though there is a standard called JEDEC, they
simply don't implement it very well.

It was the reason why sometimes I could use the system for 1-2 hours, and
sometimes not even 5 minutes before crashing. There is some kind of instability
there.

I sold it to a guy who uses an 8700K or something, explained the situation,
and he agreed. So far (more than 2 months) he has not had a single issue
related to the memory chips. They must have done something to optimize for
Intel beyond the XMP profile and compromised the entire product. It cost me
1 year of my life and a bunch of money.

But the fixes made to amdgpu over time did prove useful, so it was not only
the RAM's fault: using the same RAM chips, I had far fewer problems than when
I first reported this bug.

Now I'm using a G.Skill Trident Z 3200MHz kit @ 2666MHz, which is the speed
AMD guarantees the 1800X will work with. It has been stable, without a single
problem related to it.

4. To confirm that I won the raffle for a non-working system, my RX480 died
a month ago, probably because of a BGA problem.

Then I found a label on the card, looked it up, and discovered that the
seller had sold me a refurbished product as new.

So now I'm deciding whether to sue him or just fix the card.

I mention this because it is why I can't test again until I get another AMD
card. In the meantime I'm using the nvidia card that I couldn't sell.

5. The funny part.

The nvidia driver, which seemed very stable at first, started to fail like
hell after I replaced the truly problematic CPU.

And the amdgpu driver became more stable than any other driver, on Linux or
Windows.


Well, I think that this is it. I'll return when I'm able to test amdgpu again.

But the verdict for now is:

I tested the RX480 without a single problem while using amdgpu. I did not use
it intensively, just ran common tests and played a little Left 4 Dead 2
without any issue (a good sign, since it always used to crash).

The card showed the BGA problem under a variant of the Adrenalin driver for
Windows, while I was running some checks requested by AMD.

Cheers to all.
Prefer G.Skill over Corsair.
