[Bug 108781] 4.19 Regression - Hawaii (R9 390) boot failure - Invalid PCC GPIO / invalid powerlevel state / Fatal error during GPU init

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Nov 23 22:42:36 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=108781

--- Comment #23 from jamespharvey20 at gmail.com ---
I don't know if comment 22, regarding patch 259364, was directed at me.  If
so, see comments 6 and 20, where I tried it.  It's also applied to the kernel
behind the journalctl I'm about to upload and behind the results in this
comment.


After so many people here and elsewhere saying removing "amdgpu.dpm=1" fixes
the problem for them, I retried it.  I still get a black screen.  But, on
further analysis, removing it might untangle the spaghetti.
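
For anyone repeating this test, it helps to confirm whether the parameter
really reached the booted kernel.  A minimal sketch, assuming the command line
is read from /proc/cmdline; the function name amdgpu_dpm_setting is
hypothetical and only for illustration:

```python
def amdgpu_dpm_setting(cmdline):
    """Return the value given to amdgpu.dpm on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "amdgpu.dpm" and sep:
            # Assumption: if the parameter is repeated, the last one is reported.
            value = val
    return value
```

On the running system, something like
amdgpu_dpm_setting(open("/proc/cmdline").read()) should give "1" when booted
with "amdgpu.dpm=1" and None when the parameter was left off.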

Leaving off "amdgpu.dpm=1" shows the bug in commit 0d998891 is still in git
master (7c98a42).  Attached is a journalctl from git master (7c98a42) with the
bug 108704 patch applied.  (I'm not sure my setup needs that patch; I'm just
applying it to reduce potential bugs.)

* On 4.18.16, using dpm=1, as required to work around bug 91880, works.
* On 4.19, using dpm=1, I get black screens with errors as shown in my original
post.
* On 4.19, leaving dpm=1 out, I still get a black screen, but the errors given
match the errors from commit 0d998891 as shown in comment 13, toward the bottom
under "Deeper on issue 1, regarding this bad commit".  They're the
"amdgpu_drm_irq_schedule_work FAILED src / Cannot find any crtc or sizes"
errors, also visible in the previously attached "journalctl of 0d9988910989,
which gets to a black screen".
* Commit 0d998891 is what introduces the "irq_schedule_work / crtc or sizes"
error.  Its parent c91b007e doesn't have them and works fine.
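
To compare logs across these cases, the two failures can be told apart by the
error strings quoted in this bug.  A rough sketch, assuming a journalctl dump
saved as text; the marker lists are just the substrings mentioned in this
report, and classify_log is a hypothetical helper, not part of any tool:

```python
# Error substrings quoted in this bug report, lowercased for matching.
DPM_MARKERS = (
    "invalid pcc gpio",
    "invalid powerlevel state",
    "fatal error during gpu init",
)
IRQ_MARKERS = (
    "irq_schedule_work failed",
    "cannot find any crtc or sizes",
)


def classify_log(text):
    """Return which of the two failure families show up in a log dump."""
    lowered = text.lower()
    found = set()
    if any(marker in lowered for marker in DPM_MARKERS):
        found.add("dpm")
    if any(marker in lowered for marker in IRQ_MARKERS):
        found.add("0d998891")
    return found
```

Running classify_log over a saved log should return {"dpm"} for the 4.19 boot
with dpm=1 and {"0d998891"} for the boot without it, if the logs match what is
described above.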

So, 4.19 breaks "amdgpu.dpm=1" as others have shown, and removing the parameter
bypasses that bug.  The commit that breaks dpm must be somewhere after
0d998891, because otherwise that commit shouldn't be showing me the errors that
are hidden on 4.19 and git master.

And, 4.19 also breaks without dpm for me and anyone else affected by
0d998891.

With 4.19, the code causing the bug from dpm executes first in the auto-kms
stage, and if that kernel parameter isn't given, the code causing the bug from
0d998891 executes after that.

Because the dpm bug is in a more recent commit than 0d998891, it appeared to me
that the bug in 0d998891 was fixed in some more recent commit.  But, that was
only because running 4.19 or git master with dpm=1 hits the dpm bug first,
which hides the 0d998891 bug because execution never gets that far.

Since some others are not running into the 0d998891 bug, notably Shecks' MSI R9
390, I'm thinking my specific Asus STRIX-R9390-DC3OC-8GD5-GAMING and some
other cards implement something differently or wrongly.  I don't even know enough
about what video card manufacturers add on top of the chipset to know if that's
possible, but it's the only thing that makes sense to me.  Unless it's an
interaction between the R9 390 and running a server Xeon board, running 5
DisplayPort monitors through 3 DisplayPorts having 2 chained, or something else
specific to me and some others but not Shecks.

I don't plan on bisecting to determine which commit breaks dpm.  I'll only do
it if 0d998891 is fixed, and leaving the kernel parameter off brings back the
stability problems in bug 91880 -- and if the Vega 64 card I'm getting
Wednesday doesn't run perfectly -- and if no one else does it.
