[Bug 105811] kernel 4.15.x boot fails at kms

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Jan 17 02:55:09 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=105811

--- Comment #38 from Alec Habig <ahabig at umn.edu> ---
Sorry for the long silence.  Going has been slow: it takes a while to do a
tweak/test/reboot/hang cycle, especially since the disk controller gets munged
when the bug happens, and about 50% of the time it stays munged even across a
power cycle (!).  I have no idea how such a thing is even possible.  And this
device is my "get work done" laptop, using the 4.14 kernel, which is perfectly
happy.

Anyway: tried a different approach.  Threw in a spare disk, and tried a clean
install of F29 and Kubuntu 18.04.1, in case cruft was to blame.

F29 installed from DVD... but the kernel installed to disk didn't boot once the
first install pass finished.
Kubuntu's install DVD wouldn't even boot past the spinny "it's starting to
boot" graphics screen.

In both cases, since things are "graphical boot" and "quiet", there's no
feedback as to exactly what went wrong, and the system hangs too hard to switch
to a different virtual terminal, but it felt the same as the hangs described
above.

With the last F27 kernel (yes, it's EOL now), 4.18.19-100, I tried all
permutations of the integrated, discrete, and Optimus BIOS settings, with the
nouveau driver excluded, and with the nvidia proprietary blob.  Sometimes it
can get a clean boot - but then the SATA controller goes out to lunch as soon
as a write cache flush happens.  Which makes me think the kms problem which
started this thread is a symptom rather than the problem, just the one which
usually triggers first.  This is consistent with the trouble getting log traces
described above, because a SATA controller that's out to lunch can't write
logs to disk.

Went back to drm-tip.  See that it's kernel 5.0 now, cool.

This minimally configured kernel continues to work.  I've enabled the extra
features needed to run the laptop, no problems.

So: the bug is at the very least triggered by one of the myriad kernel options
enabled in the distro stock kernels.  It feels to me like the old days of
unprotected flat memory space, where you could POKE random values into random
addresses and watch the system fall apart: the initial kms call is the most
sensitive to it, and from there things unravel into thrashing the disk
controller.

Parameter space is too vast for me to find the culprit with intermittent effort
and a logging system that's often the first victim of the bug.  So, I'm ready
to punt.  Documenting this here, in case someone else with more clues googles
it, is the only remaining thing I can do.
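
For anyone who picks this up: a rough sketch of how the search space could be
narrowed, by listing the options that are enabled in a distro config but not in
the working minimal one.  This is not something I ran for this report; the
function names are made up for illustration, and it assumes both kernels'
.config files are at hand in the usual "CONFIG_FOO=y" text format.

```python
import re

# Matches lines that set an option, e.g. CONFIG_FOO=y, CONFIG_FOO=m,
# CONFIG_FOO=123 or CONFIG_FOO="str".  Lines such as
# "# CONFIG_FOO is not set" start with '#' and are skipped.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")


def parse_config(text):
    """Return {option: value} for every option set in a .config text."""
    options = {}
    for line in text.splitlines():
        match = _SET.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def config_candidates(distro_text, minimal_text):
    """Options built in (y) or modular (m) in the distro config that are
    neither y nor m in the minimal config, sorted by name."""
    distro = parse_config(distro_text)
    minimal = parse_config(minimal_text)
    return sorted(
        name
        for name, value in distro.items()
        if value in ("y", "m") and minimal.get(name) not in ("y", "m")
    )
```

Feeding it the contents of the two files gives a list of suspects that could
then be enabled in the working config a batch at a time.  It only compares
on/off state, so options that differ in numeric or string values won't show up.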

Time to just return to the 1990s and compile my own kernel :(  At least git
now makes tracking updates easier than it used to be in the Bad Old Days.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
You are on the CC list for the bug.