<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [KBL] "enable_rc6" parameter deprecation brings back freezing"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105962#c53">Comment # 53</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [KBL] "enable_rc6" parameter deprecation brings back freezing"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105962">bug 105962</a>
from <span class="vcard"><a class="email" href="mailto:george.mccollister@gmail.com" title="George McCollister <george.mccollister@gmail.com>"> <span class="fn">George McCollister</span></a>
</span></b>
<pre>I'm able to reproduce this issue on five Atom E3845-based embedded systems in a
lab.
With intel_idle.max_cstate=0 processor.max_cstate=0, all of the systems restart
overnight due to a watchdog reset.
Sometimes, but not always, I have observed errors such as these on the serial
console immediately before the system locks up and reboots:
[144039.363431] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 00000c00 (000ffc00)
[144039.476432] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 000000c0 (000ffcc0)
[144039.589165] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 00000000 (000fccc0)
[144039.702190] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 00000000 (000f0cc0)
[144039.814669] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 00000000 (000c0cc0)
[144039.925749] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 00000000 (00000cc0)
[144040.120485] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 00000c00 (00000cc0)
[144040.233084] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 000000c0 (00000cc0)
[144040.388315] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 000000c0 (00000cc0)
[144040.607674] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power well state 000000c0 (00000cc0)
Most commonly, only one of these error messages is printed to the serial
console, or none at all.
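To compare runs, here is a minimal Python sketch that pulls these power well timeouts out of a captured serial-console log. It can show how many occur and how close together they are before a reset. It assumes the messages look exactly like the ones above, optionally wrapped onto a second line by a mail client or terminal. The function name parse_power_well_timeouts is hypothetical. It only extracts the two hex values and does not try to interpret them.

```python
import re

# One vlv_set_power_well timeout in the format shown above. Every \s+ also
# matches a newline, so an entry wrapped onto two lines still counts as one.
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:vlv_set_power_well \[i915\]\]\s+\*ERROR\*\s+"
    r"timeout\s+setting\s+power\s+well\s+state\s+([0-9a-f]{8})\s+\(([0-9a-f]{8})\)"
)


def parse_power_well_timeouts(log_text):
    """Return (timestamp_s, state, value_in_parens) tuples in log order.

    Both hex fields are returned as integers; they are not interpreted.
    """
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        results.append((
            float(match.group(1)),
            int(match.group(2), 16),
            int(match.group(3), 16),
        ))
    return results


# Sample deliberately wrapped the way a mail client wraps long log lines.
sample = """[144039.363431] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power
well state 00000c00 (000ffc00)
[144039.476432] [drm:vlv_set_power_well [i915]] *ERROR* timeout setting power
well state 000000c0 (000ffcc0)"""

for ts, state, in_parens in parse_power_well_timeouts(sample):
    print(f"{ts:.6f} state={state:08x} ({in_parens:08x})")
# prints:
# 144039.363431 state=00000c00 (000ffc00)
# 144039.476432 state=000000c0 (000ffcc0)
```
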
On 4.14.x, adding i915.enable_rc6=0 allows the systems to run for 3+ days (until
I stop the test).
Hoping it would fix the problem, I checked out and built kernel commit
a75d035fedbdecf83f86767aa2e4d05c8c4ffd95. All systems still rebooted overnight.
I've since found that i915.disable_power_well=0 also prevents the problem on all
tested kernel versions. Is this setting less disruptive to normal operation than
i915.enable_rc6=0? Is there also value in testing i915.enable_dc=0?
If any of the developers are working on this and think they have a fix, give me
the URI of a git repo and the commit to use, and I can build and test it in the
lab. Please also specify any kernel config settings and kernel command-line
arguments you want me to use.
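Before reporting results for a requested set of options, it can help to confirm that each lab system actually booted with them. Here is a minimal Python sketch. It assumes the options are checked against the text of /proc/cmdline and that arguments are simple space-separated name=value tokens. The helper names parse_cmdline and missing_args are hypothetical and are not an existing tool.

```python
def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplified: assumes space-separated tokens, so quoted values containing
    spaces are not handled, and '-' vs '_' in names is not normalized.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_args(cmdline_text, wanted):
    """Return the entries of `wanted` that are not set exactly as given."""
    params = parse_cmdline(cmdline_text)
    missing = []
    for arg in wanted:
        name, sep, value = arg.partition("=")
        expected = value if sep else None
        if name not in params or params[name] != expected:
            missing.append(arg)
    return missing


# On a lab system the text would be read from /proc/cmdline; this sample
# command line is made up for illustration.
current = "BOOT_IMAGE=/vmlinuz ro quiet intel_idle.max_cstate=0 processor.max_cstate=0"
wanted = [
    "intel_idle.max_cstate=0",
    "processor.max_cstate=0",
    "i915.disable_power_well=0",
]
print(missing_args(current, wanted))
# prints ['i915.disable_power_well=0']
```
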
Since someone might ask: I'm using "intel_idle.max_cstate=0
processor.max_cstate=0" because these systems require minimal scheduling
latency. I have also noticed that these options can prevent other i915 lockup
issues. I can remove them for testing purposes upon request.</pre>
</div>
</p>
</body>
</html>