[Bug 91585] [BDW] System hard lock-up on resume from suspend

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Aug 18 05:20:02 PDT 2015


https://bugs.freedesktop.org/show_bug.cgi?id=91585

--- Comment #4 from Jerome <an.inbox at free.fr> ---
Hi Jesse, thanks for looking into this at a high priority.

In my previous tests, I tried both without any specific 3D app running (under
the KDE KWin WM, with XRender for compositing) and with both glxgears and
glxdemo running, and didn't notice a difference in lock-up frequency.

Since then I stumbled into something odd. Because stability is more important
to me than 3D performance right now, I had reverted to the default Debian
stable X Intel video driver, based on version 2.21.15. And I stayed on kernel
4.1.3 from Debian testing at the same time. With this combination, hardware 3D
acceleration is not enabled and the system uses LLVMpipe for 3D. Still, I had
two lock-ups.
After the second lock-up I tried to reproduce the bug systematically to assess
the frequency, without success so far (15 successful resumes in a row). So it
looks as if the lock-up can happen without full hardware 3D acceleration, even
if it's less frequent.
BTW, I also tried the same Intel kernel 4.2.0rc5 as before with 2.21.15: I
can't reproduce the lock-up there either, after 15 resumes in a row.
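
As a rough sanity check on what 15 clean resumes in a row can and cannot show,
here is a minimal sketch. It assumes each resume is an independent trial with a
fixed lock-up probability p. That is a simplification and not something I have
measured, and the function names are only illustrative.

```python
def prob_all_clean(p: float, n: int) -> float:
    """Chance of n successful resumes in a row, assuming each resume
    locks up independently with probability p (an assumption)."""
    return (1.0 - p) ** n


def upper_bound_95(n: int) -> float:
    """Largest p still compatible, at the 95% level, with n clean
    resumes in a row: solves (1 - p) ** n == 0.05 for p."""
    return 1.0 - 0.05 ** (1.0 / n)


if __name__ == "__main__":
    n = 15  # number of successful resumes observed in a row
    for p in (0.05, 0.10, 0.20):  # hypothetical lock-up rates
        print(f"p={p:.2f}: chance of {n} clean resumes = "
              f"{prob_all_clean(p, n):.3f}")
    print(f"95% upper bound on p after {n} clean resumes: "
          f"{upper_bound_95(n):.3f}")
```

Under these assumptions, 15 clean resumes only rule out lock-up rates above
roughly 18%, so a rarer lock-up could still be present with this combination.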

The lock-up always occurs at the same point, early on. A successful resume goes
through these steps:
 1) at some point during the boot, the console is cleared and the screen only
shows a blinking cursor in the top-left corner;

 2) there's a short screen flicker, as for a mode change. The screen is black;

 3) after a short time (under about 1 second), the previous session's graphical
image is restored;

 4) the graphical environment is usable.

The lock-up occurs after (1) and before (2): I have never seen the screen
flicker in a failed resume, so it happens early in the resume process.
And FYI, the GPU hang of bug #90342 occurred between (3) and (4), so the timing
is different.

The lock-up bug looks racy (it is not systematic, and changing the log level
can hide it), so bisecting on the kernel version looks chancy: some unrelated
change may hide the problem. The first symptoms are also old (I saw the lock-up
with 4.0.8), and 3.16 shows a different but still buggy behavior (which could
have a different root cause). So detecting the bug may not always be reliable,
and it is not clear where to start either.

On the other hand, it's relatively easy to reproduce the lock-up using Intel
kernel 4.2rc5 with video driver 2.99.917. Is there any way to investigate based
on this, for example to narrow down in which part of the resume sequence the
issue happens?
I have a background in embedded dev on RTOS/RISC systems, but no low-level
experience with x86 and Linux kernel dev, so please excuse some possibly naive
questions/suggestions. Is there any special debug mode to get some info past
the hard lock-up? For example, even if I have to power cycle the laptop on a
lock-up, the power-off is short and the system memory may be partially
preserved. Logging to a buffer and dumping it on the next reboot may (TBC) show
some data. Or could a watchdog IRQ reset the video to a basic, safe text mode
to dump some logs after the lock-up?

Thanks
