[Bug 90384] [drm] [SNB] Render ring stuck on waiting vblank

Mon May 11 01:54:02 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=90384

--- Comment #6 from Martin Mokrejs <mmokrejs at fold.natur.cuni.cz> ---
For various reasons I have so far mostly used 3.10.72. I have now tried 3.19.0,
but it could be that I did NOT use all of its features in the past, especially
as the DRI/DRM code is changing and I did not pay attention to all the new
features, since this laptop has only the built-in Intel graphics. So maybe I did
not have some feature enabled in the 3.10.x kernels.

I suspect an issue in the 3.19.0 kernel. In the distant past I did observe
that, with an overheated CPU, the external LCD (via HDMI) blinks. Also, the
keyboard backlight turns on and off on its own (the ACPI implementation is not
ideal in Linux). Also, closing the laptop lid makes the externally connected LCD
(via HDMI) blink as well (this does not happen with MS Win7). I conclude that
some ACPI events are not interpreted well in Linux.

I use the eSATA port a lot (it is actually a combined eSATA/USB2.0 socket); it
is hooked to the same SandyBridge chip as the HDMI socket and another USB2.0
socket. I continually run CPU-intensive tasks while accessing the eSATA drive.
That worked quite well so far in 3.10.x.

The weird thing with 3.19.0 is that with an external eSATA drive connected and
the CPU fully loaded (2 threads, HT disabled on my physical 2-core CPU) I
observe RESETS of the SATA connection (every few seconds, so that I think it
killed the drive, as its heads were shaking horribly). Each SATA reset
coincides with a blink of the HDMI LCD screen and also with a reset of the
USB2.0 connection (the same device, actually a mouse connected to the socket,
is removed and re-enumerated). So, with 3.19.0 I just cannot use my eSATA port
if the CPU is loaded (it works fine until I heat up the CPU).

# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
           +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
           +-16.0  Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1
           +-1a.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
           +-1b.0  Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller
           +-1c.0-[03-04]--
           +-1c.1-[05-06]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           +-1c.3-[09-0a]----00.0  Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak]
           +-1c.4-[0b-0c]----00.0  Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller
           +-1c.7-[11-16]----00.0  Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller
           +-1d.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
           +-1f.0  Intel Corporation HM67 Express Chipset Family LPC Controller
           +-1f.2  Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller
           \-1f.3  Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller
#

I vaguely remember observing that the eSATA resets trigger a USB2.0 link reset
with past kernels (based on my report, at about the 3.4 kernel the SATA reset
timeout was increased, because 3.5" drives do not spin up quickly enough and the
kernel was too eager to decrease the SATA speed and then even reset), but I
never investigated that, and the SATA resets simply did not happen with 3.10.12
and 3.10.72. Here is the original report:
http://comments.gmane.org/gmane.linux.usb.general/61393 You can see that the
USB2.0 port was really being reset in the past as well, in conjunction with
eSATA resets. At that time it seemed to be an external USB hub issue but ... it
happens even with just a mouse.

However, the 'GPU HANG' happened right after bootup, and there was one more
during the following days:

# dmesg | grep GPU
[ 9255.606859] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 9255.606923] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 9255.606930] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[76352.762263] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
#

The screen blinking itself is not causing the GPU hang. When the CPU is loaded
and the eSATA drive is connected, I sometimes get maybe 10 screen blinks in a
minute. But there were only two GPU HANGs, as you can see in the dmesg output
above. There were no other messages related to the second 'GPU HANG', and the
card0/error file still contains the same dump (compared by md5sum).
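
To make it easier to compare the number of hangs against the number of screen
blinks, here is a minimal Python sketch that extracts the 'GPU HANG' lines from
dmesg text and reports when each occurred. It assumes that each kernel message
is on a single line (the mail client wrapped the lines quoted above) and that
the message format matches the excerpt; the names parse_gpu_hangs, HANG_RE and
SAMPLE are made up for this illustration and are not part of any tool.

```python
import re

# Matches i915 hang lines in the form seen in the dmesg excerpt above.
# The format is taken from that excerpt only; other kernels may differ.
HANG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\]\s+GPU HANG:\s+ecode\s+(?P<ecode>\S+),"
    r"\s+reason:\s+(?P<reason>.*?),\s+action:\s+(?P<action>\w+)\s*$"
)


def parse_gpu_hangs(dmesg_text):
    """Return a list of (seconds_since_boot, ecode, reason, action) tuples."""
    hangs = []
    for line in dmesg_text.splitlines():
        m = HANG_RE.match(line)
        if m:
            hangs.append((float(m["ts"]), m["ecode"], m["reason"], m["action"]))
    return hangs


# The four messages from the report, with the mail line-wrapping undone.
SAMPLE = """\
[ 9255.606859] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 9255.606923] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 9255.606930] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[76352.762263] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
"""

if __name__ == "__main__":
    for ts, ecode, reason, action in parse_gpu_hangs(SAMPLE):
        print(f"{ts / 3600:.2f} h after boot: ecode={ecode} "
              f"reason={reason!r} action={action}")
```

On a live system the same function could be fed the output of dmesg instead of
SAMPLE.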

BTW: This is my 4th mainboard in the laptop and the second CPU. Dell tried to
"fix" my issues by replacing the mainboard every time, but I really had a bad
CPU (HDMI hardly ever giving a signal). There is some bad design in the
SandyBridge. But from my past reading of
https://blogs.intel.com/technology/2011/01/chipset_design_flaw/ it seemed to me
that I have the eSATA port on a different PCIe port, so it should not suffer
from the 'degrade over time' issue.
