[Bug 105278] New: Possibly nvidia/primus-induced GPU hang on rcs0, ecode 7:0:0x85fffff8

bugzilla-daemon at freedesktop.org
Tue Feb 27 20:31:21 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=105278

            Bug ID: 105278
           Summary: Possibly nvidia/primus-induced GPU hang on rcs0, ecode
                    7:0:0x85fffff8
           Product: DRI
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/Intel
          Assignee: intel-gfx-bugs at lists.freedesktop.org
          Reporter: perso at elementw.net
        QA Contact: intel-gfx-bugs at lists.freedesktop.org
                CC: intel-gfx-bugs at lists.freedesktop.org

Created attachment 137665
  --> https://bugs.freedesktop.org/attachment.cgi?id=137665&action=edit
/sys/class/drm/card0/error contents

Bug description: My entire display froze while switching between windows in
X11. Nothing else seems to have hung: music was still playing, and
everything came back to normal after I SIGKILLed Blender, which was running on
the nVidia GPU.

Details / Reproducing steps:
- Blender 2.79 was running on the nVidia GPU through primus with primusrun.
CUDA was used to render Blender Cycles images. The Blender window had been
inactive for a while and had not rendered any image for at least 10 minutes.
- I (accidentally) switched to the Blender window by clicking below the icon
of the window I meant to select in (a vertical) xfce4-panel, then used the
mouse wheel to get to another window above it in the list, scrolling through 4
other windows before reaching Chromium's, where the hang happened
- Xorg did not visually respond to VT switch requests in the minute or so
following the freeze, but it later turned out that the switch itself had
worked; I left tty2 active (still without visual feedback; X11 on tty1)
- I suspended then resumed the laptop; the display was the same before and
after
- I ssh'd into my machine, where I ran:
  * `htop`, which did not show any CPU usage other than itself, sshd, firefox
and pulseaudio (which were playing music in the background)
  * `perf top`, which showed no graphics-related perf event samples
  * `killall -9 blender`
- At this point the display did not update but was on tty2 (I had expected
that killing blender would unclog the graphics stack and make the console
render)
- Alt+F1, and X11 resumes
- Ctrl+Alt+F2 and tty2 displays properly
- Back to X11, read dmesg and report this bug
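
For anyone following the steps above over ssh, the error state attached to
this bug can be saved before killing anything. The following is only a sketch:
the function name and output file name are made up for illustration, and
reading /sys/class/drm/card0/error is assumed to require root on most setups.

```python
import time
from pathlib import Path


def snapshot_error_state(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the i915 error state to a timestamped file and return its path.

    `src` defaults to the sysfs path named in this report; the output file
    name is an arbitrary choice made for this example.
    """
    data = Path(src).read_bytes()
    dest = Path(dest_dir) / f"i915-error-{int(time.time())}.txt"
    dest.write_bytes(data)
    return dest


if __name__ == "__main__":
    # Run over ssh while the display is still frozen.
    print(snapshot_error_state())
```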

System environment (package versions as reported by `pacman`):
-- chipset: HD4000 (part of an Intel i5-3317U; Ivy Bridge)
-- system architecture: 64-bit
-- xf86-video-intel: 1:2.99.917+812+g75795523-1
-- xserver: 1.19.6+13+gd0d1a694f-1
-- mesa: 17.3.5-1
-- libdrm: 2.4.90-3
-- kernel: 4.15.5-1-ARCH #1 SMP PREEMPT Thu Feb 22 22:15:20 UTC 2018 x86_64
-- Linux distribution: Arch Linux
-- Machine or mobo model: ASUS K56CB
-- Display connector: LVDS panel
-- nvidia: 390.25-13
-- nvidia GPU: GeForce 740M
-- primus: 20151110-7
-- bumblebee: 3.2.1-16
-- bbswitch: 0.8-113
-- blender: 17:2.79-9
-- compton (X11 compositor in use): 0.1_beta2.5-10
-- chromium: 64.0.3282.167-1

Additional info:
In the process of resetting the i915, a fence wait timed out:
[95008.506693] i915 0000:00:02.0: Resetting chip after gpu hang
[95010.549217] asynchronous wait on fence i915:[global]:6fd684 timed out
[95016.501277] i915 0000:00:02.0: Resetting chip after gpu hang
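
Going by the timestamps, the fence wait timed out about 2.04 s after the first
reset attempt, and the second reset followed about 8.0 s after the first. A
small sketch for extracting such offsets from dmesg lines; the regular
expression and the helper name are assumptions made for this example, not part
of any tool:

```python
import re

# Matches "[seconds.micros] message" as printed by dmesg.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(lines):
    """Return (timestamp_seconds, message) pairs for timestamped lines."""
    events = []
    for line in lines:
        match = DMESG_LINE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


LOG = """\
[95008.506693] i915 0000:00:02.0: Resetting chip after gpu hang
[95010.549217] asynchronous wait on fence i915:[global]:6fd684 timed out
[95016.501277] i915 0000:00:02.0: Resetting chip after gpu hang
"""

events = parse_dmesg(LOG.splitlines())
first = events[0][0]
# Offset of each event from the first reset attempt, in seconds.
offsets = [(round(ts - first, 3), msg) for ts, msg in events]
for delta, msg in offsets:
    print(f"+{delta:.3f}s  {msg}")
```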

Starting up or using primus-forwarded software sometimes causes graphics
corruption in some windows, which is fixed when a redraw happens. The
corruption also seems to affect all types of graphics buffers on the i915,
such as font caches/atlases; some applications, like Steam, are particularly
affected by this problem. It would not be surprising if more than just buffer
contents got corrupted.
Booting with intel_iommu enabled prevents graphical output as soon as the
kernel switches away from efifb to inteldrmfb (that is, early in boot); maybe
it could have a beneficial impact on the graphics corruption problem if it
worked...
