[Bug 107475] New: [iGVT-g][SKL] GPU Hang and iGVT-g guest crash under certain loads

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Aug 3 16:51:51 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=107475

            Bug ID: 107475
           Summary: [iGVT-g][SKL] GPU Hang and iGVT-g guest crash under
                    certain loads
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/Intel
          Assignee: intel-gfx-bugs at lists.freedesktop.org
          Reporter: leozinho29_eu at hotmail.com
        QA Contact: intel-gfx-bugs at lists.freedesktop.org
                CC: intel-gfx-bugs at lists.freedesktop.org

Created attachment 140955
  --> https://bugs.freedesktop.org/attachment.cgi?id=140955&action=edit
/sys/class/drm/card0/error

When running a Windows 10 guest with Intel GVT-g and dma-buf, many graphical
workloads stutter, some applications crash, and some consistently crash the
guest with a blue screen while causing a GPU hang on the host.

To reproduce the problem consistently, a Windows 10 1803 guest with dma-buf
using the Intel HD Graphics driver version 24.20.100.6194 is required. The QEMU
command line used to start the guest is:

env PULSE_LATENCY_MSEC=10 QEMU_AUDIO_ADC_VOICES=0 QEMU_AUDIO_DRV=pa \
nice -n -15 \
qemu-system-x86_64 -name "Windows 10" -k pt-br -nodefaults \
-mem-prealloc -mem-path /dev/hugepages/libvirt/qemu \
-hda redm.qcow2 \
-hdb redm-D.qcow2 \
-enable-kvm -cpu host -smp cores=2,threads=2 -m 4G \
-device usb-tablet,id=tablet -device usb-host,vendorid=0x1b3f,id=soundcardusb \
-vga none -monitor vc -serial stdio -display gtk,gl=on  \
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/123f09b0-4c00-11e8-a6ca-f3c21e47e012,rombar=0,x-igd-opregion=on,display=on,addr=0x3,id=iHD520 \
-cdrom "mídia.iso" \
-machine kernel_irqchip=on -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -M pc,usb=true \
-netdev bridge,id=hostnet0,br=virbr0 -device e1000,netdev=hostnet0,id=net0,mac=aa:bb:cc:dd:ee:11,addr=0x8

One application I found that consistently triggers the blue screen and GPU hang
is a game that can be downloaded at:
https://www.vector.co.jp/download/file/win95/game/fh310532.html
Even though it is a very light workload, it consistently crashes the guest,
particularly in the second stage.

The guest stutters noticeably, and the stuttering gets progressively worse until
the guest crashes with a blue screen (not visible due to the lack of VGA modes)
and the host suffers a GPU hang with the following in dmesg:

[ 1748.473459] [drm] GPU HANG: ecode 9:0:0xfacfffff, reason: Hang on rcs0, action: reset
[ 1748.473461] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1748.473462] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1748.473462] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1748.473463] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1748.473464] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1748.473484] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1748.998085] gvt: vgpu 1: untracked MMIO 0000207c len 4
[ 1749.003878] gvt: vgpu 1: untracked MMIO 0000207c len 4
[ 1749.011220] gvt: vgpu 1: untracked MMIO 0000207c len 4
[ 1749.019816] gvt: vgpu 1: untracked MMIO 0000207c len 4

Dozens of these untracked MMIO messages with the same address and length
follow, and then the same message appears with different addresses.
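
To see how the untracked MMIO messages are distributed across addresses, the
kernel log can be tallied with a small shell helper. This is only a sketch: the
function name count_untracked_mmio is hypothetical, and the pattern assumes the
message format is exactly the one in the lines quoted above.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints one line per
# (vGPU, MMIO address, length) combination, most frequent first.
# Assumes the "gvt: vgpu N: untracked MMIO ADDR len LEN" format quoted above.
count_untracked_mmio() {
    sed -n 's/.*gvt: vgpu \([0-9][0-9]*\): untracked MMIO \([0-9a-fA-F][0-9a-fA-F]*\) len \([0-9][0-9]*\).*/vgpu \1 mmio 0x\2 len \3/p' \
        | sort | uniq -c | sort -rn
}
```

It could be used as "dmesg | count_untracked_mmio"; reading dmesg may require
root if kernel.dmesg_restrict is enabled on the host.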

These issues were not observed with the 15.45 drivers (the certified ones), but
those drivers are unusable on Windows 10 because it automatically updates the
driver to a non-functional version.

The Windows 10 guest version is 1803, running Intel driver version
24.20.100.6194. The previous driver version, 24.20.100.6136, does not have
these issues, so I think 24.20.100.6194 introduced a regression that makes it
unusable on iGVT-g guests.

System specifications:

Processor: Intel Core i3-6100U;
Video: Intel HD Graphics 520;
Architecture: amd64;
Mesa: 18.2.0-devel (git-f310e86a42);
Kernel version: 4.17.11-lowlatency;
Distribution: Xubuntu 18.04.1 amd64;
QEMU version: 2.12.91 (v3.0.0-rc3-dirty).
