[Bug 110238] New: Crashes when using MDEV passthrough on i915
bugzilla-daemon at freedesktop.org
Mon Mar 25 14:53:02 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=110238
Bug ID: 110238
Summary: Crashes when using MDEV passthrough on i915
Product: DRI
Version: XOrg git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Intel
Assignee: intel-gfx-bugs at lists.freedesktop.org
Reporter: christian.ehrhardt at canonical.com
QA Contact: intel-gfx-bugs at lists.freedesktop.org
CC: intel-gfx-bugs at lists.freedesktop.org
Hi,
I was using MDEV passthrough with KVMGT.
I enabled it on the kernel command line via /etc/default/grub:
i915.enable_gvt=1 intel_iommu=on drm.debug=0
And had the modules loaded via the initramfs:
$ printf "kvmgt\nvfio-iommu-type1\nvfio-mdev" | sudo tee /etc/initramfs-tools/modules
Then I updated the initramfs and the GRUB config and rebooted:
$ sudo update-initramfs -u
$ sudo update-grub
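After the reboot, the setup above could be double-checked along these lines. This is only a minimal sketch: check_cmdline and check_modules are hypothetical helper names, and it assumes the three modules are built as loadable modules (built-in ones would not show up in lsmod).

```shell
#!/bin/sh
# Sketch: check that the host booted with the expected parameters and modules.
# check_cmdline and check_modules are hypothetical helpers, not part of any tool.

# Succeeds if every expected parameter appears as a whole word in the command line.
check_cmdline() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "missing kernel parameter: $param"; return 1 ;;
        esac
    done
}

# Succeeds if every expected module appears in the first column of lsmod-style text.
check_modules() {
    loaded=$1
    shift
    for mod in "$@"; do
        # lsmod prints underscores even for modules requested with dashes
        name=$(printf '%s' "$mod" | tr '-' '_')
        printf '%s\n' "$loaded" \
            | awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' \
            || { echo "module not loaded: $mod"; return 1; }
    done
}

# Typical use on the host:
# check_cmdline "$(cat /proc/cmdline)" i915.enable_gvt=1 intel_iommu=on
# check_modules "$(lsmod)" kvmgt vfio-iommu-type1 vfio-mdev
```

Both helpers take the text to inspect as their first argument, so they can be tried against saved output as well as on the live host.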
Then I created the MDEV, identified by a UUID:
$ cd /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_4
$ echo 4dd50f26-ec08-11e8-b838-4bc3356865b6 | sudo tee create
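The creation step can also be wrapped so that it first checks whether the type still has room. A minimal sketch, assuming the standard mdev sysfs attributes available_instances and create; create_mdev is a hypothetical helper name, and the type directory is passed in as an argument.

```shell
#!/bin/sh
# Sketch: create one mediated device of a given type if instances are still available.
# create_mdev is a hypothetical helper name.

create_mdev() {
    type_dir=$1    # e.g. .../0000:00:02.0/mdev_supported_types/i915-GVTg_V4_4
    uuid=$2        # e.g. the output of uuidgen

    # available_instances reports how many more devices of this type can be created
    avail=$(cat "$type_dir/available_instances") || return 1
    if [ "$avail" -lt 1 ]; then
        echo "no free instances left in $type_dir" >&2
        return 1
    fi

    # Writing a UUID to the create node asks the kernel to instantiate the device
    printf '%s\n' "$uuid" > "$type_dir/create" || return 1
    echo "$uuid"
}

# Typical use as root on the host:
# create_mdev /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_4 "$(uuidgen)"
```

The function prints the UUID on success, so it can be pasted straight into the guest XML.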
Finally I told libvirt to use it by modifying my guest XML like this:
<graphics type='spice'>
  <listen type='none'/>
  <gl enable='yes'/>
</graphics>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='4dd50f26-ec08-11e8-b838-4bc3356865b6'/>
  </source>
</hostdev>
The pass-through worked and the guest seemed happy for a while.
But later on I noticed that the guest was stuck, and on the host I found this in
dmesg:
[ 230.274856] DMAR: DRHD: handling fault status reg 3
[ 230.274923] DMAR: [DMA Write] Request device [00:02.0] fault addr fff94000 [fault reason 23] Unknown
[ 230.274985] DMAR: DRHD: handling fault status reg 2
[ 230.275021] DMAR: [DMA Write] Request device [00:02.0] fault addr 30000 [fault reason 23] Unknown
[ 230.275080] DMAR: DRHD: handling fault status reg 2
[ 230.275117] DMAR: [DMA Write] Request device [00:02.0] fault addr 55000 [fault reason 23] Unknown
[ 230.275179] DMAR: DRHD: handling fault status reg 3
[ 235.276444] dmar_fault: 5440889 callbacks suppressed
[ 235.276445] DMAR: DRHD: handling fault status reg 3
[ 235.276484] DMAR: [DMA Write] Request device [00:02.0] fault addr 2fe93c000 [fault reason 23] Unknown
[ 235.276518] DMAR: DRHD: handling fault status reg 2
[ 235.276539] DMAR: [DMA Write] Request device [00:02.0] fault addr 2fe96e000 [fault reason 23] Unknown
[ 235.276571] DMAR: DRHD: handling fault status reg 2
[ 235.276592] DMAR: [DMA Write] Request device [00:02.0] fault addr 2fe994000 [fault reason 23] Unknown
[ 235.276625] DMAR: DRHD: handling fault status reg 2
[ 240.280429] dmar_fault: 6145791 callbacks suppressed
[ 240.280431] DMAR: DRHD: handling fault status reg 3
[ 240.280463] DMAR: [DMA Write] Request device [00:02.0] fault addr 5e5db8000 [fault reason 23] Unknown
[ 240.280511] DMAR: DRHD: handling fault status reg 3
[ 240.280554] DMAR: [DMA Write] Request device [00:02.0] fault addr 5e5dec000 [fault reason 23] Unknown
[ 240.280623] DMAR: DRHD: handling fault status reg 3
[ 240.280662] DMAR: [DMA Write] Request device [00:02.0] fault addr 5e5e34000 [fault reason 23] Unknown
[ 240.280733] DMAR: DRHD: handling fault status reg 3
[ 245.284441] dmar_fault: 5699149 callbacks suppressed
[ 245.284442] DMAR: DRHD: handling fault status reg 2
[ 245.284480] DMAR: [DMA Write] Request device [00:02.0] fault addr 8c90fb000 [fault reason 23] Unknown
[ 245.284511] DMAR: DRHD: handling fault status reg 2
[ 245.284530] DMAR: [DMA Write] Request device [00:02.0] fault addr 8c9128000 [fault reason 23] Unknown
[ 245.284560] DMAR: DRHD: handling fault status reg 2
[ 245.284579] DMAR: [DMA Write] Request device [00:02.0] fault addr 8c914a000 [fault reason 23] Unknown
[ 245.284610] DMAR: DRHD: handling fault status reg 2
[ 250.106273] [drm] GPU HANG: ecode 8:0:0xe757fefe, reason: no progress on rcs0, action: reset
[ 250.106274] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 250.106275] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 250.106276] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 250.106276] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 250.106277] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 250.106299] i915 0000:00:02.0: Resetting rcs0 for no progress on rcs0
[ 251.900704] i915 0000:00:02.0: Resetting chip for no progress on rcs0
[ 251.900718] i915 0000:00:02.0: GPU recovery failed
Unfortunately /sys/class/drm/card0/error is empty, so not a lot to report.
But OTOH it seems reproducible rather easily.
I have beignet installed on the guest and the following sequence seems to
trigger the issues:
1. starting the guest
2. run clinfo in the guest (to see whether the i915 device is available)
3. wait ~60 seconds
At some point in these 60 seconds it will crash.
I don't know yet whether running "clinfo" is required or just a red herring;
beyond that I don't know much else.
But since it seems reproducible please just ask what you'd need in addition and
I'll try to create the data needed.
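To compare reproduction runs, the host log could be condensed into a few numbers. A minimal sketch, assuming the message formats seen in the dmesg output of this report; summarize_dmar is a hypothetical helper name, and it expects each kernel message on a single line, as dmesg prints it.

```shell
#!/bin/sh
# Sketch: condense kernel log text read from stdin into a short summary.
# summarize_dmar is a hypothetical helper name.

summarize_dmar() {
    awk '
        /DMAR: \[DMA (Read|Write)\]/ {
            faults++
            # the faulting address is the field right after "addr"
            for (i = 1; i < NF; i++)
                if ($i == "addr") {
                    if (first == "") first = $(i + 1)
                    last = $(i + 1)
                }
        }
        /callbacks suppressed/ {
            # e.g. "dmar_fault: 5440889 callbacks suppressed"
            for (i = 1; i < NF; i++)
                if ($i == "dmar_fault:") suppressed += $(i + 1)
        }
        /GPU HANG/ { hangs++ }
        END {
            printf "logged faults: %d\n", faults
            printf "suppressed callbacks: %d\n", suppressed
            printf "first fault addr: %s\n", first
            printf "last fault addr: %s\n", last
            printf "GPU hangs: %d\n", hangs
        }
    '
}

# Typical use on the host:
# dmesg | summarize_dmar
```

Only the faults that actually made it into the log are counted individually; the rate-limited ones show up in the "suppressed callbacks" total.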
HW Info:
CPU: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
$ lspci -v -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 6000 (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation HD Graphics 6000
        Flags: bus master, fast devsel, latency 0, IRQ 48
        Memory at f6000000 (64-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at f000 [size=64]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915
SW Info:
Ubuntu (latest release) running kernel 5.0.0-8-generic.
For the MDEV passthrough: libvirt 5.0 and QEMU 3.1.
The host was initially still running an Ubuntu Desktop on the very same graphics
card, so some arbitration might as well have been the issue. But I then had it
boot into text mode only (no UI stack initialized) and it triggered the same
bug.
Let me know what you'd need to get this debugged further (e.g. a pointer on how
to better enable GPU crash dumps?).