<html> <head> <base href="https://bugs.freedesktop.org/"> </head> <body><table border="1" cellspacing="0" cellpadding="8"> <tr> <th>Bug ID</th> <td><a class="bz_bug_link bz_status_NEW " title="NEW - Crashes when using MDEV passthrough on i915" href="https://bugs.freedesktop.org/show_bug.cgi?id=110238">110238</a> </td> </tr> <tr> <th>Summary</th> <td>Crashes when using MDEV passthrough on i915 </td> </tr> <tr> <th>Product</th> <td>DRI </td> </tr> <tr> <th>Version</th> <td>XOrg git </td> </tr> <tr> <th>Hardware</th> <td>Other </td> </tr> <tr> <th>OS</th> <td>All </td> </tr> <tr> <th>Status</th> <td>NEW </td> </tr> <tr> <th>Severity</th> <td>normal </td> </tr> <tr> <th>Priority</th> <td>medium </td> </tr> <tr> <th>Component</th> <td>DRM/Intel </td> </tr> <tr> <th>Assignee</th> <td>intel-gfx-bugs@lists.freedesktop.org </td> </tr> <tr> <th>Reporter</th> <td>christian.ehrhardt@canonical.com </td> </tr> <tr> <th>QA Contact</th> <td>intel-gfx-bugs@lists.freedesktop.org </td> </tr> <tr> <th>CC</th> <td>intel-gfx-bugs@lists.freedesktop.org </td> </tr></table> <p> <div> <pre>Hi,
I was using MDEV passthrough with KVMGT.

Enabled on the kernel command line via /etc/default/grub:
  i915.enable_gvt=1 intel_iommu=on drm.debug=0

And loading the modules:
  $ printf "kvmgt\nvfio-iommu-type1\nvfio-mdev" | sudo tee /etc/initramfs-tools/modules

Update and reboot:
  $ sudo update-initramfs -u
  $ sudo update-grub

Then I was creating a UUID for the MDEV:
  $ cd /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_4
  $ echo 4dd50f26-ec08-11e8-b838-4bc3356865b6 | sudo tee create

Finally I was telling libvirt to use that by modifying my guest XML like:
  &lt;graphics type='spice'&gt;
    &lt;listen type='none'/&gt;
    &lt;gl enable='yes'/&gt;
  &lt;/graphics&gt;
  &lt;hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'&gt;
    &lt;source&gt;
      &lt;address uuid='4dd50f26-ec08-11e8-b838-4bc3356865b6'/&gt;
    &lt;/source&gt;
  &lt;/hostdev&gt;

The pass-through worked and the guest seemed happy for a while.
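
A small guard around the manual creation step avoids creating the same mdev twice. This is only a sketch: the function name ensure_mdev and its optional third argument are made up for illustration, and it assumes the kernel lists existing mdev devices under /sys/bus/mdev/devices and reports the free-instance count in the type's available_instances file. On the real sysfs it has to run as root.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): create an mdev only if needed.
#   $1 = UUID of the mdev
#   $2 = mdev type directory (the one containing "create")
#   $3 = sysfs root, defaults to /sys (a parameter only so the logic can be
#        tried on a scratch directory tree)
ensure_mdev() {
    uuid=$1
    type_dir=$2
    sysfs=${3:-/sys}

    # Assumption: existing mdev devices show up here by UUID.
    if [ -e "$sysfs/bus/mdev/devices/$uuid" ]; then
        echo "mdev $uuid already exists"
        return 0
    fi

    # Assumption: available_instances holds how many more can be created.
    avail=$(cat "$type_dir/available_instances")
    if [ "$avail" = "0" ]; then
        echo "no free instances left for this mdev type"
        return 1
    fi

    # Same write as the manual step above; tee also echoes the UUID.
    echo "$uuid" | tee "$type_dir/create"
}
```

Usage would then be something like:
  ensure_mdev 4dd50f26-ec08-11e8-b838-4bc3356865b6 /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_4
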
But later on I realized my guest got stuck, and on the Host I found this in dmesg:
[ 230.274856] DMAR: DRHD: handling fault status reg 3
[ 230.274923] DMAR: [DMA Write] Request device [00:02.0] fault addr fff94000 [fault reason 23] Unknown
[ 230.274985] DMAR: DRHD: handling fault status reg 2
[ 230.275021] DMAR: [DMA Write] Request device [00:02.0] fault addr 30000 [fault reason 23] Unknown
[ 230.275080] DMAR: DRHD: handling fault status reg 2
[ 230.275117] DMAR: [DMA Write] Request device [00:02.0] fault addr 55000 [fault reason 23] Unknown
[ 230.275179] DMAR: DRHD: handling fault status reg 3
[ 235.276444] dmar_fault: 5440889 callbacks suppressed
[ 235.276445] DMAR: DRHD: handling fault status reg 3
[ 235.276484] DMAR: [DMA Write] Request device [00:02.0] fault addr 2fe93c000 [fault reason 23] Unknown
[ 235.276518] DMAR: DRHD: handling fault status reg 2
[ 235.276539] DMAR: [DMA Write] Request device [00:02.0] fault addr 2fe96e000 [fault reason 23] Unknown
[ 235.276571] DMAR: DRHD: handling fault status reg 2
[ 235.276592] DMAR: [DMA Write] Request device [00:02.0] fault addr 2fe994000 [fault reason 23] Unknown
[ 235.276625] DMAR: DRHD: handling fault status reg 2
[ 240.280429] dmar_fault: 6145791 callbacks suppressed
[ 240.280431] DMAR: DRHD: handling fault status reg 3
[ 240.280463] DMAR: [DMA Write] Request device [00:02.0] fault addr 5e5db8000 [fault reason 23] Unknown
[ 240.280511] DMAR: DRHD: handling fault status reg 3
[ 240.280554] DMAR: [DMA Write] Request device [00:02.0] fault addr 5e5dec000 [fault reason 23] Unknown
[ 240.280623] DMAR: DRHD: handling fault status reg 3
[ 240.280662] DMAR: [DMA Write] Request device [00:02.0] fault addr 5e5e34000 [fault reason 23] Unknown
[ 240.280733] DMAR: DRHD: handling fault status reg 3
[ 245.284441] dmar_fault: 5699149 callbacks suppressed
[ 245.284442] DMAR: DRHD: handling fault status reg 2
[ 245.284480] DMAR: [DMA Write] Request device [00:02.0] fault addr 8c90fb000 [fault reason 23] Unknown
[ 245.284511] DMAR: DRHD: handling fault status reg 2
[ 245.284530] DMAR: [DMA Write] Request device [00:02.0] fault addr 8c9128000 [fault reason 23] Unknown
[ 245.284560] DMAR: DRHD: handling fault status reg 2
[ 245.284579] DMAR: [DMA Write] Request device [00:02.0] fault addr 8c914a000 [fault reason 23] Unknown
[ 245.284610] DMAR: DRHD: handling fault status reg 2
[ 250.106273] [drm] GPU HANG: ecode 8:0:0xe757fefe, reason: no progress on rcs0, action: reset
[ 250.106274] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 250.106275] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -&gt; DRM/Intel
[ 250.106276] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 250.106276] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 250.106277] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 250.106299] i915 0000:00:02.0: Resetting rcs0 for no progress on rcs0
[ 251.900704] i915 0000:00:02.0: Resetting chip for no progress on rcs0
[ 251.900718] i915 0000:00:02.0: GPU recovery failed

Unfortunately /sys/class/drm/card0/error is empty, so not a lot to report.
But OTOH it seems reproducible rather easily.

I have beignet installed on the guest and the following sequence seems to trigger the issue:
1. start the guest
2. run clinfo in the guest (see i915 would be available)
3. wait ~60 seconds

At some point in these 60 seconds it will crash.
I don't know yet if the "clinfo" is required or just a red herring, not much else.
But since it seems reproducible, please just ask what you'd need in addition and I'll try to create the data needed.

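To compare runs, the DMAR faults in a kernel log can be summarized with a few lines of awk. This is a minimal sketch, not an existing tool: the function name summarize_dmar_faults is made up, and it only matches the two line formats that appear in the dmesg excerpt of this report (the "[DMA Write] Request device [00:02.0]" lines and the "dmar_fault: N callbacks suppressed" lines).

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints one line
#   faults=N suppressed=M
# N = number of logged DMA write fault lines for device 00:02.0
# M = sum of the "callbacks suppressed" counters (faults that were not logged)
summarize_dmar_faults() {
    awk '
        /DMAR: \[DMA Write\] Request device \[00:02\.0\]/ { faults++ }
        match($0, /dmar_fault: [0-9]+ callbacks suppressed/) {
            # Skip the 12-character "dmar_fault: " prefix; adding 0 makes
            # awk keep only the leading number of the remaining text.
            suppressed += substr($0, RSTART + 12) + 0
        }
        END { printf "faults=%d suppressed=%d\n", faults, suppressed }
    '
}
```

On the host this could be fed directly from the kernel log:
  dmesg | summarize_dmar_faults
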
HW Info:
CPU: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
$ lspci -v -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 6000 (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation HD Graphics 6000
        Flags: bus master, fast devsel, latency 0, IRQ 48
        Memory at f6000000 (64-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at f000 [size=64]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: &lt;access denied&gt;
        Kernel driver in use: i915
        Kernel modules: i915

SW Info:
Ubuntu running latest release with kernel 5.0.0-8-generic.
For the MDEV passthrough I use Libvirt 5.0 and Qemu 3.1.

The Host was initially still running an Ubuntu Desktop on the very same graphics card - so some arbitration might as well have been the issue.
But I had it boot into text mode only (no UI stack initialized) and it was triggering the same bug.

Let me know what you'd need to get this debugged further (e.g. a pointer on how to better enable gpu crash dumps?).</pre> </div> </p> </body> </html>