amd gpu in Xen domu -- NULL dereference in drivers/gpu/drm/drm_pci.c

Håkon Alstadheim hakon at alstadheim.priv.no
Fri May 18 11:18:03 UTC 2018


TL;DR: How can I get a canonical work-around for these quirks into the
official module and/or kernel repo? I am running with the GPU passed
through to a virtual machine.

-----

I'm running this card:

00:06.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Curacao PRO [Radeon R7 370 / R9 270/370 OEM] (prog-if 00 [VGA
controller])

passed through to a user domain under Xen (xen-4.11 release candidate).

I have been running with this patch for a long while now:

-----------------------

--- a/drivers/gpu/drm/drm_pci.c    2017-07-30 23:29:43.550000000 +0200
+++ b/drivers/gpu/drm/drm_pci.c    2017-07-30 23:28:55.580000000 +0200
@@ -337,11 +337,28 @@
     u32 lnkcap, lnkcap2;
 
     *mask = 0;
-    if (!dev->pdev)
-        return -EINVAL;
-
+    if (!dev->pdev) {
+      DRM_INFO("invalid dev->pdev\n");
+      return -EINVAL;
+    }
+   
+    if (!dev->pdev->bus) {
+      DRM_INFO("invalid dev->pdev->bus\n");
+      return -EINVAL;
+    }
+   
+    if (!dev->pdev->bus->self) {
+      DRM_INFO("invalid dev->pdev->bus->self\n");
+      return -EINVAL;
+    }
+   
     root = dev->pdev->bus->self;
 
+    if (!root->vendor) {
+      DRM_INFO("invalid root->vendor\n");
+      return -EINVAL;
+    }
+
     /* we've been informed via and serverworks don't make the cut */
     if (root->vendor == PCI_VENDOR_ID_VIA ||
         root->vendor == PCI_VENDOR_ID_SERVERWORKS)
------------------------------------------------------

This gets me the following logged on boot:

mai 17 23:27:35 gt kernel: [drm] invalid dev->pdev->bus->self
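
A rough user-space sketch of why that particular line is logged, assuming
the passed-through GPU sits directly on a root bus inside the guest (a
root bus has no upstream bridge, so bus->self is NULL). The mock_* types
and the function name are hypothetical stand-ins, not the kernel's own
structures or API; only the order of the pointer checks follows the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * touched by the guards in the patch are modelled. */
struct mock_pci_dev;

struct mock_pci_bus {
    struct mock_pci_dev *self;  /* upstream bridge; NULL on a root bus */
};

struct mock_pci_dev {
    unsigned short vendor;
    struct mock_pci_bus *bus;
};

struct mock_drm_device {
    struct mock_pci_dev *pdev;
};

/* Tests each pointer in the chain before it is dereferenced, in the
 * same order as the patch, and bails out with -EINVAL on the first
 * missing link. */
int mock_get_speed_cap_mask(const struct mock_drm_device *dev,
                            unsigned int *mask)
{
    assert(dev != NULL && mask != NULL);

    *mask = 0;
    if (!dev->pdev)
        return -EINVAL;
    if (!dev->pdev->bus)
        return -EINVAL;
    if (!dev->pdev->bus->self)
        return -EINVAL;         /* the case seen in the log above */

    /* Placeholder only: the real function reads the bridge's link
     * capabilities here. */
    *mask = 1;
    return 0;
}
```

With a device whose bus has self == NULL, the sketch returns -EINVAL and
leaves the mask at 0 instead of dereferencing a NULL pointer.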

I created the patch on a hunch, without any knowledge of how that
return value gets interpreted. At the moment I'm using the radeon driver
in kernel 4.16.9-gentoo, but I'm thinking about switching to amdgpu.
Before I do that, I'd hope the deficiencies when running under Xen
could get some kind of "official" work-around.

The setup works OK, but I get some random stalls in the window manager
and browsers, without anything showing up in top or atop. One of my
suspected culprits is some kind of error time-out in graphics rendering.
I'm not a programmer, so wading through threads in gdb is not my forte.



