Lockdep annotation introduced warn in VMD driver

Dan Williams dan.j.williams at intel.com
Wed May 29 19:34:19 UTC 2024


Imre Deak wrote:
[..]
> > > Also Imre tried with 2 PCI patches together https://patchwork.freedesktop.org/series/134193/ 
> > > And still not good for those 4 systems (mtlp-9, bat-dg2-13/14 and bat-adlp-11) :
> > > https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_134193v1/index.html? 
> > > Dave, Dan, thoughts? 
> > 
> > Can you provide the dmesg from the failure system with the 2 patches applied please?
> 
> Of the above 4 machines, mtlp-9 does not have the originally reported
> WARN (at pci.c:4886), only some other lockdep issue, while the other 3
> machines have both the originally reported one and the other lockdep
> issue:

> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_134193v1/bat-mtlp-9/boot0.txt

This one does not seem to implicate cfg_access_lock at all. I wonder
whether it still fails if you revert the lockdep annotation completely,
i.e. whether this is a new lockdep report for v6.10-rc, independent of
the new cfg_access_lock annotation.

> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_134193v1/bat-dg2-13/boot0.txt
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_134193v1/bat-dg2-14/boot0.txt
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_134193v1/bat-adlp-11/boot0.txt

These are all identical and point out that vmd, via pci_reset_bus(),
has long been performing an unlocked secondary bus reset that userspace
could race with and confuse the kernel.

I think the fix for that is below, i.e. teach pci_bus_lock() to lock the
bridge itself, but this is an increasingly spicy level of change that
gives me some pause:

-- 8< --
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 59e0949fb079..ac3999bc59e8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5442,6 +5442,7 @@ static void pci_bus_lock(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
 
+	pci_dev_lock(bus->self);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		pci_dev_lock(dev);
 		if (dev->subordinate)
@@ -5459,6 +5460,7 @@ static void pci_bus_unlock(struct pci_bus *bus)
 			pci_bus_unlock(dev->subordinate);
 		pci_dev_unlock(dev);
 	}
+	pci_dev_unlock(bus->self);
 }
 
 /* Return 1 on successful lock, 0 on contention */
@@ -5466,6 +5468,7 @@ static int pci_bus_trylock(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
 
+	pci_dev_lock(bus->self);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		if (!pci_dev_trylock(dev))
 			goto unlock;
@@ -5484,6 +5487,7 @@ static int pci_bus_trylock(struct pci_bus *bus)
 			pci_bus_unlock(dev->subordinate);
 		pci_dev_unlock(dev);
 	}
+	pci_dev_unlock(bus->self);
 	return 0;
 }
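
For anyone who wants to poke at the lock ordering outside the kernel,
here is a minimal userspace sketch of "lock the bridge, then everything
behind it, unlock in reverse". All toy_* names and types are hypothetical
stand-ins, not kernel APIs. Three things differ from the diff above and
are assumptions of this sketch only: the trylock path uses a trylock on
the bridge and backs out on contention, a device that has a subordinate
bus is locked through the recursion instead of separately (so a bridge is
never locked twice), and a NULL bus->self is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct pci_dev / struct pci_bus; not the kernel types. */
struct toy_bus;

struct toy_dev {
	bool locked;                 /* models "device lock held" */
	struct toy_bus *subordinate; /* non-NULL if this device is a bridge */
};

struct toy_bus {
	struct toy_dev *self;        /* bridge leading to this bus, may be NULL */
	struct toy_dev **devices;    /* devices sitting on this bus */
	size_t ndev;
};

static bool toy_dev_trylock(struct toy_dev *dev)
{
	if (dev->locked)
		return false;
	dev->locked = true;
	return true;
}

static void toy_dev_unlock(struct toy_dev *dev)
{
	assert(dev->locked);
	dev->locked = false;
}

static void toy_bus_unlock(struct toy_bus *bus);

/* Undo whatever toy_bus_trylock() took for one device on a bus. */
static void toy_unlock_one(struct toy_dev *dev)
{
	if (dev->subordinate)
		toy_bus_unlock(dev->subordinate); /* also drops dev itself */
	else
		toy_dev_unlock(dev);
}

/* Return true on success; on contention, release everything and return false. */
static bool toy_bus_trylock(struct toy_bus *bus)
{
	size_t i;

	/* Bridge first, so nothing can reset the bus underneath us. */
	if (bus->self && !toy_dev_trylock(bus->self))
		return false;

	for (i = 0; i < bus->ndev; i++) {
		struct toy_dev *dev = bus->devices[i];
		/* A bridge is locked by the recursion (as subordinate->self). */
		bool ok = dev->subordinate ? toy_bus_trylock(dev->subordinate)
					   : toy_dev_trylock(dev);
		if (!ok)
			goto unwind;
	}
	return true;

unwind:
	/* Devices [0, i) are held: release them in reverse, then the bridge. */
	while (i-- > 0)
		toy_unlock_one(bus->devices[i]);
	if (bus->self)
		toy_dev_unlock(bus->self);
	return false;
}

static void toy_bus_unlock(struct toy_bus *bus)
{
	size_t i = bus->ndev;

	while (i-- > 0)
		toy_unlock_one(bus->devices[i]);
	if (bus->self)
		toy_dev_unlock(bus->self);
}
```

The model relies on dev->subordinate->self pointing back at dev for
every bridge. With one endpoint already held elsewhere, a
toy_bus_trylock() on the parent bus should return false and leave no
other lock taken.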
 

