SRIOV RPM resume lockdep issues

Hellstrom, Thomas thomas.hellstrom at intel.com
Mon Aug 26 12:39:57 UTC 2024


Hi, Michal,

Since we want to be able to wake LNL from the shrinker, I'm trying to
add a lockdep annotation to verify that we don't do anything in the rpm
resume and suspend callbacks that would block this.

However, it seems SRIOV is doing just that.

https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-137730v3/shard-adlp-6/igt@sriov_basic@enable-vfs-bind-unbind-each-numvfs-all.html#dmesg-warnings39

More specifically, SRIOV grabs xe->sriov.pf.master_lock during rpm
resume, and memory allocations are also done under that lock, so a
theoretical deadlock sequence could look like:

1) lock xe->sriov.pf.master_lock
2) Allocate memory
3) Enter shrinker
4) RPM resume
5) lock xe->sriov.pf.master_lock
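
As a rough user-space illustration of the sequence above (not driver
code), the sketch below models the five steps with a plain
non-recursive Python lock. All function names here are hypothetical
stand-ins, and the non-blocking acquire in step 5 only exists so that
the model reports the problem instead of hanging.

```python
import threading

# Stand-in for xe->sriov.pf.master_lock (non-recursive, like a kernel mutex).
master_lock = threading.Lock()
trace = []

def rpm_resume():
    """Hypothetical model of the rpm resume callback taking master_lock."""
    trace.append("4) rpm resume")
    # A blocking acquire would hang forever if the caller already holds
    # the lock, so try without blocking and report the outcome instead.
    got_lock = master_lock.acquire(blocking=False)
    if got_lock:
        master_lock.release()
        trace.append("5) master_lock taken and released")
    else:
        trace.append("5) master_lock already held: would deadlock")
    return got_lock

def shrinker():
    """Hypothetical model of the shrinker needing the device awake."""
    trace.append("3) enter shrinker")
    return rpm_resume()

def allocate(memory_pressure):
    """Hypothetical allocation; enters reclaim only under memory pressure."""
    trace.append("2) allocate memory")
    return shrinker() if memory_pressure else True

def pf_path(memory_pressure):
    """Hypothetical PF code path that allocates while holding master_lock."""
    with master_lock:
        trace.append("1) lock master_lock")
        return allocate(memory_pressure)

if __name__ == "__main__":
    ok = pf_path(memory_pressure=True)
    print("\n".join(trace))
    print("completed without deadlock:", ok)
```

Without memory pressure the same path completes normally, which is
why a problem like this tends to show up only through lockdep and not
in ordinary testing.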

Now, LNL has not enabled SRIOV (yet) so I can work around this for now,
but moving forward this can cause problems.

Is there a way to avoid grabbing that lock during rpm suspend / resume,
or perhaps avoid doing memory allocation under it? In this case it
appears from the trace that a bo is allocated under the lock.
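
To make the second option concrete, here is a toy user-space model
(again not driver code, with hypothetical names throughout) that
compares allocating the buffer under the lock with allocating it
before the lock is taken. It assumes the allocation can be hoisted
out of the locked section, which may or may not hold for the real
SRIOV code.

```python
import threading

# Stand-in for xe->sriov.pf.master_lock (non-recursive).
master_lock = threading.Lock()

def rpm_resume():
    """Hypothetical rpm resume: needs master_lock, reports if it is held."""
    if not master_lock.acquire(blocking=False):
        return False  # a blocking acquire would deadlock here
    master_lock.release()
    return True

def allocate_bo(memory_pressure):
    """Hypothetical bo allocation; reclaim may trigger rpm resume.

    Returns (buffer, resume_ok).
    """
    resume_ok = rpm_resume() if memory_pressure else True
    return bytearray(16), resume_ok

def alloc_under_lock(memory_pressure):
    """Current pattern: the bo is allocated while master_lock is held."""
    with master_lock:
        bo, resume_ok = allocate_bo(memory_pressure)
        bo[0] = 1  # use the buffer under the lock
    return resume_ok

def alloc_before_lock(memory_pressure):
    """Alternative: allocate first, then take master_lock to use the bo."""
    bo, resume_ok = allocate_bo(memory_pressure)
    with master_lock:
        bo[0] = 1  # only non-allocating work happens under the lock
    return resume_ok

if __name__ == "__main__":
    print("under lock, resume ok: ", alloc_under_lock(True))
    print("before lock, resume ok:", alloc_before_lock(True))
```

In this model the reordered variant never enters reclaim with the
lock held, so the resume path can always take it.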

Thanks,
Thomas


