SRIOV RPM resume lockdep issues
Hellstrom, Thomas
thomas.hellstrom at intel.com
Mon Aug 26 12:39:57 UTC 2024
Hi, Michal,
Since we want to be able to wake LNL from the shrinker, I'm trying to add
lockdep annotations to verify that we don't do anything in the rpm
resume and suspend callbacks that would block this.
However, it seems SRIOV is doing just that.
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-137730v3/shard-adlp-6/igt@sriov_basic@enable-vfs-bind-unbind-each-numvfs-all.html#dmesg-warnings39
More specifically, SRIOV grabs xe->sriov.pf.master_lock during rpm
resume, and also does memory allocations under that lock, so a
theoretical deadlock sequence could look like:
1) lock xe->sriov.pf.master_lock
2) Allocate memory
3) Enter shrinker
4) RPM resume
5) lock xe->sriov.pf.master_lock again (deadlock, it is still held from step 1)
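To make the sequence above concrete, here is a minimal userspace sketch in C. It is only a model, not xe driver code: the mutex stands in for xe->sriov.pf.master_lock, and every function name (model_init, model_rpm_resume, model_shrinker, model_alloc, model_pf_alloc_under_lock) is hypothetical. It assumes a POSIX error-checking mutex, which reports EDEADLK on a recursive lock attempt instead of hanging, so the problematic ordering can be observed safely.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for xe->sriov.pf.master_lock. */
static pthread_mutex_t master_lock;

/* Set up an error-checking mutex so that relocking returns EDEADLK. */
static void model_init(void)
{
    pthread_mutexattr_t attr;
    int err;

    err = pthread_mutexattr_init(&attr);
    assert(err == 0);
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(err == 0);
    err = pthread_mutex_init(&master_lock, &attr);
    assert(err == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Steps 4 and 5: rpm resume takes the master lock. */
static int model_rpm_resume(void)
{
    int err = pthread_mutex_lock(&master_lock);

    if (err)
        return err; /* EDEADLK if this thread already holds the lock */
    pthread_mutex_unlock(&master_lock);
    return 0;
}

/* Step 3: the shrinker needs the device awake, so it resumes it. */
static int model_shrinker(void)
{
    return model_rpm_resume();
}

/* Step 2: an allocation that may or may not enter reclaim. */
static int model_alloc(bool enters_reclaim)
{
    return enters_reclaim ? model_shrinker() : 0;
}

/* Step 1: a PF path that allocates while holding the master lock. */
static int model_pf_alloc_under_lock(bool enters_reclaim)
{
    int err;

    pthread_mutex_lock(&master_lock);
    err = model_alloc(enters_reclaim);
    pthread_mutex_unlock(&master_lock);
    return err;
}
```

After model_init(), calling model_pf_alloc_under_lock(true) returns EDEADLK in this model, while model_pf_alloc_under_lock(false) and a standalone model_rpm_resume() both return 0. With a real kernel mutex there is no such error return; the second lock attempt would simply never complete.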
Now, LNL has not enabled SRIOV (yet) so I can work around this for now,
but moving forward this can cause problems.
Is there a way to avoid grabbing that lock during rpm suspend / resume,
or perhaps avoid doing memory allocation under it? In this case it
appears from the trace that a bo is allocated under the lock.
Thanks,
Thomas