[PATCH] drm/xe: Kill missing outer runtime PM protection warning

Rodrigo Vivi rodrigo.vivi at intel.com
Tue Sep 3 22:38:22 UTC 2024


This message was very useful to ensure that Xe was taking all
the needed outer runtime pm references. However, at this point
it is only a false positive. So, remove it.

False positive cases:

1:
[184.983389] xe ...: [drm] Missing outer runtime PM protection
[snip]
[184.984096]  drm_ioctl+0x2cf/0x580 [drm]
[snip]
[184.984710] xe 0000:00:02.0: Runtime PM usage count underflow!

In this case the underflow is the problem since we are sure that
the ioctl is protected. But something else is abusing the 'put'
calls.

2:
rpm_status:           0000:03:00.0 status=RPM_SUSPENDING
console:              xe_bo_evict_all (called from suspend)
xe_sched_job_create:  dev=0000:03:00.0, ...
xe_sched_job_exec:    dev=0000:03:00.0, ...
xe_pm_runtime_put:    dev=0000:03:00.0, ...
xe_sched_job_run:     dev=0000:03:00.0, ...
rpm_usage:            0000:03:00.0 flags-0 cnt-2  ...
rpm_usage:            0000:03:00.0 flags-0 cnt-2  ...
rpm_usage:            0000:03:00.0 flags-0 cnt-2  ...
console:              xe 0000:03:00.0: [drm] Missing outer runtime
		      	 	       	     	     PM protection
console:               xe_guc_ct_send+0x15/0x50 [xe]
console:               guc_exec_queue_run_job+0x1509/0x3950 [xe]
[snip]
console:               drm_sched_run_job_work+0x649/0xc20

At this point, BOs are getting evicted from VRAM with rpm
usage-counter = 2, but rpm status = SUSPENDING.
The xe->pm_callback_task won't be equal 'current' because this call is
coming from a work queue.

So, pm_runtime_get_if_active() will be called and return 0 because rpm
status != ACTIVE (but equal SUSPENDING).

The only way out is to just grab the reference and move on.

Cc: Matthew Brost <matthew.brost at intel.com>
Cc: Matthew Auld <matthew.auld at intel.com>
Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index da68cd689a96..e1a5e43b0f34 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -592,20 +592,14 @@ bool xe_pm_runtime_get_if_in_use(struct xe_device *xe)
  * xe_pm_runtime_get_noresume - Bump runtime PM usage counter without resuming
  * @xe: xe device instance
  *
- * This function should be used in inner places where it is surely already
+ * This function should *only* be used in inner places where it is surely already
  * protected by outer-bound callers of `xe_pm_runtime_get`.
- * It will warn if not protected.
  * The reference should be put back after this function regardless, since it
  * will always bump the usage counter, regardless.
  */
 void xe_pm_runtime_get_noresume(struct xe_device *xe)
 {
-	bool ref;
-
-	ref = xe_pm_runtime_get_if_in_use(xe);
-
-	if (drm_WARN(&xe->drm, !ref, "Missing outer runtime PM protection\n"))
-		pm_runtime_get_noresume(xe->drm.dev);
+	pm_runtime_get_noresume(xe->drm.dev);
 }
 
 /**
-- 
2.46.0



More information about the Intel-xe mailing list