[PATCH] drm/xe: Suppress missing outer rpm protection warning
Cavitt, Jonathan
jonathan.cavitt at intel.com
Wed Sep 4 17:26:47 UTC 2024
-----Original Message-----
From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Rodrigo Vivi
Sent: Wednesday, September 4, 2024 9:51 AM
To: intel-xe at lists.freedesktop.org
Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>; Brost, Matthew <matthew.brost at intel.com>; Auld, Matthew <matthew.auld at intel.com>
Subject: [PATCH] drm/xe: Suppress missing outer rpm protection warning
>
> Do not raise a WARN if we are likely within suspending or resuming
> path. This is likely this false positive:
>
> rpm_status: 0000:03:00.0 status=RPM_SUSPENDING
> console: xe_bo_evict_all (called from suspend)
> xe_sched_job_create: dev=0000:03:00.0, ...
> xe_sched_job_exec: dev=0000:03:00.0, ...
> xe_pm_runtime_put: dev=0000:03:00.0, ...
> xe_sched_job_run: dev=0000:03:00.0, ...
> rpm_usage: 0000:03:00.0 flags-0 cnt-2 ...
> rpm_usage: 0000:03:00.0 flags-0 cnt-2 ...
> rpm_usage: 0000:03:00.0 flags-0 cnt-2 ...
> console: xe 0000:03:00.0: [drm] Missing outer runtime
> PM protection
> console: xe_guc_ct_send+0x15/0x50 [xe]
> console: guc_exec_queue_run_job+0x1509/0x3950 [xe]
> [snip]
> console: drm_sched_run_job_work+0x649/0xc20
>
> At this point, BOs are getting evicted from VRAM with rpm
> usage-counter = 2, but rpm status = SUSPENDING.
>
> The xe->pm_callback_task won't be equal 'current' because this call is
> coming from a work queue.
>
> So, pm_runtime_get_if_active() will be called and return 0 because rpm
> status != ACTIVE (but equal SUSPENDING or RESUMING).
>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Matthew Auld <matthew.auld at intel.com>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
> drivers/gpu/drm/xe/xe_pm.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index da68cd689a96..352f9d593496 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -588,6 +588,18 @@ bool xe_pm_runtime_get_if_in_use(struct xe_device *xe)
> return pm_runtime_get_if_in_use(xe->drm.dev) > 0;
> }
>
> +/*
> + * Very unreliable! Should only be used to suppress the false positive case
> + * in the missing outer rpm protection warning.
> + */
> +static bool xe_pm_suspending_or_resuming(struct xe_device *xe)
> +{
> + struct device *dev = xe->drm.dev;
> +
> + return dev->power.runtime_status == RPM_SUSPENDING ||
> + dev->power.runtime_status == RPM_RESUMING;
> +}
> +
> /**
> * xe_pm_runtime_get_noresume - Bump runtime PM usage counter without resuming
> * @xe: xe device instance
> @@ -604,7 +616,8 @@ void xe_pm_runtime_get_noresume(struct xe_device *xe)
>
> ref = xe_pm_runtime_get_if_in_use(xe);
>
> - if (drm_WARN(&xe->drm, !ref, "Missing outer runtime PM protection\n"))
> + if (drm_WARN(&xe->drm, !ref && !xe_pm_suspending_or_resuming(xe),
> + "Missing outer runtime PM protection\n"))
I don't know for certain, but I think we want to run pm_runtime_get_noresume if
xe_pm_runtime_get_if_in_use fails, regardless of the pm suspend/resume status.
Something like:
"""
if (!ret) {
drm_WARN(&xe->drm, !xe_pm_suspending_or_resuming(xe),
"Missing outer runtime PM protection\n");
pm_runtime_get_noresume(xe->drm.dev);
}
"""
I at least think it's worth double-checking.
Otherwise:
Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
-Jonathan Cavitt
> pm_runtime_get_noresume(xe->drm.dev);
> }
>
> --
> 2.46.0
>
>
More information about the Intel-xe
mailing list