[PATCH 4/9] drm/xe: Move xe_irq runtime suspend and resume out of lockdep

Matthew Auld matthew.auld at intel.com
Tue Mar 5 11:07:37 UTC 2024


On 04/03/2024 18:21, Rodrigo Vivi wrote:
> Now that mem_access xe_pm_runtime_lockdep_map was moved to protect all
> the sync resume calls lockdep is saying:
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(xe_pm_runtime_lockdep_map);
>                                 lock(&power_domains->lock);
>                                 lock(xe_pm_runtime_lockdep_map);
>    lock(&power_domains->lock);
> 
> -> #1 (xe_pm_runtime_lockdep_map){+.+.}-{0:0}:
>         xe_pm_runtime_resume_and_get+0x6a/0x190 [xe]
>         release_async_put_domains+0x26/0xa0 [xe]
>         intel_display_power_put_async_work+0xcb/0x1f0 [xe]
> 
> -> #0 (&power_domains->lock){+.+.}-{4:4}:
>         __lock_acquire+0x3259/0x62c0
>         lock_acquire+0x19b/0x4c0
>         __mutex_lock+0x16b/0x1a10
>         intel_display_power_is_enabled+0x1f/0x40 [xe]
>         gen11_display_irq_reset+0x1f2/0xcc0 [xe]
>         xe_irq_reset+0x43d/0x1cb0 [xe]
>         xe_irq_resume+0x52/0x660 [xe]
>         xe_pm_runtime_resume+0x7d/0xdc0 [xe]
> 
> This is likely a false positive.
> 
> This lockdep is created to protect races from the inner callers

There is no real lock here so it doesn't protect anything AFAIK. It is 
just about mapping the hidden dependencies between locks held when 
waking up the device and locks acquired in the resume and suspend callbacks.

> of get-and-resume-sync that are within holding various memory access locks
> with the resume and suspend itself that can also be trying to grab these
> memory access locks.
> 
> This is not the case here, for sure. The &power_domains->lock seems to be
> sufficient to protect against any race and there's no counterpart to get
> deadlocked with.

What is meant by "race" here? The lockdep splat is saying that one or 
both of the resume or suspend callbacks is grabbing some lock, but that 
same lock is also held when potentially waking up the device. From 
lockdep POV that is a potential deadlock.
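
To illustrate the ordering argument, here is a minimal userspace sketch of the kind of bookkeeping involved. It is not how lockdep is implemented; the enum, the seen[][] table, and record_order() are hypothetical names made up for this example, and only the two lock classes are taken from the splat.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: the two lock classes named in the splat above. */
enum lock_class { PM_RUNTIME_MAP, POWER_DOMAINS_LOCK, NR_CLASSES };

/* seen[a][b] is true once class b was acquired while class a was held. */
static bool seen[NR_CLASSES][NR_CLASSES];

/*
 * Record the order "held -> acquiring" and report whether the reverse
 * order was already recorded, i.e. whether the two form an ABBA inversion.
 */
static bool record_order(enum lock_class held, enum lock_class acquiring)
{
	assert(held < NR_CLASSES && acquiring < NR_CLASSES);
	seen[held][acquiring] = true;
	return seen[acquiring][held];
}
```

Recording the put_async_work path first (power_domains->lock held, then the runtime pm map acquired) returns false. Recording the resume path afterwards (map held, then power_domains->lock acquired) returns true, which is the inversion the splat reports, whether or not both paths can actually run concurrently.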

If we are saying that it is impossible to actually wake up the device in 
this particular case, then can we rather make the caller use _noresume() or 
ifactive()?
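
As a rough sketch of why those variants would sidestep the problem: the toy model below contrasts a synchronous get, which may enter the resume callback, with if-active and noresume style gets, which never do. struct toy_dev and the toy_* helpers are hypothetical and only mimic the general semantics of the runtime pm helpers; they are not the xe or core pm API.

```c
#include <stdbool.h>

/* Toy device state; not the real struct device or struct xe_device. */
struct toy_dev {
	bool active;       /* true while the device is runtime-resumed */
	int usage;         /* references held by callers */
	int resume_calls;  /* how often the resume callback was entered */
};

/* Sync get: wakes the device if needed, so the resume callback may run. */
static void toy_get_sync(struct toy_dev *d)
{
	if (!d->active) {
		d->resume_calls++; /* resume callback would take its locks here */
		d->active = true;
	}
	d->usage++;
}

/* If-active get: takes a reference only when already awake; never resumes. */
static bool toy_get_if_active(struct toy_dev *d)
{
	if (!d->active)
		return false;
	d->usage++;
	return true;
}

/* Noresume get: bumps the count without touching the power state. */
static void toy_get_noresume(struct toy_dev *d)
{
	d->usage++;
}
```

Only toy_get_sync() can reach the resume path, so only that variant needs to be ordered against the locks taken in the resume and suspend callbacks.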

> 
> It is also worth mentioning that on i915, intel_display_power_put_async_work
> also gets and resumes synchronously, and the runtime pm get/put
> also resets the irq, and that code was never problematic.
> 
> Cc: Matthew Auld <matthew.auld at intel.com>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
>   drivers/gpu/drm/xe/xe_pm.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index b534a194a9ef..919250e38ae0 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -347,7 +347,10 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
>   			goto out;
>   	}
>   
> +	lock_map_release(&xe_pm_runtime_lockdep_map);
>   	xe_irq_suspend(xe);
> +	xe_pm_write_callback_task(xe, NULL);
> +	return 0;
>   out:
>   	lock_map_release(&xe_pm_runtime_lockdep_map);
>   	xe_pm_write_callback_task(xe, NULL);
> @@ -369,6 +372,8 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>   	/* Disable access_ongoing asserts and prevent recursive pm calls */
>   	xe_pm_write_callback_task(xe, current);
>   
> +	xe_irq_resume(xe);
> +
>   	lock_map_acquire(&xe_pm_runtime_lockdep_map);
>   
>   	/*
> @@ -395,8 +400,6 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>   			goto out;
>   	}
>   
> -	xe_irq_resume(xe);
> -
>   	for_each_gt(gt, xe, id)
>   		xe_gt_resume(gt);
>   
