[PATCH] drm/xe/pm: Move xe_rpm_lockmap_acquire
Kandpal, Suraj
suraj.kandpal at intel.com
Wed Sep 11 10:58:52 UTC 2024
> -----Original Message-----
> From: Auld, Matthew <matthew.auld at intel.com>
> Sent: Wednesday, September 11, 2024 4:11 PM
> To: Kandpal, Suraj <suraj.kandpal at intel.com>; intel-xe at lists.freedesktop.org
> Cc: Shankar, Uma <uma.shankar at intel.com>; Vivi, Rodrigo
> <rodrigo.vivi at intel.com>
> Subject: Re: [PATCH] drm/xe/pm: Move xe_rpm_lockmap_acquire
>
> On 11/09/2024 10:30, Suraj Kandpal wrote:
> > Move xe_rpm_lockmap_acquire after display_pm_suspend and resume
> > functions to avoid circular locking dependency because of locks being
> > taken in intel_fbdev, intel_dp_mst_mgr suspend and resume functions.
> >
> > Signed-off-by: Suraj Kandpal <suraj.kandpal at intel.com>
>
> Can you provide the full lockdep splat or a link to it? We also need to give some
> solid analysis on why we think the splat is a false positive.
Sure Matthew,
Please find the links below. There are two different splats, one with fbdev and one with drm_dp_mst_mgr,
both with the same root cause:
https://gfx-ci.igk.intel.com/cibuglog-ng/issue/13490?query_key=4eadcfe577e7b807f83bf5a9031dec11ae123b77
1) https://intel-gfx-ci.01.org/tree/intel-xe/xe-1775-f0a6824d9e4ba2e1beabab4e3eeb195aa7fea167/re-dg2-17/igt@xe_pm@d3cold-mmap-vram.html#dmesg-warnings266
2) https://intel-gfx-ci.01.org/tree/intel-xe/xe-1924-f3eded4f8a05d73a0b94f27e05737ea3427450b3/re-dg2-15/igt@xe_pm@d3cold-mmap-system.html#dmesg-warnings172
The history of the issue can be found in VLK-61411.
From what I saw, the splat is caused because the lockdep map ends up taking in all the locks that are
later required to even complete the resume. For example, the MST config takes mgr->lock in the
drm_dp_mst_topology_mgr_suspend() and drm_dp_mst_topology_mgr_resume() functions, and the hw_mutex
lock is taken when accessing DPCD in drm_dp_dpcd_access().
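
To make the ordering argument concrete, here is a rough sketch of how I understand the report. It is a toy userspace model of a lock-order graph, not kernel code and not lockdep's real implementation. All names in it (order_graph, acquire, other_path, suspend_path, order_is_clean) are made up for illustration, and the "other path" that takes a runtime PM reference while holding mgr->lock is an assumption on my side, not something taken from the splats.

```c
#include <stdbool.h>
#include <string.h>

/* Toy lock classes; RPM_MAP stands in for the xe_rpm_lockmap annotation. */
enum lock_class { RPM_MAP, MST_MGR_LOCK, NUM_CLASSES };

struct order_graph {
	bool edge[NUM_CLASSES][NUM_CLASSES]; /* edge[a][b]: b was taken while a was held */
	bool held[NUM_CLASSES];
};

static void graph_init(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct order_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NUM_CLASSES; n++)
		if (g->edge[from][n] && !seen[n] && reachable(g, n, to, seen))
			return true;
	return false;
}

/* Record an acquisition; returns false if it closes a cycle in the graph. */
static bool acquire(struct order_graph *g, int cls)
{
	bool ok = true;

	for (int h = 0; h < NUM_CLASSES; h++) {
		bool seen[NUM_CLASSES] = { false };

		if (!g->held[h] || h == cls)
			continue;
		/* A known path cls -> ... -> h plus the new edge h -> cls is a cycle. */
		if (reachable(g, cls, h, seen))
			ok = false;
		g->edge[h][cls] = true;
	}
	g->held[cls] = true;
	return ok;
}

static void release(struct order_graph *g, int cls)
{
	g->held[cls] = false;
}

/* ASSUMPTION: some path takes a runtime PM reference while holding mgr->lock. */
static bool other_path(struct order_graph *g)
{
	bool ok = acquire(g, MST_MGR_LOCK);

	ok = acquire(g, RPM_MAP) && ok;
	release(g, RPM_MAP);
	release(g, MST_MGR_LOCK);
	return ok;
}

/* Suspend callback: annotate_first = current ordering, false = patched ordering. */
static bool suspend_path(struct order_graph *g, bool annotate_first)
{
	bool ok = true;

	if (annotate_first)
		ok = acquire(g, RPM_MAP) && ok;
	ok = acquire(g, MST_MGR_LOCK) && ok; /* stands in for display suspend */
	release(g, MST_MGR_LOCK);
	if (!annotate_first)
		ok = acquire(g, RPM_MAP) && ok;
	release(g, RPM_MAP);
	return ok;
}

/* Runs both paths on a fresh graph; true means no cycle was reported. */
static bool order_is_clean(bool annotate_first)
{
	struct order_graph g;
	bool ok;

	graph_init(&g);
	ok = other_path(&g);
	return suspend_path(&g, annotate_first) && ok;
}
```

In this model order_is_clean(true) reports a cycle and order_is_clean(false) does not, because with the patched ordering the display locks are no longer taken while the annotation is held. It only shows why moving the annotation makes the report go away; it does not by itself show that the original report is a false positive.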
>
> > ---
> > drivers/gpu/drm/xe/xe_pm.c | 28 ++++++++++++++--------------
> > 1 file changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index a3d1509066f7..7f33e553728a 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -363,6 +363,18 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > /* Disable access_ongoing asserts and prevent recursive pm calls */
> > xe_pm_write_callback_task(xe, current);
> >
> > + /*
> > + * Applying lock for entire list op as xe_ttm_bo_destroy and xe_bo_move_notify
> > + * also checks and delets bo entry from user fault list.
> > + */
> > + mutex_lock(&xe->mem_access.vram_userfault.lock);
> > + list_for_each_entry_safe(bo, on,
> > + &xe->mem_access.vram_userfault.list, vram_userfault_link)
> > + xe_bo_runtime_pm_release_mmap_offset(bo);
> > + mutex_unlock(&xe->mem_access.vram_userfault.lock);
> > +
> > + xe_display_pm_runtime_suspend(xe);
> > +
> > /*
> > * The actual xe_pm_runtime_put() is always async underneath, so
> > * exactly where that is called should makes no difference to us. However
> > @@ -386,18 +398,6 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > */
> > xe_rpm_lockmap_acquire(xe);
> >
> > - /*
> > - * Applying lock for entire list op as xe_ttm_bo_destroy and xe_bo_move_notify
> > - * also checks and delets bo entry from user fault list.
> > - */
> > - mutex_lock(&xe->mem_access.vram_userfault.lock);
> > - list_for_each_entry_safe(bo, on,
> > - &xe->mem_access.vram_userfault.list, vram_userfault_link)
> > - xe_bo_runtime_pm_release_mmap_offset(bo);
> > - mutex_unlock(&xe->mem_access.vram_userfault.lock);
> > -
> > - xe_display_pm_runtime_suspend(xe);
> > -
> > if (xe->d3cold.allowed) {
> > err = xe_bo_evict_all(xe);
> > if (err)
> > @@ -438,8 +438,6 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > /* Disable access_ongoing asserts and prevent recursive pm calls */
> > xe_pm_write_callback_task(xe, current);
> >
> > - xe_rpm_lockmap_acquire(xe);
> > -
> > if (xe->d3cold.allowed) {
> > err = xe_pcode_ready(xe, true);
> > if (err)
> > @@ -463,6 +461,8 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> >
> > xe_display_pm_runtime_resume(xe);
> >
> > + xe_rpm_lockmap_acquire(xe);
> > +
> > if (xe->d3cold.allowed) {
> > err = xe_bo_restore_user(xe);
> > if (err)