[PATCH 3/3] drm/xe/pf: Don't resume device from restart worker

Cavitt, Jonathan jonathan.cavitt at intel.com
Fri Aug 1 22:55:08 UTC 2025


-----Original Message-----
From: Vivi, Rodrigo <rodrigo.vivi at intel.com> 
Sent: Friday, August 1, 2025 1:46 PM
To: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>
Cc: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
Subject: Re: [PATCH 3/3] drm/xe/pf: Don't resume device from restart worker
> 
> On Wed, Jul 30, 2025 at 11:48:01PM +0200, Michal Wajdeczko wrote:
> > 
> > 
> > On 7/30/2025 11:10 PM, Cavitt, Jonathan wrote:
> > > -----Original Message-----
> > > From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Michal Wajdeczko
> > > Sent: Wednesday, July 30, 2025 10:49 AM
> > > To: intel-xe at lists.freedesktop.org
> > > Cc: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>
> > > Subject: [PATCH 3/3] drm/xe/pf: Don't resume device from restart worker
> > >>
> > >> The PF's restart worker shouldn't attempt to resume the device on
> > >> its own, since its goal is to finish PF and VFs reprovisioning on
> > >> the recently reset GuC. Take extra RPM reference while scheduling
> > >> a work and release it from the worker or when we cancel a work.
> > >>
> > >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > >> ---
> > >>  drivers/gpu/drm/xe/xe_gt_sriov_pf.c | 23 +++++++++++++++++++----
> > >>  1 file changed, 19 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> > >> index 8bc7d7f9f47a..0c9012fb625d 100644
> > >> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> > >> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
> > >> @@ -53,7 +53,11 @@ static void pf_init_workers(struct xe_gt *gt)
> > >>  
> > >>  static void pf_fini_workers(struct xe_gt *gt)
> > >>  {
> > >> -	disable_work_sync(&gt->sriov.pf.workers.restart);
> > >> +	if (disable_work_sync(&gt->sriov.pf.workers.restart)) {
> > >> +		xe_gt_sriov_dbg_verbose(gt, "pending restart disabled!\n");
> > >> +		/* release a rpm reference taken on the worker behalf */
> > >> +		xe_pm_runtime_put(gt_to_xe(gt));
> > >> +	}
> > >>  }
> > >>  
> > >>  /**
> > >> @@ -205,8 +209,11 @@ static void pf_cancel_restart(struct xe_gt *gt)
> > >>  {
> > >>  	xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt)));
> > >>  
> > >> -	if (cancel_work_sync(&gt->sriov.pf.workers.restart))
> > >> +	if (cancel_work_sync(&gt->sriov.pf.workers.restart)) {
> > >>  		xe_gt_sriov_dbg_verbose(gt, "pending restart canceled!\n");
> > >> +		/* release a rpm reference taken on the worker behalf */
> > >> +		xe_pm_runtime_put(gt_to_xe(gt));
> > >> +	}
> > >>  }
> > >>  
> > >>  /**
> > >> @@ -224,9 +231,12 @@ static void pf_restart(struct xe_gt *gt)
> > >>  {
> > >>  	struct xe_device *xe = gt_to_xe(gt);
> > >>  
> > >> -	xe_pm_runtime_get(xe);
> > >> +	xe_gt_assert(gt, !xe_pm_runtime_suspended(xe));
> > >> +
> > >>  	xe_gt_sriov_pf_config_restart(gt);
> > >>  	xe_gt_sriov_pf_control_restart(gt);
> > >> +
> > >> +	/* release a rpm reference taken on our behalf */
> > > 
> > > NIT:
> > > For consistency with the other two comments, maybe:
> > > s/our/the worker
> > > Or is the pm reference taken in this instance different from the pm reference
> > > taken in pf_cancel_restart and pf_fini_workers?
> > 
> > this is the worker context, hence 'our'
> 
> I honestly prefer the non personal statements. But not an issue
> 
> > 
> > > 
> > > There're also some other minor grammar things ("s/a rpm/an rpm" and 
> > > "s/worker behalf/worker's behalf", for example) that can be applied more
> > > generally to the whole patch.
> > > 
> > > I'm not going to block on minor grammatical fixups, though, so:
> 
> My english is broken so I usually miss things like this. I even just learned
> that the 'an rpm' is the right one...
> 
> But anyway, if you notice grammatical issues, please refrain on giving
> your reviewed-by. Otherwise we will need to keep accepting small patches
> later with small grammar fixes :/

This seems like unnecessary ruler-smacking to me, especially given the multi-lingual
nature of the mailing list, but if you're adamant that I should be blocking on grammar
fixes, then I guess I'll apply this recommendation in the future.

-Jonathan Cavitt

> 
> Please ask the author a new revision and only put your rv-b when you are
> comfortable that we are not leaving things behind.
> 
> > > Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > > -Jonathan Cavitt
> > > 
> > >>  	xe_pm_runtime_put(xe);
> > >>  
> > >>  	xe_gt_sriov_dbg(gt, "restart completed\n");
> > >> @@ -245,8 +255,13 @@ static void pf_queue_restart(struct xe_gt *gt)
> > >>  
> > >>  	xe_gt_assert(gt, IS_SRIOV_PF(xe));
> > >>  
> > >> -	if (!queue_work(xe->sriov.wq, &gt->sriov.pf.workers.restart))
> > >> +	/* take a rpm reference on behalf of the worker */
> > >> +	xe_pm_runtime_get_noresume(xe);
> > >> +
> > >> +	if (!queue_work(xe->sriov.wq, &gt->sriov.pf.workers.restart)) {
> > >>  		xe_gt_sriov_dbg(gt, "restart already in queue!\n");
> > >> +		xe_pm_runtime_put(xe);
> 
> As for the patch itself from the power management perspective, the approach
> looks good to me. It is the right thing to do if we don't want to let the
> device to suspend and possibly losing power between the scheduling and
> the work is done. However the hard part is to ensure that all the work
> cancellation is taken care properly, what I believe you did right.
> 
> But it is important that the reviewers checked for other possible missed
> spots here. I hope this is the case.
> 
> Thanks,
> Rodrigo.
> 
> > >> +	}
> > >>  }
> > >>  
> > >>  /**
> > >> -- 
> > >> 2.47.1
> > >>
> > >>
> > 
> 


More information about the Intel-xe mailing list