[PATCH 2/2] drm/xe/guc: Explicitly exit CT safe mode on unwind

Rodrigo Vivi rodrigo.vivi at intel.com
Tue Jun 24 21:30:19 UTC 2025


On Tue, Jun 24, 2025 at 08:32:29AM -0700, Matthew Brost wrote:
> On Mon, Jun 23, 2025 at 05:29:21PM -0400, Rodrigo Vivi wrote:
> > On Fri, Jun 13, 2025 at 12:09:37AM +0200, Michal Wajdeczko wrote:
> > > During driver probe we might be briefly using CT safe mode, which
> > > is based on a delayed work, but usually we are able to stop this
> > > once we have IRQ fully operational.  However, if we abort the probe
> > > quite early then during unwind we might try to destroy the workqueue
> > > while there is still a pending delayed work that attempts to restart
> > > itself which triggers a WARN.
> > > 
> > > This was recently observed during unsuccessful VF initialization:
> > > 
> > >  [ ] xe 0000:00:02.1: probe with driver xe failed with error -62
> > >  [ ] ------------[ cut here ]------------
> > >  [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq
> > >  [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710
> > >  [ ] RIP: 0010:__queue_work+0x287/0x710
> > >  [ ] Call Trace:
> > >  [ ]  delayed_work_timer_fn+0x19/0x30
> > >  [ ]  call_timer_fn+0xa1/0x2a0
> > > 
> > > Exit the CT safe mode on unwind to avoid that warning.
> > > 
> > > Fixes: 09b286950f29 ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ")
> > > Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_guc_ct.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index 822f4c33f730..6e353757e204 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -35,6 +35,11 @@
> > >  #include "xe_pm.h"
> > >  #include "xe_trace_guc.h"
> > >  
> > > +static void receive_g2h(struct xe_guc_ct *ct);
> > > +static void g2h_worker_func(struct work_struct *w);
> > > +static void safe_mode_worker_func(struct work_struct *w);
> > > +static void ct_exit_safe_mode(struct xe_guc_ct *ct);
> > > +
> > >  #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > >  enum {
> > >  	/* Internal states, not error conditions */
> > > @@ -189,14 +194,11 @@ static void guc_ct_fini(struct drm_device *drm, void *arg)
> > >  {
> > >  	struct xe_guc_ct *ct = arg;
> > >  
> > > +	ct_exit_safe_mode(ct);
> > 
> > I was going to ask if we also don't need to disable more of the ct
> > at this point, but I believe it is indeed not relevant at this point
> > of the code.
> > 
> > > But is there any possibility of this function being called twice
> > > in some of the regular removal paths?
> > 
> 
> ct_exit_safe_mode is safe to be called many times - it just cancels a
> delayed worker.
> 
> The code looks correct to me.
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>

I was going to push this right now, but it looks like it needs a rebase now...

> 
> > >  	destroy_workqueue(ct->g2h_wq);
> > >  	xa_destroy(&ct->fence_lookup);
> > >  }
> > >  
> > > -static void receive_g2h(struct xe_guc_ct *ct);
> > > -static void g2h_worker_func(struct work_struct *w);
> > > -static void safe_mode_worker_func(struct work_struct *w);
> > > -
> > >  static void primelockdep(struct xe_guc_ct *ct)
> > >  {
> > >  	if (!IS_ENABLED(CONFIG_LOCKDEP))
> > > -- 
> > > 2.47.1
> > > 

