[PATCH 2/2] drm/xe/guc: Explicitly exit CT safe mode on unwind

Rodrigo Vivi rodrigo.vivi at intel.com
Mon Jun 23 21:29:21 UTC 2025


On Fri, Jun 13, 2025 at 12:09:37AM +0200, Michal Wajdeczko wrote:
> During driver probe we might be briefly using CT safe mode, which
> is based on a delayed work, but usually we are able to stop this
> once we have IRQ fully operational.  However, if we abort the probe
> quite early then during unwind we might try to destroy the workqueue
> while there is still a pending delayed work that attempts to restart
> itself which triggers a WARN.
> 
> This was recently observed during unsuccessful VF initialization:
> 
>  [ ] xe 0000:00:02.1: probe with driver xe failed with error -62
>  [ ] ------------[ cut here ]------------
>  [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq
>  [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710
>  [ ] RIP: 0010:__queue_work+0x287/0x710
>  [ ] Call Trace:
>  [ ]  delayed_work_timer_fn+0x19/0x30
>  [ ]  call_timer_fn+0xa1/0x2a0
> 
> Exit the CT safe mode on unwind to avoid that warning.
> 
> Fixes: 09b286950f29 ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ")
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_ct.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 822f4c33f730..6e353757e204 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -35,6 +35,11 @@
>  #include "xe_pm.h"
>  #include "xe_trace_guc.h"
>  
> +static void receive_g2h(struct xe_guc_ct *ct);
> +static void g2h_worker_func(struct work_struct *w);
> +static void safe_mode_worker_func(struct work_struct *w);
> +static void ct_exit_safe_mode(struct xe_guc_ct *ct);
> +
>  #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>  enum {
>  	/* Internal states, not error conditions */
> @@ -189,14 +194,11 @@ static void guc_ct_fini(struct drm_device *drm, void *arg)
>  {
>  	struct xe_guc_ct *ct = arg;
>  
> +	ct_exit_safe_mode(ct);

I was going to ask whether we also need to disable more of the CT
here, but I believe that is indeed not relevant at this point in
the code.

But is there any possibility of this function being called twice in
some of the regular removal paths?
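
To make the question concrete, here is a rough userspace model (not
kernel code) of the failure described in the commit message and of the
double-call case. It assumes ct_exit_safe_mode() boils down to
cancelling the delayed work, the way cancel_delayed_work_sync() would;
that function body is not shown in this patch, so treat it as an
assumption. All model_* names are made up for illustration.

```c
/*
 * Userspace model only: these types and helpers are hypothetical
 * stand-ins and are NOT the kernel workqueue API.
 */
#include <assert.h>
#include <stdbool.h>

struct model_wq {
	bool draining;	/* set once destruction has started */
	int warnings;	/* counts queue attempts on a draining wq */
};

struct model_dwork {
	struct model_wq *wq;
	bool pending;	/* timer armed, work not yet queued */
};

/* Models destroy_workqueue(): from now on queueing is rejected. */
static void model_destroy(struct model_wq *wq)
{
	wq->draining = true;
}

/*
 * Models the timer expiry of a self-restarting delayed work.
 * Queueing on a draining wq stands in for the __queue_work() WARN.
 */
static void model_timer_fire(struct model_dwork *dw)
{
	if (!dw->pending)
		return;

	dw->pending = false;
	if (dw->wq->draining) {
		dw->wq->warnings++;
		return;
	}
	dw->pending = true;	/* worker ran and re-armed itself */
}

/*
 * Assumed shape of the exit path: cancel the delayed work and report
 * whether anything was pending. Calling it again is a no-op.
 */
static bool model_cancel_sync(struct model_dwork *dw)
{
	bool was_pending = dw->pending;

	dw->pending = false;
	return was_pending;
}
```

In this model a second cancel simply reports that nothing was pending,
which is also how cancel_delayed_work_sync() behaves, so a double call
would be harmless if the assumption above holds.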

>  	destroy_workqueue(ct->g2h_wq);
>  	xa_destroy(&ct->fence_lookup);
>  }
>  
> -static void receive_g2h(struct xe_guc_ct *ct);
> -static void g2h_worker_func(struct work_struct *w);
> -static void safe_mode_worker_func(struct work_struct *w);
> -
>  static void primelockdep(struct xe_guc_ct *ct)
>  {
>  	if (!IS_ENABLED(CONFIG_LOCKDEP))
> -- 
> 2.47.1
> 

