[PATCH 2/2] drm/xe/guc: Explicitly exit CT safe mode on unwind
Rodrigo Vivi
rodrigo.vivi at intel.com
Mon Jun 23 21:29:21 UTC 2025
On Fri, Jun 13, 2025 at 12:09:37AM +0200, Michal Wajdeczko wrote:
> During driver probe we might be briefly using CT safe mode, which
> is based on a delayed work, but usually we are able to stop this
> once we have IRQ fully operational. However, if we abort the probe
> quite early then during unwind we might try to destroy the workqueue
> while there is still a pending delayed work that attempts to restart
> itself which triggers a WARN.
>
> This was recently observed during unsuccessful VF initialization:
>
> [ ] xe 0000:00:02.1: probe with driver xe failed with error -62
> [ ] ------------[ cut here ]------------
> [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq
> [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710
> [ ] RIP: 0010:__queue_work+0x287/0x710
> [ ] Call Trace:
> [ ] delayed_work_timer_fn+0x19/0x30
> [ ] call_timer_fn+0xa1/0x2a0
>
> Exit the CT safe mode on unwind to avoid that warning.
>
> Fixes: 09b286950f29 ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ")
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_ct.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 822f4c33f730..6e353757e204 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -35,6 +35,11 @@
> #include "xe_pm.h"
> #include "xe_trace_guc.h"
>
> +static void receive_g2h(struct xe_guc_ct *ct);
> +static void g2h_worker_func(struct work_struct *w);
> +static void safe_mode_worker_func(struct work_struct *w);
> +static void ct_exit_safe_mode(struct xe_guc_ct *ct);
> +
> #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> enum {
> /* Internal states, not error conditions */
> @@ -189,14 +194,11 @@ static void guc_ct_fini(struct drm_device *drm, void *arg)
> {
> struct xe_guc_ct *ct = arg;
>
> + ct_exit_safe_mode(ct);

I was going to ask whether we also need to disable more of the CT
here, but I believe that is not relevant at this point in the code.

But is there any possibility of this function being called twice in
some of the regular removal paths?

> destroy_workqueue(ct->g2h_wq);
> xa_destroy(&ct->fence_lookup);
> }
>
> -static void receive_g2h(struct xe_guc_ct *ct);
> -static void g2h_worker_func(struct work_struct *w);
> -static void safe_mode_worker_func(struct work_struct *w);
> -
> static void primelockdep(struct xe_guc_ct *ct)
> {
> if (!IS_ENABLED(CONFIG_LOCKDEP))
> --
> 2.47.1
>
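
For reference, the failure mode from the commit message and the
double-call question can be sketched as a small userspace model. This
is only an illustration under stated assumptions: every model_* name
is hypothetical and none of this is the kernel workqueue API or the xe
driver code. It assumes that ct_exit_safe_mode() boils down to a
cancel_delayed_work_sync()-style cancel, which the quoted hunk does
not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, NOT the kernel workqueue API. */
struct model_wq {
	bool alive;	/* false once the queue has been destroyed */
	int warnings;	/* number of "cannot queue" events seen */
};

struct model_dwork {
	struct model_wq *wq;
	bool pending;	/* timer armed, work not yet run */
};

/* Arm the delayed work; stands in for queue_delayed_work(). */
static bool model_queue(struct model_dwork *dw)
{
	if (dw->pending)
		return false;
	dw->pending = true;
	return true;
}

/*
 * Stands in for cancel_delayed_work_sync(): returns true only if the
 * work was pending, so a second call is a harmless no-op.
 */
static bool model_cancel_sync(struct model_dwork *dw)
{
	bool was_pending = dw->pending;

	dw->pending = false;
	return was_pending;
}

/* Timer expiry: queue the work, which then re-arms itself. */
static void model_timer_fire(struct model_dwork *dw)
{
	if (!dw->pending)
		return;
	dw->pending = false;

	if (!dw->wq->alive) {
		/* Models the WARN from __queue_work() on a dying queue. */
		dw->wq->warnings++;
		return;
	}

	/* Worker body: restart itself, as the safe mode worker does. */
	model_queue(dw);
}

/* Stands in for destroy_workqueue(). */
static void model_destroy(struct model_wq *wq)
{
	wq->alive = false;
}

/* Run an early-abort unwind and return the number of warnings. */
static int model_unwind(bool exit_safe_mode_first)
{
	struct model_wq wq = { .alive = true, .warnings = 0 };
	struct model_dwork dw = { .wq = &wq, .pending = false };

	model_queue(&dw);	/* enter safe mode */
	model_timer_fire(&dw);	/* worker runs once and re-arms */
	assert(dw.pending);	/* still armed when the probe aborts */

	if (exit_safe_mode_first) {
		model_cancel_sync(&dw);
		model_cancel_sync(&dw);	/* double call: nothing to cancel */
	}

	model_destroy(&wq);
	model_timer_fire(&dw);	/* late timer, only if still armed */

	return wq.warnings;
}
```

In this model, model_unwind(false) reports one warning because the
timer fires after the queue is gone, while model_unwind(true) reports
none, even with the cancel issued twice.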