[PATCH v6] drm/xe/uc: Disable GuC communication on hardware initialization error.

Cavitt, Jonathan jonathan.cavitt at intel.com
Tue Jun 24 22:17:48 UTC 2025


-----Original Message-----
From: Wajdeczko, Michal <Michal.Wajdeczko at intel.com> 
Sent: Tuesday, June 24, 2025 1:15 PM
To: Dong, Zhanjun <zhanjun.dong at intel.com>; intel-xe at lists.freedesktop.org
Cc: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Summers, Stuart <stuart.summers at intel.com>
Subject: Re: [PATCH v6] drm/xe/uc: Disable GuC communication on hardware initialization error.
> 
> On 24.06.2025 03:17, Zhanjun Dong wrote:
> > Disable GuC communication on Xe micro controller hardware initialization
> > error.
> > 
> > Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> > Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4917
> > 
> > ---
> > Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > Cc: Stuart Summers <stuart.summers at intel.com>
> > Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > 
> > Change list:
> > v6: Skip disable ct on xe_guc_enable_communication error
> > v5: Set wedge is excessive action, revert back to disable ct
> > v4: Fix typo and add new line
> > v3: v2 CI re-run
> > v2: Remove unnecessary jump to err-out
> >     Drop disable ct, switch to set wedge
> > ---
> >  drivers/gpu/drm/xe/xe_guc.c |  5 +++++
> >  drivers/gpu/drm/xe/xe_guc.h |  1 +
> >  drivers/gpu/drm/xe/xe_uc.c  | 19 ++++++++++++++-----
> >  3 files changed, 20 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> > index 209e5d53c290..9d7175b16cc7 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.c
> > +++ b/drivers/gpu/drm/xe/xe_guc.c
> > @@ -1230,6 +1230,11 @@ int xe_guc_enable_communication(struct xe_guc *guc)
> >  	return 0;
> >  }
> >  
> 
> each new public function needs to have a proper kernel-doc
> 
> > +void xe_guc_disable_communication(struct xe_guc *guc)
> > +{
> > +	xe_guc_ct_disable(&guc->ct);
> > +}
> > +
> >  int xe_guc_suspend(struct xe_guc *guc)
> >  {
> >  	struct xe_gt *gt = guc_to_gt(guc);
> > diff --git a/drivers/gpu/drm/xe/xe_guc.h b/drivers/gpu/drm/xe/xe_guc.h
> > index 58338be44558..285c19929f8c 100644
> > --- a/drivers/gpu/drm/xe/xe_guc.h
> > +++ b/drivers/gpu/drm/xe/xe_guc.h
> > @@ -33,6 +33,7 @@ int xe_guc_reset(struct xe_guc *guc);
> >  int xe_guc_upload(struct xe_guc *guc);
> >  int xe_guc_min_load_for_hwconfig(struct xe_guc *guc);
> >  int xe_guc_enable_communication(struct xe_guc *guc);
> > +void xe_guc_disable_communication(struct xe_guc *guc);
> >  int xe_guc_suspend(struct xe_guc *guc);
> >  void xe_guc_notify(struct xe_guc *guc);
> >  int xe_guc_auth_huc(struct xe_guc *guc, u32 rsa_addr);
> > diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
> > index 3a8751a8b92d..d74bfc7a85d1 100644
> > --- a/drivers/gpu/drm/xe/xe_uc.c
> > +++ b/drivers/gpu/drm/xe/xe_uc.c
> > @@ -13,6 +13,7 @@
> >  #include "xe_gt_printk.h"
> >  #include "xe_gt_sriov_vf.h"
> >  #include "xe_guc.h"
> > +#include "xe_guc_ct.h"
> >  #include "xe_guc_pc.h"
> >  #include "xe_guc_engine_activity.h"
> >  #include "xe_huc.h"
> > @@ -161,15 +162,19 @@ static int vf_uc_init_hw(struct xe_uc *uc)
> >  
> >  	err = xe_gt_sriov_vf_connect(uc_to_gt(uc));
> >  	if (err)
> > -		return err;
> > +		goto err_out;
> >  
> >  	uc->guc.submission_state.enabled = true;
> >  
> >  	err = xe_gt_record_default_lrcs(uc_to_gt(uc));
> >  	if (err)
> > -		return err;
> > +		goto err_out;
> >  
> >  	return 0;
> > +
> > +err_out:
> > +	xe_guc_disable_communication(&uc->guc);
> > +	return err;
> >  }
> >  
> >  /*
> > @@ -201,15 +206,15 @@ int xe_uc_init_hw(struct xe_uc *uc)
> >  
> >  	ret = xe_gt_record_default_lrcs(uc_to_gt(uc));
> >  	if (ret)
> > -		return ret;
> > +		goto err_out;
> >  
> >  	ret = xe_guc_post_load_init(&uc->guc);
> >  	if (ret)
> > -		return ret;
> > +		goto err_out;
> >  
> >  	ret = xe_guc_pc_start(&uc->guc.pc);
> >  	if (ret)
> > -		return ret;
> > +		goto err_out;
> >  
> >  	xe_guc_engine_activity_enable_stats(&uc->guc);
> >  
> > @@ -221,6 +226,10 @@ int xe_uc_init_hw(struct xe_uc *uc)
> >  	xe_gsc_load_start(&uc->gsc);
> >  
> >  	return 0;
> > +
> > +err_out:
> > +	xe_guc_disable_communication(&uc->guc);
> > +	return ret;
> >  }
> >  
> >  int xe_uc_fini_hw(struct xe_uc *uc)
> 
> what about doing similar cleanup in xe_guc_min_load_for_hwconfig() where
> we also have unbalanced xe_guc_enable_communication() ?
> 

I don't think it's necessary there, as xe_guc_enable_communication is the last function run in
xe_guc_min_load_for_hwconfig.
If it fails, then the GuC CT was never enabled, so it doesn't need to be disabled.  And if it succeeds,
then I think the rest of the Xe micro controller hardware initialization will have succeeded as well,
making it unnecessary to disable the GuC CT afterwards as that component at least should be
functioning properly.
Though, maybe we need it added to vf_guc_min_load_for_hwconfig?
-Jonathan Cavitt


More information about the Intel-xe mailing list