[PATCH v3 2/4] drm/xe/guc: Ignore GuC CT errors when wedged

Summers, Stuart stuart.summers at intel.com
Wed Jun 4 00:09:16 UTC 2025


On Tue, 2025-06-03 at 11:42 -0700, Belgaumkar, Vinay wrote:
> 
> On 6/3/2025 10:34 AM, Summers, Stuart wrote:
> > On Mon, 2025-06-02 at 16:44 -0700, Vinay Belgaumkar wrote:
> > > Messaging to GuC may get canceled when device is wedged. Don't
> > > flag this as an error in xe_guc_pc code.
> > So if we're wedged already we are already in an error state, right? I
> > can understand flagging additional errors maybe gives a false negative,
> > or rather would prompt us to look at the earlier errors to make sure
> > these aren't just cascading, but do we really need to check for this?
> 
> Yes, to avoid false CI errors. This was actually for a CI failure seen
> in the previous patch.

I feel like it would be nice to have a more generic interface here,
maybe even just xe_device_already_wedged() that checks what you have.
But the code you have does what it says and the sequence makes sense to
me.
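
To make the suggestion concrete, here is a minimal userspace sketch of
what such a helper could look like. The names mock_xe_device,
mock_device_wedged() and mock_ct_send_failed() are made up for
illustration and are not existing xe functions; the mock struct only
stands in for struct xe_device and xe_device_wedged(). The helper is
meant to be equivalent to the open-coded condition in the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for struct xe_device; only models the wedged state. */
struct mock_xe_device {
	bool wedged;
};

/* Stand-in for xe_device_wedged(). */
static bool mock_device_wedged(const struct mock_xe_device *xe)
{
	return xe->wedged;
}

/*
 * Hypothetical helper: returns true when a CT send result should be
 * reported as an error. Same logic as the patch's
 * "ret && !(wedged && ret == -ECANCELED)".
 */
static bool mock_ct_send_failed(const struct mock_xe_device *xe, int ret)
{
	if (!ret)
		return false;	/* success, nothing to report */

	if (ret == -ECANCELED && mock_device_wedged(xe))
		return false;	/* expected cancellation on a wedged device */

	return true;		/* any other failure is still an error */
}
```

With something like this, each call site would reduce to a single
"if (helper(xe, ret)) xe_gt_err(...)" check instead of repeating the
full condition five times.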

Reviewed-by: Stuart Summers <stuart.summers at intel.com>

> 
> Thanks,
> 
> Vinay.
> 
> > 
> > Thanks,
> > Stuart
> > 
> > > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> > > Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar at intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_guc_pc.c | 10 +++++-----
> > >   1 file changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
> > > index cb0563494fcc..793df3486d1f 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_pc.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
> > > @@ -154,7 +154,7 @@ static int pc_action_reset(struct xe_guc_pc
> > > *pc)
> > >          int ret;
> > >   
> > >          ret = xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0,
> > > 0);
> > > -       if (ret)
> > > +       if (ret && !(xe_device_wedged(pc_to_xe(pc)) && ret == -
> > > ECANCELED))
> > >                  xe_gt_err(pc_to_gt(pc), "GuC PC reset failed:
> > > %pe\n",
> > >                            ERR_PTR(ret));
> > >   
> > > @@ -178,7 +178,7 @@ static int pc_action_query_task_state(struct
> > > xe_guc_pc *pc)
> > >   
> > >          /* Blocking here to ensure the results are ready before
> > > reading them */
> > >          ret = xe_guc_ct_send_block(ct, action,
> > > ARRAY_SIZE(action));
> > > -       if (ret)
> > > +       if (ret && !(xe_device_wedged(pc_to_xe(pc)) && ret == -
> > > ECANCELED))
> > >                  xe_gt_err(pc_to_gt(pc), "GuC PC query task state
> > > failed: %pe\n",
> > >                            ERR_PTR(ret));
> > >   
> > > @@ -201,7 +201,7 @@ static int pc_action_set_param(struct
> > > xe_guc_pc
> > > *pc, u8 id, u32 value)
> > >                  return -EAGAIN;
> > >   
> > >          ret = xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0,
> > > 0);
> > > -       if (ret)
> > > +       if (ret && !(xe_device_wedged(pc_to_xe(pc)) && ret == -
> > > ECANCELED))
> > >                  xe_gt_err(pc_to_gt(pc), "GuC PC set param[%u]=%u
> > > failed: %pe\n",
> > >                            id, value, ERR_PTR(ret));
> > >   
> > > @@ -223,7 +223,7 @@ static int pc_action_unset_param(struct
> > > xe_guc_pc
> > > *pc, u8 id)
> > >                  return -EAGAIN;
> > >   
> > >          ret = xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0,
> > > 0);
> > > -       if (ret)
> > > +       if (ret && !(xe_device_wedged(pc_to_xe(pc)) && ret == -
> > > ECANCELED))
> > >                  xe_gt_err(pc_to_gt(pc), "GuC PC unset param
> > > failed:
> > > %pe",
> > >                            ERR_PTR(ret));
> > >   
> > > @@ -240,7 +240,7 @@ static int pc_action_setup_gucrc(struct
> > > xe_guc_pc
> > > *pc, u32 mode)
> > >          int ret;
> > >   
> > >          ret = xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0,
> > > 0);
> > > -       if (ret)
> > > +       if (ret && !(xe_device_wedged(pc_to_xe(pc)) && ret == -
> > > ECANCELED))
> > >                  xe_gt_err(pc_to_gt(pc), "GuC RC enable mode=%u
> > > failed: %pe\n",
> > >                            mode, ERR_PTR(ret));
> > >          return ret;
