[PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action

Cavitt, Jonathan jonathan.cavitt at intel.com
Thu Feb 20 22:03:16 UTC 2025


-----Original Message-----
From: Wajdeczko, Michal <Michal.Wajdeczko at intel.com> 
Sent: Thursday, February 20, 2025 12:50 PM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
Cc: Dan Carpenter <dan.carpenter at linaro.org>; Brost, Matthew <matthew.brost at intel.com>
Subject: Re: [PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action
> 
> On 20.02.2025 21:09, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Wajdeczko, Michal <Michal.Wajdeczko at intel.com> 
> > Sent: Thursday, February 20, 2025 10:42 AM
> > To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
> > Cc: Dan Carpenter <dan.carpenter at linaro.org>; Brost, Matthew <matthew.brost at intel.com>
> > Subject: Re: [PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action
> >>
> >> On 19.02.2025 23:10, Cavitt, Jonathan wrote:
> >>> -----Original Message-----
> >>> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Michal Wajdeczko
> >>> Sent: Wednesday, February 19, 2025 11:27 AM
> >>> To: intel-xe at lists.freedesktop.org
> >>> Cc: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>; Dan Carpenter <dan.carpenter at linaro.org>; Brost, Matthew <matthew.brost at intel.com>
> >>> Subject: [PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action
> >>>>
> >>>> Treat unexpected non-zero response data from GuC as -EPROTO error.
> >>>> This will also prevent passing positive value to ERR_PTR().
> >>>>
> >>>> Reported-by: Dan Carpenter <dan.carpenter at linaro.org>
> >>>> Closes: https://lore.kernel.org/intel-xe/5fd5a4b9-88ee-47f1-ac3e-32a20960c5b0@stanley.mountain/
> >>>> Fixes: 12f95f9900c0 ("drm/xe/guc: Prefer GT oriented logs for GuC messages")
> >>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> >>>> Cc: Dan Carpenter <dan.carpenter at linaro.org>
> >>>> Cc: Matthew Brost <matthew.brost at intel.com>
> >>>> ---
> >>>>  drivers/gpu/drm/xe/xe_guc.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> >>>> index 1619c0a52db9..bb03db361512 100644
> >>>> --- a/drivers/gpu/drm/xe/xe_guc.c
> >>>> +++ b/drivers/gpu/drm/xe/xe_guc.c
> >>>> @@ -1238,6 +1238,8 @@ int xe_guc_suspend(struct xe_guc *guc)
> >>>>  
> >>>>  	ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
> >>>>  	if (ret) {
> >>>> +		if (ret > 0)
> >>>> +			ret = -EPROTO;
> >>>
> >>> I don't see anything wrong with this current implementation, though it's perhaps
> >>> a bit unnecessary to nest this if statement in the "if (ret)" check.
> >>
> >> the idea for nested 'if' was to avoid extra check on normal flow, since
> >> it will be very unlikely that return value will non-zero.
> >>
> >> with your approach below we will be always doing comparison twice
> >>
> >> but yeah, this is not on critical path, so who cares...
> > 
> > Those are fair points.  I think overall, this means the decision is a stylistic
> > choice more than anything, so I'm still good with either approach.
> > 
> >>
> >>> Maybe:
> >>> """
> >>> 	ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
> >>> 	if (ret > 0)
> >>> 		ret = -EPROTO;
> >>> 	if (ret) {
> >>> 	...
> >>> """
> >>> Or, if you don't mind ternary operations:
> >>> """
> >>> 	ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
> >>> 	ret = ret > 0 ? -EPROTO : ret;
> >>> 	if (ret) {
> >>> 	...
> >>> """
> >>
> >> actually above pattern would be useful if move out the sending of
> >> SOFT_RESET to separate helper that handles unexpected replies in place
> >> and always returns only 0 or -ve error code:
> >>
> >> int guc_send_soft_reset(guc)
> >> {
> >> 	u32 action[] = {..}
> >> 	int ret;
> >>
> >> 	ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
> >>  	return = ret > 0 ? -EPROTO : ret;
> >> }
> >>
> >> and this is what I'm usually doing, see guc_action_vf_control_cmd()
> > 
> > Given that, currently, all but one uses of xe_guc_mmio_send have this check in
> > place to some extent, we may want to generalize this function a bit if we take
> > this route.  That way, we can suitably replace xe_guc_mmio_send in the locations
> > where we don't expect a reply:
> > """
> > int xe_guc_mmio_send_noreply(struct xe_gut *guc, const u32 *request, u32 len)
> > {
> >         int ret = xe_guc_mmio_send(guc, request, len);
> >         return ret > 0 ? -EPROTO : ret;
> > }
> > """
> 
> strictly speaking, due to a format of the HXG SUCCESS_RESPONSE message,
> that is is sent as reply to REQUEST message, the DATA0 is always there,
> so we can't say
> 
> 	"we don't expect a reply"
> 
> as for such cases we have a separate FAST_REQUEST message, but it is not
> applicable for the MMIO communication ;)
> 
> I can try to prepare series that will introduce
> 
> 	xe_guc_mmio_send_mbz()
> or
> 	xe_guc_mmio_send_recv_mbz()

I think xe_guc_mmio_send_mbz makes the most sense: we don't
currently appear to have any use cases where we want to mask the
return value of xe_guc_mmio_send_recv.

-Jonathan Cavitt

> 
> > 
> >>
> >>>
> >>> Alternatively, given we do something similar everywhere we call
> >>> xe_guc_mmio_send (except for in xe_guc_hwconfig.c), we might
> >>> want to consider modifying the function to perform this -EPROTO
> >>> step automatically:
> >>> """
> >>> int xe_guc_mmio_send(struct xe_guc *guc, const u32 *request, u32 len)
> >>> {
> >>>         int ret = xe_guc_mmio_send_recv(guc, request, len, NULL);
> >>>         return ret > 0 ? -EPROTO : ret;
> >>> }
> >>
> >> no, as xe_guc_mmio_send() shall remain as generic purpose function since
> >> GuC ABI does not prevent actions to return data just in DATA0 field.
> >>
> >> and IIRC the XE_GUC_ACTION_ALLOCATE_DOORBELL, which is unused in xe yet,
> >>  is one example of such action.
> > 
> > I think if we are expecting DATA0 to have a non-zero value, we could still use
> > xe_guc_mmio_send_recv to acquire it.  However, I won't contest the assertion
> > that xe_guc_mmio_send needs to remain unmodified.
> 
> xe_guc_mmio_send_recv() is mostly for cases where you expect more than
> DATA0 from HXG header, forcing callers to use this one is step back.
> 
> > 
> >>
> >>> ...
> >>> +++ b/drivers/gpu/drm/xe/xe_guc_hwconfig.c
> >>> static int send_get_hwconfig(struct xe_guc *guc, u64 ggtt_addr, u32 size)
> >>> {
> >>>         u32 action[] = {
> >>>                 XE_GUC_ACTION_GET_HWCONFIG,
> >>>                 lower_32_bits(ggtt_addr),
> >>>                 upper_32_bits(ggtt_addr),
> >>>                 size,
> >>>         };
> >>>         
> >>>         return xe_guc_mmio_send_recv(guc, action, ARRAY_SIZE(action), NULL);
> >>> }
> >>> """
> >>> These aren't necessary changes, by the way.  They're perhaps nice-to-haves,
> >>> but I won't block on it:
> >>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > 
> > My Reviewed-by still stands.
> 
> thanks!
> 
> > -Jonathan Cavitt
> > 
> >>> -Jonathan Cavitt
> >>>
> >>>>  		xe_gt_err(gt, "GuC suspend failed: %pe\n", ERR_PTR(ret));
> >>>>  		return ret;
> >>>>  	}
> >>>> -- 
> >>>> 2.47.1
> >>>>
> >>>>
> >>
> >>
> 
> 


More information about the Intel-xe mailing list