[PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action
Cavitt, Jonathan
jonathan.cavitt at intel.com
Thu Feb 20 20:09:23 UTC 2025
-----Original Message-----
From: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>
Sent: Thursday, February 20, 2025 10:42 AM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
Cc: Dan Carpenter <dan.carpenter at linaro.org>; Brost, Matthew <matthew.brost at intel.com>
Subject: Re: [PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action
>
> On 19.02.2025 23:10, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Michal Wajdeczko
> > Sent: Wednesday, February 19, 2025 11:27 AM
> > To: intel-xe at lists.freedesktop.org
> > Cc: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>; Dan Carpenter <dan.carpenter at linaro.org>; Brost, Matthew <matthew.brost at intel.com>
> > Subject: [PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action
> >>
> >> Treat unexpected non-zero response data from GuC as -EPROTO error.
> >> This will also prevent passing positive value to ERR_PTR().
> >>
> >> Reported-by: Dan Carpenter <dan.carpenter at linaro.org>
> >> Closes: https://lore.kernel.org/intel-xe/5fd5a4b9-88ee-47f1-ac3e-32a20960c5b0@stanley.mountain/
> >> Fixes: 12f95f9900c0 ("drm/xe/guc: Prefer GT oriented logs for GuC messages")
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> >> Cc: Dan Carpenter <dan.carpenter at linaro.org>
> >> Cc: Matthew Brost <matthew.brost at intel.com>
> >> ---
> >> drivers/gpu/drm/xe/xe_guc.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> >> index 1619c0a52db9..bb03db361512 100644
> >> --- a/drivers/gpu/drm/xe/xe_guc.c
> >> +++ b/drivers/gpu/drm/xe/xe_guc.c
> >> @@ -1238,6 +1238,8 @@ int xe_guc_suspend(struct xe_guc *guc)
> >>
> >> ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
> >> if (ret) {
> >> + if (ret > 0)
> >> + ret = -EPROTO;
> >
> > I don't see anything wrong with this current implementation, though it's perhaps
> > a bit unnecessary to nest this if statement in the "if (ret)" check.
>
> the idea for nested 'if' was to avoid extra check on normal flow, since
> it will be very unlikely that return value will non-zero.
>
> with your approach below we will be always doing comparison twice
>
> but yeah, this is not on critical path, so who cares...
Those are fair points. I think overall, this means the decision is a stylistic
choice more than anything, so I'm still good with either approach.
>
> > Maybe:
> > """
> > ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
> > if (ret > 0)
> > ret = -EPROTO;
> > if (ret) {
> > ...
> > """
> > Or, if you don't mind ternary operations:
> > """
> > ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
> > ret = ret > 0 ? -EPROTO : ret;
> > if (ret) {
> > ...
> > """
>
> actually above pattern would be useful if move out the sending of
> SOFT_RESET to separate helper that handles unexpected replies in place
> and always returns only 0 or -ve error code:
>
> int guc_send_soft_reset(guc)
> {
> u32 action[] = {..}
> int ret;
>
> ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
> return = ret > 0 ? -EPROTO : ret;
> }
>
> and this is what I'm usually doing, see guc_action_vf_control_cmd()
Given that, currently, all but one uses of xe_guc_mmio_send have this check in
place to some extent, we may want to generalize this function a bit if we take
this route. That way, we can suitably replace xe_guc_mmio_send in the locations
where we don't expect a reply:
"""
int xe_guc_mmio_send_noreply(struct xe_gut *guc, const u32 *request, u32 len)
{
int ret = xe_guc_mmio_send(guc, request, len);
return ret > 0 ? -EPROTO : ret;
}
"""
>
> >
> > Alternatively, given we do something similar everywhere we call
> > xe_guc_mmio_send (except for in xe_guc_hwconfig.c), we might
> > want to consider modifying the function to perform this -EPROTO
> > step automatically:
> > """
> > int xe_guc_mmio_send(struct xe_guc *guc, const u32 *request, u32 len)
> > {
> > int ret = xe_guc_mmio_send_recv(guc, request, len, NULL);
> > return ret > 0 ? -EPROTO : ret;
> > }
>
> no, as xe_guc_mmio_send() shall remain as generic purpose function since
> GuC ABI does not prevent actions to return data just in DATA0 field.
>
> and IIRC the XE_GUC_ACTION_ALLOCATE_DOORBELL, which is unused in xe yet,
> is one example of such action.
I think if we are expecting DATA0 to have a non-zero value, we could still use
xe_guc_mmio_send_recv to acquire it. However, I won't contest the assertion
that xe_guc_mmio_send needs to remain unmodified.
>
> > ...
> > +++ b/drivers/gpu/drm/xe/xe_guc_hwconfig.c
> > static int send_get_hwconfig(struct xe_guc *guc, u64 ggtt_addr, u32 size)
> > {
> > u32 action[] = {
> > XE_GUC_ACTION_GET_HWCONFIG,
> > lower_32_bits(ggtt_addr),
> > upper_32_bits(ggtt_addr),
> > size,
> > };
> >
> > return xe_guc_mmio_send_recv(guc, action, ARRAY_SIZE(action), NULL);
> > }
> > """
> > These aren't necessary changes, by the way. They're perhaps nice-to-haves,
> > but I won't block on it:
> > Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
My Reviewed-by still stands.
-Jonathan Cavitt
> > -Jonathan Cavitt
> >
> >> xe_gt_err(gt, "GuC suspend failed: %pe\n", ERR_PTR(ret));
> >> return ret;
> >> }
> >> --
> >> 2.47.1
> >>
> >>
>
>
More information about the Intel-xe
mailing list