[PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action

Michal Wajdeczko michal.wajdeczko at intel.com
Thu Feb 20 20:49:49 UTC 2025



On 20.02.2025 21:09, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Wajdeczko, Michal <Michal.Wajdeczko at intel.com> 
> Sent: Thursday, February 20, 2025 10:42 AM
> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-xe at lists.freedesktop.org
> Cc: Dan Carpenter <dan.carpenter at linaro.org>; Brost, Matthew <matthew.brost at intel.com>
> Subject: Re: [PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action
>>
>> On 19.02.2025 23:10, Cavitt, Jonathan wrote:
>>> -----Original Message-----
>>> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Michal Wajdeczko
>>> Sent: Wednesday, February 19, 2025 11:27 AM
>>> To: intel-xe at lists.freedesktop.org
>>> Cc: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>; Dan Carpenter <dan.carpenter at linaro.org>; Brost, Matthew <matthew.brost at intel.com>
>>> Subject: [PATCH] drm/xe/guc: Check response data of CLIENT_SOFT_RESET action
>>>>
>>>> Treat unexpected non-zero response data from GuC as -EPROTO error.
>>>> This will also prevent passing positive value to ERR_PTR().
>>>>
>>>> Reported-by: Dan Carpenter <dan.carpenter at linaro.org>
>>>> Closes: https://lore.kernel.org/intel-xe/5fd5a4b9-88ee-47f1-ac3e-32a20960c5b0@stanley.mountain/
>>>> Fixes: 12f95f9900c0 ("drm/xe/guc: Prefer GT oriented logs for GuC messages")
>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>>> Cc: Dan Carpenter <dan.carpenter at linaro.org>
>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>> ---
>>>>  drivers/gpu/drm/xe/xe_guc.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>>>> index 1619c0a52db9..bb03db361512 100644
>>>> --- a/drivers/gpu/drm/xe/xe_guc.c
>>>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>>>> @@ -1238,6 +1238,8 @@ int xe_guc_suspend(struct xe_guc *guc)
>>>>  
>>>>  	ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
>>>>  	if (ret) {
>>>> +		if (ret > 0)
>>>> +			ret = -EPROTO;
>>>
>>> I don't see anything wrong with this current implementation, though it's perhaps
>>> a bit unnecessary to nest this if statement in the "if (ret)" check.
>>
>> the idea for nested 'if' was to avoid extra check on normal flow, since
>> it will be very unlikely that return value will non-zero.
>>
>> with your approach below we will be always doing comparison twice
>>
>> but yeah, this is not on critical path, so who cares...
> 
> Those are fair points.  I think overall, this means the decision is a stylistic
> choice more than anything, so I'm still good with either approach.
> 
>>
>>> Maybe:
>>> """
>>> 	ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
>>> 	if (ret > 0)
>>> 		ret = -EPROTO;
>>> 	if (ret) {
>>> 	...
>>> """
>>> Or, if you don't mind ternary operations:
>>> """
>>> 	ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
>>> 	ret = ret > 0 ? -EPROTO : ret;
>>> 	if (ret) {
>>> 	...
>>> """
>>
>> actually above pattern would be useful if move out the sending of
>> SOFT_RESET to separate helper that handles unexpected replies in place
>> and always returns only 0 or -ve error code:
>>
>> int guc_send_soft_reset(guc)
>> {
>> 	u32 action[] = {..}
>> 	int ret;
>>
>> 	ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action));
>>  	return = ret > 0 ? -EPROTO : ret;
>> }
>>
>> and this is what I'm usually doing, see guc_action_vf_control_cmd()
> 
> Given that, currently, all but one uses of xe_guc_mmio_send have this check in
> place to some extent, we may want to generalize this function a bit if we take
> this route.  That way, we can suitably replace xe_guc_mmio_send in the locations
> where we don't expect a reply:
> """
> int xe_guc_mmio_send_noreply(struct xe_gut *guc, const u32 *request, u32 len)
> {
>         int ret = xe_guc_mmio_send(guc, request, len);
>         return ret > 0 ? -EPROTO : ret;
> }
> """

strictly speaking, due to a format of the HXG SUCCESS_RESPONSE message,
that is is sent as reply to REQUEST message, the DATA0 is always there,
so we can't say

	"we don't expect a reply"

as for such cases we have a separate FAST_REQUEST message, but it is not
applicable for the MMIO communication ;)

I can try to prepare series that will introduce

	xe_guc_mmio_send_mbz()
or
	xe_guc_mmio_send_recv_mbz()

> 
>>
>>>
>>> Alternatively, given we do something similar everywhere we call
>>> xe_guc_mmio_send (except for in xe_guc_hwconfig.c), we might
>>> want to consider modifying the function to perform this -EPROTO
>>> step automatically:
>>> """
>>> int xe_guc_mmio_send(struct xe_guc *guc, const u32 *request, u32 len)
>>> {
>>>         int ret = xe_guc_mmio_send_recv(guc, request, len, NULL);
>>>         return ret > 0 ? -EPROTO : ret;
>>> }
>>
>> no, as xe_guc_mmio_send() shall remain as generic purpose function since
>> GuC ABI does not prevent actions to return data just in DATA0 field.
>>
>> and IIRC the XE_GUC_ACTION_ALLOCATE_DOORBELL, which is unused in xe yet,
>>  is one example of such action.
> 
> I think if we are expecting DATA0 to have a non-zero value, we could still use
> xe_guc_mmio_send_recv to acquire it.  However, I won't contest the assertion
> that xe_guc_mmio_send needs to remain unmodified.

xe_guc_mmio_send_recv() is mostly for cases where you expect more than
DATA0 from HXG header, forcing callers to use this one is step back.

> 
>>
>>> ...
>>> +++ b/drivers/gpu/drm/xe/xe_guc_hwconfig.c
>>> static int send_get_hwconfig(struct xe_guc *guc, u64 ggtt_addr, u32 size)
>>> {
>>>         u32 action[] = {
>>>                 XE_GUC_ACTION_GET_HWCONFIG,
>>>                 lower_32_bits(ggtt_addr),
>>>                 upper_32_bits(ggtt_addr),
>>>                 size,
>>>         };
>>>         
>>>         return xe_guc_mmio_send_recv(guc, action, ARRAY_SIZE(action), NULL);
>>> }
>>> """
>>> These aren't necessary changes, by the way.  They're perhaps nice-to-haves,
>>> but I won't block on it:
>>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> 
> My Reviewed-by still stands.

thanks!

> -Jonathan Cavitt
> 
>>> -Jonathan Cavitt
>>>
>>>>  		xe_gt_err(gt, "GuC suspend failed: %pe\n", ERR_PTR(ret));
>>>>  		return ret;
>>>>  	}
>>>> -- 
>>>> 2.47.1
>>>>
>>>>
>>
>>



More information about the Intel-xe mailing list