[Intel-xe] [PATCH 3/4] drm/xe/guc: Fix handling of GUC_HXG_TYPE_NO_RESPONSE_BUSY
Michal Wajdeczko
michal.wajdeczko at intel.com
Wed Nov 15 18:00:45 UTC 2023
On 15.11.2023 11:41, Matthew Brost wrote:
> On Wed, Nov 15, 2023 at 12:37:47PM +0100, Michal Wajdeczko wrote:
>> If GuC responds with the NO_RESPONSE_BUSY message, we extend
>> our timeout while waiting for the actual response, but we wrongly
>> assumed that the next message will be RESPONSE_SUCCESS, missing
>> that we still can get RESPONSE_FAILURE.
>>
>> Change the condition for the expected message type, using only
>> common bits from RESPONSE_SUCCESS and RESPONSE_FAILURE (as they
>> differ, by ABI design, only by the last bit).
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_guc.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>> index ddbe64eb0f3b..16c233f120c6 100644
>> --- a/drivers/gpu/drm/xe/xe_guc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>> @@ -671,9 +671,10 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request,
>> header = xe_mmio_read32(gt, reply_reg);
>> if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) ==
>> GUC_HXG_TYPE_NO_RESPONSE_BUSY) {
>> + u32 resp_bits = GUC_HXG_TYPE_RESPONSE_SUCCESS & GUC_HXG_TYPE_RESPONSE_FAILURE;
>
> I think this works but is kinda goofy. It only works because the values
> are 0x7 and 0x6. If they were defined as 0x6 and 0x1, this wouldn't work.
But the values for SUCCESS/FAILURE (as noted in the commit message) were
designed to work that way, and the HXG spec is stable AFAIK.
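
For illustration, the mask trick can be checked in isolation. The sketch
below is standalone C, not driver code: the HXG_* macro names and the helper
functions are made up for this example, the type field position (bits 30:28)
and the type values are what I believe the HXG ABI defines, and
wait_condition_met() only models the usual (reg & mask) == value test of a
mask/value register wait.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative local copies; the values are assumed to match the HXG ABI. */
#define HXG_MSG_0_TYPE_SHIFT		28
#define HXG_MSG_0_TYPE_MASK		(0x7u << HXG_MSG_0_TYPE_SHIFT)

#define HXG_TYPE_NO_RESPONSE_BUSY	3u
#define HXG_TYPE_RESPONSE_FAILURE	6u
#define HXG_TYPE_RESPONSE_SUCCESS	7u

/* Place a message type value into the TYPE field of header dword 0. */
static uint32_t hxg_type_field(uint32_t type)
{
	return (type << HXG_MSG_0_TYPE_SHIFT) & HXG_MSG_0_TYPE_MASK;
}

/* Models the completion test of a mask/value register wait. */
static bool wait_condition_met(uint32_t reg, uint32_t mask, uint32_t value)
{
	return (reg & mask) == value;
}

/* True for RESPONSE_SUCCESS and RESPONSE_FAILURE, false for other types. */
static bool is_final_response(uint32_t header)
{
	/* 0x7 & 0x6 == 0x6: the bits shared by both response types. */
	uint32_t resp_bits = HXG_TYPE_RESPONSE_SUCCESS & HXG_TYPE_RESPONSE_FAILURE;
	uint32_t resp_mask = hxg_type_field(resp_bits);

	return wait_condition_met(header, resp_mask, resp_mask);
}
```

With these values only types 6 and 7 have both shared bits set, so a wait
on resp_mask ends on either response and keeps polling on BUSY.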
>
>> + u32 resp_mask = FIELD_PREP(GUC_HXG_MSG_0_TYPE, resp_bits);
>>
>> - ret = xe_mmio_wait32(gt, reply_reg, GUC_HXG_MSG_0_TYPE,
>> - FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS),
>> + ret = xe_mmio_wait32(gt, reply_reg, resp_mask, resp_mask,
>> 1000000, &header, false);
>>
>
> Here now we'd need to check for a failure in the header field.
Checks for FAILURE/SUCCESS are done in the "normal" path below, so there is
no need for extra checks here.
>
> With all of this, I believe it is better just to wait for
> GUC_HXG_TYPE_RESPONSE_SUCCESS and have xe_mmio_wait32 timeout on all
> other returns. Getting a failure case is rare and we can live with a timeout.
But we are writing a production-quality driver, no?
And even for the "rare" cases there could be a need to handle a timeout
differently from an error response (as an example, take the version handshake
between VF and GuC: a timeout means game over, while an error might indicate
that we should try a different ABI level).
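
To make that distinction concrete, here is a rough standalone sketch. It is
not the actual VF handshake code: negotiate_abi(), the send_fn callback, the
demo stubs and the use of -EPROTO for an error response are all hypothetical
and chosen only for illustration. The point is that the caller can react
differently only if the transport reports a timeout and a failure response
as distinct results.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical transport callback: returns 0 on a success response,
 * -ETIMEDOUT if no response arrived, or another negative errno
 * (here -EPROTO) if the firmware answered with a failure response.
 */
typedef int (*send_fn)(uint32_t abi_level, void *ctx);

/* Try ABI levels from highest down to lowest; stop at once on timeout. */
static int negotiate_abi(send_fn send, void *ctx, uint32_t highest,
			 uint32_t lowest, uint32_t *agreed)
{
	if (highest < lowest)
		return -EINVAL;

	for (uint32_t level = highest; ; level--) {
		int ret = send(level, ctx);

		if (ret == 0) {
			*agreed = level;
			return 0;
		}
		if (ret == -ETIMEDOUT)
			return ret;	/* no answer at all: game over */
		if (level == lowest)
			return ret;	/* rejected at every level */
		/* error response: retry with the next lower level */
	}
}

/* Demo stub: rejects every level above the one stored in ctx. */
static int demo_send(uint32_t level, void *ctx)
{
	uint32_t supported = *(uint32_t *)ctx;

	return level <= supported ? 0 : -EPROTO;
}

/* Demo stub: never gets an answer. */
static int demo_send_timeout(uint32_t level, void *ctx)
{
	(void)level;
	(void)ctx;
	return -ETIMEDOUT;
}
```

If the transport only ever waited for RESPONSE_SUCCESS, both stubs would look
the same to the caller (a timeout), and the retry path could never be taken.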
>
> Matt
>
>> if (unlikely(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) !=
>> --
>> 2.25.1
>>