[Intel-xe] [PATCH 3/4] drm/xe/guc: Fix handling of GUC_HXG_TYPE_NO_RESPONSE_BUSY

Michal Wajdeczko michal.wajdeczko at intel.com
Wed Nov 15 18:00:45 UTC 2023



On 15.11.2023 11:41, Matthew Brost wrote:
> On Wed, Nov 15, 2023 at 12:37:47PM +0100, Michal Wajdeczko wrote:
>> If GuC responds with the NO_RESPONSE_BUSY message, we extend
>> our timeout while waiting for the actual response, but we wrongly
>> assumed that the next message will be RESPONSE_SUCCESS, missing
>> that we still can get RESPONSE_FAILURE.
>>
>> Change the condition for the expected message type, using only
>> common bits from RESPONSE_SUCCESS and RESPONSE_FAILURE (as they
>> differ, by ABI design, only by the last bit).
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_guc.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>> index ddbe64eb0f3b..16c233f120c6 100644
>> --- a/drivers/gpu/drm/xe/xe_guc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>> @@ -671,9 +671,10 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request,
>>  	header = xe_mmio_read32(gt, reply_reg);
>>  	if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) ==
>>  	    GUC_HXG_TYPE_NO_RESPONSE_BUSY) {
>> +		u32 resp_bits = GUC_HXG_TYPE_RESPONSE_SUCCESS & GUC_HXG_TYPE_RESPONSE_FAILURE;
> 
> I think this works but is kinda goofy. It only works because the values
> are 0x7 & 0x6. If these were defined as 0x6 & 0x1 this wouldn't work.

but the values for SUCCESS/FAILURE (as said in the commit message) were
designed to work that way, and the HXG spec is stable AFAIK
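
To make the mask arithmetic concrete, here is a minimal standalone sketch
(userspace C, not driver code). The names are local stand-ins, and the TYPE
field position (bits 30:28) and the BUSY/RETRY values are assumptions restated
here for illustration; only 0x7 and 0x6 come from this thread. The wait
condition is modelled as (value & mask) == mask, which is what passing
resp_mask as both mask and expected value amounts to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding, restated locally for illustration only. */
#define HXG_TYPE_SHIFT              28
#define HXG_MSG_0_TYPE              (UINT32_C(0x7) << HXG_TYPE_SHIFT)
#define HXG_TYPE_NO_RESPONSE_BUSY   UINT32_C(3)
#define HXG_TYPE_NO_RESPONSE_RETRY  UINT32_C(5)
#define HXG_TYPE_RESPONSE_FAILURE   UINT32_C(6)
#define HXG_TYPE_RESPONSE_SUCCESS   UINT32_C(7)

/* Build a header word that carries only the TYPE field. */
static uint32_t hxg_header(uint32_t type)
{
	return (type << HXG_TYPE_SHIFT) & HXG_MSG_0_TYPE;
}

/* Bits common to both final response types: 0x7 & 0x6 == 0x6. */
static uint32_t hxg_resp_mask(void)
{
	uint32_t resp_bits = HXG_TYPE_RESPONSE_SUCCESS & HXG_TYPE_RESPONSE_FAILURE;

	return resp_bits << HXG_TYPE_SHIFT;
}

/* Model of the wait condition: stop polling once (header & mask) == mask. */
static bool hxg_wait_would_stop(uint32_t header)
{
	uint32_t mask = hxg_resp_mask();

	return (header & mask) == mask;
}

/* Both final responses end the wait; BUSY and RETRY do not. */
static void hxg_mask_selfcheck(void)
{
	assert(hxg_wait_would_stop(hxg_header(HXG_TYPE_RESPONSE_SUCCESS)));
	assert(hxg_wait_would_stop(hxg_header(HXG_TYPE_RESPONSE_FAILURE)));
	assert(!hxg_wait_would_stop(hxg_header(HXG_TYPE_NO_RESPONSE_BUSY)));
	assert(!hxg_wait_would_stop(hxg_header(HXG_TYPE_NO_RESPONSE_RETRY)));
}
```

Under these assumed values, 6 and 7 are the only 3-bit types with both upper
bits set, so no other type value can satisfy the condition.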

> 
>> +		u32 resp_mask = FIELD_PREP(GUC_HXG_MSG_0_TYPE, resp_bits);
>>  
>> -		ret = xe_mmio_wait32(gt, reply_reg, GUC_HXG_MSG_0_TYPE,
>> -				     FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS),
>> +		ret = xe_mmio_wait32(gt, reply_reg,  resp_mask, resp_mask,
>>  				     1000000, &header, false);
>>
> 
> Here now we'd need to check for a failure in the header field.

checks for FAILURE/SUCCESS are done in the "normal" path below, so there is
no need for extra checks here

> 
> With all of this, I believe it is better just to wait for
> GUC_HXG_TYPE_RESPONSE_SUCCESS and have xe_mmio_wait32 timeout on all
> other returns. Getting a failure case is rare and we can live with a timeout.

but we are writing a production-quality driver, no?

and even for the "rare" cases there could be a need to handle a timeout
differently from an error response (as an example, take the version
handshake between VF and GuC: a timeout means game over, while an error
might indicate that we should try a different ABI level)
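
A rough sketch of why that distinction matters to a caller, in standalone C.
Everything here is hypothetical: the enum names, the function, and the idea
that a lower abi_level number means an older ABI are made up for illustration
and are not taken from the driver or the GuC ABI:

```c
#include <assert.h>

/* Hypothetical result of one send/receive attempt. */
enum send_result {
	SEND_OK,             /* got a success response */
	SEND_ERROR_RESPONSE, /* got an explicit failure response */
	SEND_TIMEOUT,        /* got no final response at all */
};

/* Hypothetical next action for a version handshake. */
enum handshake_step {
	HANDSHAKE_DONE,
	HANDSHAKE_RETRY_LOWER_ABI,
	HANDSHAKE_GIVE_UP,
};

static enum handshake_step handshake_next_step(enum send_result res,
					       unsigned int abi_level)
{
	switch (res) {
	case SEND_OK:
		return HANDSHAKE_DONE;
	case SEND_ERROR_RESPONSE:
		/* The other side answered, so an older ABI level may still work. */
		return abi_level > 0 ? HANDSHAKE_RETRY_LOWER_ABI
				     : HANDSHAKE_GIVE_UP;
	case SEND_TIMEOUT:
	default:
		/* No answer: retrying with another level would not help. */
		return HANDSHAKE_GIVE_UP;
	}
}
```

If a failure response were only ever seen as a timeout, the retry branch
above could never be taken.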

> 
> Matt
> 
>>  		if (unlikely(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) !=
>> -- 
>> 2.25.1
>>

