[Intel-xe] [PATCH 3/4] drm/xe/guc: Fix handling of GUC_HXG_TYPE_NO_RESPONSE_BUSY

Wed Nov 15 14:12:32 UTC 2023

On Wed, Nov 15, 2023 at 07:00:45PM +0100, Michal Wajdeczko wrote:
> 
> 
> On 15.11.2023 11:41, Matthew Brost wrote:
> > On Wed, Nov 15, 2023 at 12:37:47PM +0100, Michal Wajdeczko wrote:
> >> If GuC responds with the NO_RESPONSE_BUSY message, we extend
> >> our timeout while waiting for the actual response, but we wrongly
> >> assumed that the next message will be RESPONSE_SUCCESS, missing
> >> that we still can get RESPONSE_FAILURE.
> >>
> >> Change the condition for the expected message type, using only
> >> common bits from RESPONSE_SUCCESS and RESPONSE_FAILURE (as they
> >> differ, by ABI design, only by the last bit).
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> >> ---
> >>  drivers/gpu/drm/xe/xe_guc.c | 5 +++--
> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> >> index ddbe64eb0f3b..16c233f120c6 100644
> >> --- a/drivers/gpu/drm/xe/xe_guc.c
> >> +++ b/drivers/gpu/drm/xe/xe_guc.c
> >> @@ -671,9 +671,10 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request,
> >>  	header = xe_mmio_read32(gt, reply_reg);
> >>  	if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) ==
> >>  	    GUC_HXG_TYPE_NO_RESPONSE_BUSY) {
> >> +		u32 resp_bits = GUC_HXG_TYPE_RESPONSE_SUCCESS & GUC_HXG_TYPE_RESPONSE_FAILURE;
> > 
> > I think this works but is kinda goofy. It only works because 0x7 & 0x6
> > are values. If these are defined as 0x6 & 0x1 this doesn't work.
> 
> but values for SUCCESS/FAILURE (as said in the commit message) were

Didn't read the commit see that now.

> designed to work that way and the HXG spec is stable AFAIK
>

Can you make this a new define then? I had to check the values of
GUC_HXG_TYPE_RESPONSE_SUCCESS & GUC_HXG_TYPE_RESPONSE_FAILURE to
understand why this works. Or at minimum add a comment as just looking
at code it doesn't appear to work when indeed it does.

> > 
> >> +		u32 resp_mask = FIELD_PREP(GUC_HXG_MSG_0_TYPE, resp_bits);
> >>  
> >> -		ret = xe_mmio_wait32(gt, reply_reg, GUC_HXG_MSG_0_TYPE,
> >> -				     FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_RESPONSE_SUCCESS),
> >> +		ret = xe_mmio_wait32(gt, reply_reg,  resp_mask, resp_mask,
> >>  				     1000000, &header, false);
> >>
> > 
> > Here now we'd need to check for a failure in the header field.
> 
> checks for FAILURE/SUCCESS are done in "normal" path below, no need for
> extra checks here
> 

Ah, yes. See that now.

> > 
> > With all of this, I believe it is better just to wait for
> > GUC_HXG_TYPE_RESPONSE_SUCCESS and have xe_mmio_wait32 timeout on all
> > other returns. Getting a failure case is rare and we can live a timeout.
> 
> but are writing production quality driver, no?
> 
> and even for the "rare" cases there could be a need for different
> handling of timeout vs error response (as example take version handshake
> between VF and GuC - timeout means game over, while error might indicate
> to try with different ABI level)
> 

Yea ok, makes a bit more sense now.

Matt

> > 
> > Matt
> > 
> >>  		if (unlikely(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) !=
> >> -- 
> >> 2.25.1
> >>