[PATCH 2/3] drm/xe/guc/ct: Increase wait timeout for g2h response

Matthew Brost matthew.brost at intel.com
Thu Oct 10 23:06:15 UTC 2024


On Wed, Oct 09, 2024 at 12:43:36PM -0700, John Harrison wrote:
> On 10/9/2024 03:56, Badal Nilawar wrote:
> > Occasionally, the G2H worker starts running after a delay of more than
> > a second even after being queued and activated by the Linux workqueue
> > subsystem.
> > To prevent G2H timeout errors, the wait timeout is being increased.
> > 
> > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1620
> > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2902
> > Signed-off-by: Badal Nilawar <badal.nilawar at intel.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > Cc: Matthew Auld <matthew.auld at intel.com>
> > Cc: John Harrison <John.C.Harrison at Intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_guc_ct.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > index b93b2821e4e8..dcc95c01b6f0 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > @@ -1019,7 +1019,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
> >   		return ret;
> >   	}
> > -	ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
> > +	ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ * 3);
> Is this change intended to be temporary until the fundamental scheduling
> issue with the workqueue is fixed? If so, there should be a TODO comment to
> that effect so that we remember to shrink the timeout back down again later.
> Three seconds seems like a long time to wait.
> 

I'm fine with this W/A until we root cause the workqueue scheduling
issue, but I agree this needs a comment explaining why the large timeout
is needed (workqueue scheduling issue), how to trigger the delay (which
tests can hit this), and saying that once we root cause the issue the
timeout should be reduced again.
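
For illustration only, a rough userspace sketch of what such a comment
could look like. The macro name G2H_FENCE_TIMEOUT_JIFFIES, the HZ value,
and the helper model_wait() are hypothetical and not part of the patch;
the helper only mimics the return convention of wait_event_timeout()
(0 on timeout, otherwise the remaining jiffies) in simplified form.

```c
/* Assumption: stand-in for the kernel's HZ (jiffies per second).
 * The real value depends on the kernel configuration. */
#define HZ 250

/*
 * Hypothetical macro name, not in the patch.
 *
 * W/A: the G2H worker has been seen to start running more than a second
 * after being queued and activated by the workqueue subsystem, so the
 * original 1 second wait (HZ) can expire even though the GuC responded.
 *
 * How to trigger: <list the tests known to hit the delay here>.
 *
 * TODO: reduce this back to HZ once the workqueue scheduling issue is
 * root caused.
 */
#define G2H_FENCE_TIMEOUT_JIFFIES (HZ * 3)

/*
 * Simplified userspace model of a timed wait: returns 0 if the worker
 * only ran at or after the timeout, otherwise the jiffies that were
 * still left when it ran. This is an illustration, not kernel code.
 */
static inline long model_wait(long timeout, long worker_delay)
{
	if (worker_delay >= timeout)
		return 0;			/* timed out */

	return timeout - worker_delay;		/* always >= 1 here */
}
```

With a worker delay of 1.5 seconds, model_wait(HZ, HZ + HZ / 2) reports a
timeout, while model_wait(G2H_FENCE_TIMEOUT_JIFFIES, HZ + HZ / 2) does not.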

Matt

> John.
> 
> >   	/*
> >   	 * It is possible that the g2h request may be cancelled while waiting for a response due
> 

