[PATCH 2/3] drm/xe/guc/ct: Increase wait timeout for g2h response

Nilawar, Badal badal.nilawar at intel.com
Mon Oct 14 12:12:29 UTC 2024


Hi Matt, John,

Thanks for the review comments.

On 11-10-2024 04:36, Matthew Brost wrote:
> On Wed, Oct 09, 2024 at 12:43:36PM -0700, John Harrison wrote:
>> On 10/9/2024 03:56, Badal Nilawar wrote:
>>> Occasionally, the G2H worker starts running after a delay of more than
>>> a second even after being queued and activated by the Linux workqueue
>>> subsystem.
>>> To prevent G2H timeout errors, the wait timeout is being increased.
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1620
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2902
>>> Signed-off-by: Badal Nilawar <badal.nilawar at intel.com>
>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>> Cc: Matthew Auld <matthew.auld at intel.com>
>>> Cc: John Harrison <John.C.Harrison at Intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_guc_ct.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>>> index b93b2821e4e8..dcc95c01b6f0 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>>> @@ -1019,7 +1019,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>>>    		return ret;
>>>    	}
>>> -	ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
>>> +	ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ * 3);
>> Is this change intended to be temporary until the fundamental scheduling
>> issue with the workqueue is fixed? If so, there should be a TODO comment to
>> that effect so that we remember to shrink the timeout back down again later.
>> Three seconds seems like a long time to wait.
>>
> 
> I'm fine with this W/A until we root cause the work queue scheduling issue,
> but agree this needs a comment explaining why this large timeout is
> needed (work queue scheduling issue), how to trigger the larger timeout
> (tests which can trigger this), and saying that once we root cause the
> issue we should reduce the timeout.

Sure, I will add a comment here and in patch 3 explaining why this is
needed and that the change needs to be reverted once the issue is fixed.
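
For illustration only, here is a rough userspace sketch of the comment and of
why the longer wait helps. It is not the driver code: the HZ value, the
G2H_WAIT_TIMEOUT name, and the model_wait()/model_send_recv() helpers are all
assumptions made up for this example, and returning -ETIME on timeout is
likewise assumed. The model only mirrors the wait_event_timeout() return
convention (0 on timeout, otherwise the remaining jiffies).

```c
#include <errno.h>

/* Assumption: illustrative tick rate; the kernel's HZ is a build-time setting. */
#define HZ 250

/*
 * TODO: the G2H worker has been seen to start more than a second after
 * being queued. Wait up to 3 seconds as a workaround and shrink this back
 * to HZ once the workqueue scheduling delay is root-caused.
 */
#define G2H_WAIT_TIMEOUT (HZ * 3)	/* hypothetical name */

/*
 * Userspace model of the wait_event_timeout() return convention:
 * 0 when the timeout elapses with the condition still false, otherwise
 * the remaining jiffies (at least 1). worker_delay is the number of
 * jiffies that pass before the worker marks the fence as done.
 */
long model_wait(long worker_delay, long timeout)
{
	if (worker_delay >= timeout)
		return 0;
	return timeout - worker_delay;
}

/* Hypothetical caller: a timed-out wait becomes -ETIME, success becomes 0. */
int model_send_recv(long worker_delay, long timeout)
{
	return model_wait(worker_delay, timeout) ? 0 : -ETIME;
}
```

With a worker delay of 1.5 seconds (HZ + HZ / 2), model_send_recv() returns
-ETIME for a timeout of HZ and 0 for G2H_WAIT_TIMEOUT, which is the behaviour
the patch is trying to get.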

Regards,
Badal

> 
> Matt
> 
>> John.
>>
>>>    	/*
>>>    	 * It is possible that the g2h request may be cancelled while waiting for a response due
>>


