[PATCH 2/3] drm/xe/guc/ct: Increase wait timeout for g2h response
Nilawar, Badal
badal.nilawar at intel.com
Mon Oct 14 12:12:29 UTC 2024
Hi Matt, John,
Thanks for the review comments.
On 11-10-2024 04:36, Matthew Brost wrote:
> On Wed, Oct 09, 2024 at 12:43:36PM -0700, John Harrison wrote:
>> On 10/9/2024 03:56, Badal Nilawar wrote:
>>> Occasionally, the G2H worker starts running after a delay of more than
>>> a second even after being queued and activated by the Linux workqueue
>>> subsystem.
>>> To prevent G2H timeout errors, the wait timeout is being increased.
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1620
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2902
>>> Signed-off-by: Badal Nilawar <badal.nilawar at intel.com>
>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>> Cc: Matthew Auld <matthew.auld at intel.com>
>>> Cc: John Harrison <John.C.Harrison at Intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_guc_ct.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>>> index b93b2821e4e8..dcc95c01b6f0 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>>> @@ -1019,7 +1019,7 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>>> return ret;
>>> }
>>> - ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
>>> + ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ * 3);
>> Is this change intended to be temporary until the fundamental scheduling
>> issue with the workqueue is fixed? If so, there should be a TODO comment to
>> that effect so that we remember to shrink the timeout back down again later.
>> Three seconds seems like a long time to wait.
>>
>
> I'm fine with this W/A until we root cause the work queue scheduling issue,
> but agree this needs a comment explaining why this large timeout is
> needed (work queue scheduling issue), how to trigger the larger timeout
> (tests which can trigger this), and saying that once we root cause the
> issue the timeout should be reduced.
Sure, I will add a comment here and in patch 3 explaining why this is
needed and that the change needs to be reverted once the issue is fixed.
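Roughly along these lines, as a sketch only and not the final wording. HZ is
stubbed here so the snippet builds outside the kernel, and G2H_FENCE_TIMEOUT
and g2h_fence_timeout() are placeholder names, not existing xe symbols. The
tests that trigger the delay still need to be listed in the comment.

```c
/*
 * Sketch only: HZ is stubbed so this builds in userspace, and
 * G2H_FENCE_TIMEOUT is a placeholder name, not an existing xe macro.
 */
#ifndef HZ
#define HZ 1000	/* assumption: stand-in for the kernel's HZ */
#endif

/*
 * TODO: The G2H worker has been observed to start running more than a
 * second after being queued and activated by the workqueue subsystem.
 * Wait up to 3 seconds for the G2H response to avoid spurious timeout
 * errors. This is a workaround: reduce the timeout back to HZ once the
 * workqueue scheduling delay is root caused.
 */
#define G2H_FENCE_TIMEOUT	(HZ * 3)

/*
 * Hypothetical helper returning the value that would be passed as the
 * timeout argument of wait_event_timeout() in guc_ct_send_recv().
 */
static inline long g2h_fence_timeout(void)
{
	return G2H_FENCE_TIMEOUT;
}
```

Keeping the value behind a single commented define would make the later
revert a one-line change.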
Regards,
Badal
>
> Matt
>
>> John.
>>
>>> /*
>>> * It is possible that the g2h request may be cancelled while waiting for a response due
>>