[PATCH v2] drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout

Nirmoy Das nirmoy.das at linux.intel.com
Fri Oct 25 16:06:47 UTC 2024


On 10/24/2024 7:22 PM, Matthew Brost wrote:
> On Thu, Oct 24, 2024 at 10:14:21AM -0700, John Harrison wrote:
>> On 10/24/2024 08:18, Nirmoy Das wrote:
>>> Flush the xe ordered_wq in case of a ufence timeout, which is observed
>>> on LNL and points to the recent scheduling issue with E-cores.
>>>
>>> This is similar to the recent fix:
>>> commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h
>>> response timeout") and should be removed once there is a fix for the
>>> E-core scheduling issue.
>>>
>>> v2: Add platform check (Himal)
>>>      s/__flush_workqueue/flush_workqueue/ (Jani)
>>>
>>> Cc: Badal Nilawar <badal.nilawar at intel.com>
>>> Cc: Jani Nikula <jani.nikula at intel.com>
>>> Cc: Matthew Auld <matthew.auld at intel.com>
>>> Cc: John Harrison <John.C.Harrison at Intel.com>
>>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>> Cc: <stable at vger.kernel.org> # v6.11+
>>> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2754
>>> Suggested-by: Matthew Brost <matthew.brost at intel.com>
>>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>>> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_wait_user_fence.c | 14 ++++++++++++++
>>>   1 file changed, 14 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_wait_user_fence.c b/drivers/gpu/drm/xe/xe_wait_user_fence.c
>>> index f5deb81eba01..78a0ad3c78fe 100644
>>> --- a/drivers/gpu/drm/xe/xe_wait_user_fence.c
>>> +++ b/drivers/gpu/drm/xe/xe_wait_user_fence.c
>>> @@ -13,6 +13,7 @@
>>>   #include "xe_device.h"
>>>   #include "xe_gt.h"
>>>   #include "xe_macros.h"
>>> +#include "compat-i915-headers/i915_drv.h"
>>>   #include "xe_exec_queue.h"
>>>   static int do_compare(u64 addr, u64 value, u64 mask, u16 op)
>>> @@ -155,6 +156,19 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data,
>>>   		}
>>>   		if (!timeout) {
>>> +			if (IS_LUNARLAKE(xe)) {
>>> +				/*
>>> +				 * This is analogous to e51527233804 ("drm/xe/guc/ct: Flush g2h
>>> +				 * worker in case of g2h response timeout")
>>> +				 *
>>> +				 * TODO: Drop this change once workqueue scheduling delay issue is
>>> +				 * fixed on LNL Hybrid CPU.
>>> +				 */
>>> +				flush_workqueue(xe->ordered_wq);
>> If we are going to have multiple instances of this workaround, can we wrap
>> them up as 'LNL_FLUSH_WORKQUEUE(q)' or some such? Put the IS_LNL check inside
>> the macro and make it obvious exactly where all the instances are by having
>> a single macro name to search for.
>>
> +1, I think Lucas was suggesting something similar on chat to make sure
> we don't lose track of removing these W/As once this gets fixed.
>
> Matt


Sounds good. I will add LNL_FLUSH_WORKQUEUE() and use that for all the places we need this WA.
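
Something along these lines, perhaps, as a rough and untested sketch (the
two-argument form, keeping the IS_LUNARLAKE() check inside the macro, and the
suggested placement in a common header like xe_device.h are just assumptions
at this point; the final patch may look different):

/*
 * Hypothetical wrapper so the LNL workaround lives in one greppable place.
 *
 * TODO: Drop this once the workqueue scheduling delay issue is fixed on
 * LNL Hybrid CPU.
 */
#define LNL_FLUSH_WORKQUEUE(xe__, wq__) \
	do { \
		if (IS_LUNARLAKE(xe__)) \
			flush_workqueue(wq__); \
	} while (0)

The ufence timeout path above would then just call
LNL_FLUSH_WORKQUEUE(xe, xe->ordered_wq) before re-checking the fence value.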

Regards,

Nirmoy

>
>> John.
>>
>>> +				err = do_compare(addr, args->value, args->mask, args->op);
>>> +				if (err <= 0)
>>> +					break;
>>> +			}
>>>   			err = -ETIME;
>>>   			break;
>>>   		}