[PATCH v5 1/3] drm/xe: Move LNL scheduling WA to xe_device.h
John Harrison
john.c.harrison at intel.com
Fri Nov 1 18:48:06 UTC 2024
On 10/29/2024 05:01, Nirmoy Das wrote:
> Move the LNL scheduling WA to xe_device.h so it can be used in other
> places without needing to duplicate the comment about this WA's future
> removal. The WA, which flushes work or workqueues, is now wrapped in
> macros and can be reused wherever needed.
>
> Cc: Badal Nilawar <badal.nilawar at intel.com>
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
> Cc: <stable at vger.kernel.org> # v6.11+
> Suggested-by: John Harrison <John.C.Harrison at Intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.h | 14 ++++++++++++++
> drivers/gpu/drm/xe/xe_guc_ct.c | 11 +----------
> 2 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 4c3f0ebe78a9..f1fbfe916867 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -191,4 +191,18 @@ void xe_device_declare_wedged(struct xe_device *xe);
> struct xe_file *xe_file_get(struct xe_file *xef);
> void xe_file_put(struct xe_file *xef);
>
> +/*
> + * Occasionally the G2H worker starts running only after a delay of more than
> + * a second, even though it has been queued and activated by the Linux
> + * workqueue subsystem. This leads to a G2H timeout error. The root cause is
> + * the scheduling latency of the Lunarlake hybrid CPU; the issue disappears if
> + * the Lunarlake atom cores are disabled in the BIOS, which is beyond the xe KMD.
> + *
> + * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU.
> + */
> +#define LNL_FLUSH_WORKQUEUE(wq__) \
> + flush_workqueue(wq__)
> +#define LNL_FLUSH_WORK(wrk__) \
> + flush_work(wrk__)
> +
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 1b5d8fb1033a..703b44b257a7 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -1018,17 +1018,8 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>
> ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
>
> - /*
> - * Occasionally it is seen that the G2H worker starts running after a delay of more than
> - * a second even after being queued and activated by the Linux workqueue subsystem. This
> - * leads to G2H timeout error. The root cause of issue lies with scheduling latency of
> - * Lunarlake Hybrid CPU. Issue dissappears if we disable Lunarlake atom cores from BIOS
> - * and this is beyond xe kmd.
> - *
> - * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU.
> - */
> if (!ret) {
> - flush_work(&ct->g2h_worker);
> + LNL_FLUSH_WORK(&ct->g2h_worker);
> if (g2h_fence.done) {
> xe_gt_warn(gt, "G2H fence %u, action %04x, done\n",
> g2h_fence.seqno, action[0]);
This message is still wrong. We have a warning that effectively says
'job completed successfully'! That is misleading. It needs to say
"done after flush" or "done but flush was required", or something
along those lines.
John.