[PATCH 3/3] drm/xe: Stop accumulating LRC timestamp on job_free
Cavitt, Jonathan
jonathan.cavitt at intel.com
Mon Oct 28 20:29:29 UTC 2024
-----Original Message-----
From: De Marchi, Lucas <lucas.demarchi at intel.com>
Sent: Saturday, October 26, 2024 10:09 AM
To: intel-xe at lists.freedesktop.org
Cc: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>
Subject: [PATCH 3/3] drm/xe: Stop accumulating LRC timestamp on job_free
>
> The exec queue timestamp is only really useful when it's being queried
> through the fdinfo. There's no need to update it so often, on every
> job_free. Tracing a simple app like vkcube while it runs shows an
> update rate of ~120 Hz.
>
> The update on job_free() is used to cover a gap: if an exec
> queue is created and destroyed rapidly, before a new query, the
> timestamp still needs to be accumulated and accounted for on the xef.
> Initial implementation in commit 6109f24f87d7 ("drm/xe: Add helper to
> accumulate exec queue runtime") couldn't do it on the exec_queue_fini
> since the xef could be gone at that point. However since commit
> ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") the xef is
> refcounted and the exec queue has a reference.
>
> Improve the fix in commit 2149ded63079 ("drm/xe: Fix use after free when
> client stats are captured") by reducing the frequency in which the
> update is needed.
>
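For context, the lifetime argument here, as I understand it, is roughly the
sketch below (written from memory; the exact field and helper names may not
match the tree):

	/*
	 * Sketch only: the queue is assumed to hold a reference on its
	 * xe_file, so xef stays valid until the queue is freed, which is
	 * what makes accumulating ticks at fini time safe.
	 */
	static void example_exec_queue_bind_xef(struct xe_exec_queue *q,
						struct xe_file *xef)
	{
		q->xef = xe_file_get(xef);	/* taken at queue creation */
	}

	static void example_exec_queue_release_xef(struct xe_exec_queue *q)
	{
		/* xef is still valid here, so ticks can be folded into it */
		xe_exec_queue_update_run_ticks(q);
		xe_file_put(q->xef);	/* dropped only after accumulation */
	}
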
> Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured")
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_exec_queue.c | 6 ++++++
>  drivers/gpu/drm/xe/xe_guc_submit.c | 2 --
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index b15ca84b2422..bc2fc917e0de 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -260,8 +260,14 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
>  {
>  	int i;
>  
> +	/*
> +	 * Before releasing our ref to lrc and xef, accumulate our run ticks
> +	 */
> +	xe_exec_queue_update_run_ticks(q);
I mean, if it works, it works. However,
1) I might be mistaken, but if I'm understanding correctly, xe_exec_queue_fini
is just as asynchronous as guc_exec_queue_free_job was, meaning we're still
liable to hit the same issues as before.
2) If this is designed to cover an fd close use case (as per a discussion we had),
shouldn't we be accumulating the usage in the code path that performs
the fd close? I don't know exactly where that lives, but I suspect it might be
xe_file_close or xe_file_destroy; see the sketch below for roughly what I mean.
I won't block on this, because perhaps I don't have the full picture.
Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
-Jonathan Cavitt
> +
>  	for (i = 0; i < q->width; ++i)
>  		xe_lrc_put(q->lrc[i]);
> +
>  	__xe_exec_queue_free(q);
>  }
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index e5d7c767a744..ebe4665d9159 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -747,8 +747,6 @@ static void guc_exec_queue_free_job(struct drm_sched_job *drm_job)
>  {
>  	struct xe_sched_job *job = to_xe_sched_job(drm_job);
>  
> -	xe_exec_queue_update_run_ticks(job->q);
> -
>  	trace_xe_sched_job_free(job);
>  	xe_sched_job_put(job);
>  }
> --
> 2.47.0
>
>