[PATCH 2/3] drm/xe: Accumulate exec queue timestamp on destroy

Umesh Nerlige Ramappa umesh.nerlige.ramappa at intel.com
Mon Oct 28 20:33:09 UTC 2024


On Sat, Oct 26, 2024 at 12:08:47PM -0500, Lucas De Marchi wrote:
>When the exec queue is destroyed, there's a race between a query to the
>fdinfo and the exec queue value being updated: after the destroy ioctl,
>if the fdinfo is queried before a call to guc_exec_queue_free_job(),
>the wrong utilization is reported: it's not accumulated on the query
>since the queue was removed from the array, and the value wasn't updated
>yet by the free_job().
>
>Explicitly accumulate the engine utilization so the right value is
>visible after the ioctl return.
>
>Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
>Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
>Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>---
> drivers/gpu/drm/xe/xe_exec_queue.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>index d098d2dd1b2d..b15ca84b2422 100644
>--- a/drivers/gpu/drm/xe/xe_exec_queue.c
>+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>@@ -829,6 +829,14 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>
> 	xe_exec_queue_kill(q);
>
>+	/*
>+	 * After killing and destroying the exec queue, make sure userspace has
>+	 * an updated view of the run ticks, regardless if this was the last
>+	 * ref: since the exec queue is removed from xef->exec_queue.xa, a
>+	 * query to fdinfo after this returns could not account for this load.
>+	 */
>+	xe_exec_queue_update_run_ticks(q);
>+

At this point we may/may-not have the updated LRC timestamp.

fwiu, xe_exec_queue_kill() is an async call. It will queue a work that 
will disable guc scheduling on the context. Once guc notifies KMD that 
scheduling is disabled on this context, KMD knows for sure that the 
context has switched out and the lrc timestamp is updated for this 
context. It may work well for contexts that switch frequently and may 
not work for contexts that seldom switch or never destroy their exec 
queue.

I still believe calling it from job free is the right thing to do. As 
for the ~120 Hz updates, these are just memory updates, so not sure if 
it's a huge performance impact.

If the ftrace is getting filed up, we could throttle that.

Thanks,
Umesh

> 	trace_xe_exec_queue_close(q);
> 	xe_exec_queue_put(q);
>
>-- 
>2.47.0
>


More information about the Intel-xe mailing list