[PATCH v2 4/4] drm/xe: Wait on killed exec queues
Cavitt, Jonathan
jonathan.cavitt at intel.com
Tue Oct 29 22:05:18 UTC 2024
-----Original Message-----
From: De Marchi, Lucas <lucas.demarchi at intel.com>
Sent: Tuesday, October 29, 2024 2:44 PM
To: intel-xe at lists.freedesktop.org
Cc: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>; Brost, Matthew <matthew.brost at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>
Subject: [PATCH v2 4/4] drm/xe: Wait on killed exec queues
>
> When an exec queue is killed it triggers an async process of asking the
> GuC to schedule the context out. The timestamp in the context image is
> only updated when this process completes. In case a userspace process
> kills an exec queue and tries to read the timestamp, it may not get an updated
> runtime.
>
> Add synchronization between the process reading the fdinfo and the exec
> queue being killed. After reading all the timestamps, wait on exec
> queues in the process of being killed. When that wait is over,
> xe_exec_queue_fini() has already been called and has updated the timestamps.
>
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
Non-blocking asides below. Otherwise:
Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> ---
> drivers/gpu/drm/xe/xe_device_types.h | 5 +++++
> drivers/gpu/drm/xe/xe_drm_client.c | 7 +++++++
> drivers/gpu/drm/xe/xe_exec_queue.c | 4 ++++
> 3 files changed, 16 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index ef7412d653d2e..b949376ca388a 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -614,6 +614,11 @@ struct xe_file {
> * which does things while being held.
> */
> struct mutex lock;
> + /**
> + * @exec_queue.pending_removal: number of exec queues pending removal,
> + * used to synchronize the GPU state update with an ongoing query.
> + */
> + atomic_t pending_removal;
> } exec_queue;
>
> /** @run_ticks: hw engine class run time in ticks for this drm client */
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> index 22f0f1a6dfd55..24a0a7377abf2 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -317,6 +317,13 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
> break;
> }
>
> + /*
> + * Wait for any exec queue going away: their cycles will get updated on
> + * context switch out, so wait for that to happen
> + */
> + wait_var_event(&xef->exec_queue.pending_removal,
> + !atomic_read(&xef->exec_queue.pending_removal));
There's probably an argument to be made that this should be done before
getting the total GPU cycles (as that's when we update the run_ticks value
of the xef exec queues), but I'm not going to block on that.
> +
> xe_pm_runtime_put(xe);
>
> if (unlikely(!hwe))
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index fd0f3b3c9101d..58dd35beb15ad 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -262,8 +262,11 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
>
> /*
> * Before releasing our ref to lrc and xef, accumulate our run ticks
> + * and wakeup any waiters.
> */
> xe_exec_queue_update_run_ticks(q);
> + if (q->xef && atomic_dec_and_test(&q->xef->exec_queue.pending_removal))
I was going to suggest performing the wake_up_var irrespective of the
dec_and_test result, but then I realized there isn't much reason to wake the
waiter while the pending_removal value is still nonzero, so that's a moot
point.
-Jonathan Cavitt
> + wake_up_var(&q->xef->exec_queue.pending_removal);
>
> for (i = 0; i < q->width; ++i)
> xe_lrc_put(q->lrc[i]);
> @@ -824,6 +827,7 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
> XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> return -EINVAL;
>
> + atomic_inc(&xef->exec_queue.pending_removal);
> mutex_lock(&xef->exec_queue.lock);
> q = xa_erase(&xef->exec_queue.xa, args->exec_queue_id);
> mutex_unlock(&xef->exec_queue.lock);
> --
> 2.47.0
>
>