[PATCH 2/3] drm/xe: Accumulate exec queue timestamp on destroy
Matthew Brost
matthew.brost at intel.com
Mon Oct 28 21:59:18 UTC 2024
On Mon, Oct 28, 2024 at 01:33:09PM -0700, Umesh Nerlige Ramappa wrote:
> On Sat, Oct 26, 2024 at 12:08:47PM -0500, Lucas De Marchi wrote:
> > When the exec queue is destroyed, there's a race between a query to the
> > fdinfo and the exec queue value being updated: after the destroy ioctl,
> > if the fdinfo is queried before a call to guc_exec_queue_free_job(),
> > the wrong utilization is reported: it's not accumulated on the query
> > since the queue was removed from the array, and the value wasn't updated
> > yet by the free_job().
> >
> > Explicitly accumulate the engine utilization so the right value is
> > visible after the ioctl return.
> >
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
> > Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_exec_queue.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index d098d2dd1b2d..b15ca84b2422 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -829,6 +829,14 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
> >
> > xe_exec_queue_kill(q);
> >
> > + /*
> > + * After killing and destroying the exec queue, make sure userspace has
> > + * an updated view of the run ticks, regardless if this was the last
> > + * ref: since the exec queue is removed from xef->exec_queue.xa, a
> > + * query to fdinfo after this returns could not account for this load.
> > + */
> > + xe_exec_queue_update_run_ticks(q);
> > +
>
> At this point we may/may-not have the updated LRC timestamp.
>
> fwiu, xe_exec_queue_kill() is an async call. It will queue a work that will
> disable guc scheduling on the context. Once guc notifies KMD that scheduling
> is disabled on this context, KMD knows for sure that the context has
> switched out and the lrc timestamp is updated for this context. It may work
> well for contexts that switch frequently and may not work for contexts that
> seldom switch or never destroy their exec queue.
>
> I still believe calling it from job free is the right thing to do. As for
> the ~120 Hz updates, these are just memory updates, so not sure if it's a
> huge performance impact.
>
I agree with Umesh here - unsure why it is a big deal to update the
busyness in free_job.
Also the timestamp counters can wrap and if they are sampled frequently
enough a wrap will be missed.
Matt
> If the ftrace is getting filed up, we could throttle that.
>
> Thanks,
> Umesh
>
> > trace_xe_exec_queue_close(q);
> > xe_exec_queue_put(q);
> >
> > --
> > 2.47.0
> >
More information about the Intel-xe
mailing list