[PATCH 2/3] drm/xe: Accumulate exec queue timestamp on destroy
Cavitt, Jonathan
jonathan.cavitt at intel.com
Mon Oct 28 22:17:17 UTC 2024
-----Original Message-----
From: Brost, Matthew <matthew.brost at intel.com>
Sent: Monday, October 28, 2024 2:59 PM
To: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>
Cc: De Marchi, Lucas <lucas.demarchi at intel.com>; intel-xe at lists.freedesktop.org; Cavitt, Jonathan <jonathan.cavitt at intel.com>
Subject: Re: [PATCH 2/3] drm/xe: Accumulate exec queue timestamp on destroy
>
> On Mon, Oct 28, 2024 at 01:33:09PM -0700, Umesh Nerlige Ramappa wrote:
> > On Sat, Oct 26, 2024 at 12:08:47PM -0500, Lucas De Marchi wrote:
> > > When the exec queue is destroyed, there's a race between an fdinfo
> > > query and the exec queue's utilization being updated: after the destroy
> > > ioctl, if fdinfo is queried before guc_exec_queue_free_job() runs, the
> > > wrong utilization is reported: the queue has already been removed from
> > > the array, so the query doesn't accumulate its value, and free_job()
> > > hasn't updated it yet.
> > >
> > > Explicitly accumulate the engine utilization so the right value is
> > > visible after the ioctl returns.
> > >
> > > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
> > > Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_exec_queue.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > index d098d2dd1b2d..b15ca84b2422 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > @@ -829,6 +829,14 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
> > >
> > > xe_exec_queue_kill(q);
> > >
> > > + /*
> > > + * After killing and destroying the exec queue, make sure userspace has
> > > + * an updated view of the run ticks, regardless of whether this was the
> > > + * last ref: since the exec queue is removed from xef->exec_queue.xa, an
> > > + * fdinfo query after this ioctl returns would otherwise miss this load.
> > > + */
> > > + xe_exec_queue_update_run_ticks(q);
> > > +
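For context, the failure mode is easiest to see from the fdinfo side. A
rough sketch of the sampling path (illustrative only; names follow the
commit message and may not match the exact driver code):

static void fdinfo_sample_run_ticks(struct xe_file *xef)
{
        struct xe_exec_queue *q;
        unsigned long i;

        /* Only queues still present in xef->exec_queue.xa are folded in. */
        xa_for_each(&xef->exec_queue.xa, i, q)
                xe_exec_queue_update_run_ticks(q);

        /*
         * A queue already removed by the destroy ioctl contributes nothing
         * here unless its run ticks were accumulated before removal.
         */
}

Hence the explicit xe_exec_queue_update_run_ticks() call added above.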
> >
> > At this point we may or may not have the updated LRC timestamp.
> >
> > fwiu, xe_exec_queue_kill() is an async call. It queues a work item that
> > disables GuC scheduling on the context. Once the GuC notifies the KMD that
> > scheduling is disabled on this context, the KMD knows for sure that the
> > context has switched out and the LRC timestamp is updated for it. This may
> > work well for contexts that switch frequently, but not for contexts that
> > seldom switch or never destroy their exec queue.
> >
> > I still believe calling it from job free is the right thing to do. As for
> > the ~120 Hz updates, these are just memory updates, so I'm not sure they
> > have a huge performance impact.
> >
>
> I agree with Umesh here - unsure why it is a big deal to update the
> busyness in free_job.
I might be misreading this, but I'm fairly certain Umesh is saying that we
*should* be updating the busyness in free_job, not that whether we do
so or not is of no consequence.
Not that I particularly agree with that sentiment, mind. Actually, I think
you and I agree that we should only be updating the busyness here and
when it is queried as part of the fdinfo query.
Please correct me if I'm misunderstanding.
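For concreteness, the free_job variant under discussion would look
roughly like this (a sketch only: the hook name comes from the commit
message, while the helpers and the job->q layout are assumed, not taken
from the driver):

static void guc_exec_queue_free_job(struct drm_sched_job *drm_job)
{
        struct xe_sched_job *job = to_xe_sched_job(drm_job);

        /*
         * Fold the queue's latest LRC timestamp into the accumulated run
         * ticks on every job free, i.e. roughly at job completion rate.
         */
        xe_exec_queue_update_run_ticks(job->q);

        trace_xe_sched_job_free(job);
        xe_sched_job_put(job);
}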
-Jonathan Cavitt
>
> Also, the timestamp counters can wrap, and if they are not sampled
> frequently enough a wrap will be missed.
>
> Matt
>
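The usual way to make the sampling wrap-safe is an unsigned delta, which
tolerates at most one wrap between samples; sample any less often and the
wrapped time is lost. A minimal sketch, assuming a 32-bit hardware
timestamp (names are illustrative):

static u64 accumulate_run_ticks(u64 total, u32 *last_ts, u32 new_ts)
{
        /*
         * Unsigned 32-bit subtraction gives the correct delta across a
         * single wrap; additional wraps between samples are unrecoverable.
         */
        total += (u32)(new_ts - *last_ts);
        *last_ts = new_ts;
        return total;
}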
> > If the ftrace is getting filled up, we could throttle that.
> >
> > Thanks,
> > Umesh
> >
> > > trace_xe_exec_queue_close(q);
> > > xe_exec_queue_put(q);
> > >
> > > --
> > > 2.47.0
> > >
>