[PATCH 2/3] drm/xe: Accumulate exec queue timestamp on destroy

Lucas De Marchi lucas.demarchi at intel.com
Mon Oct 28 22:32:03 UTC 2024


On Mon, Oct 28, 2024 at 01:33:09PM -0700, Umesh Nerlige Ramappa wrote:
>On Sat, Oct 26, 2024 at 12:08:47PM -0500, Lucas De Marchi wrote:
>>When the exec queue is destroyed, there's a race between a query to the
>>fdinfo and the exec queue value being updated: after the destroy ioctl,
>>if the fdinfo is queried before a call to guc_exec_queue_free_job(),
>>the wrong utilization is reported: it's not accumulated on the query
>>since the queue was removed from the array, and the value wasn't updated
>>yet by the free_job().
>>
>>Explicitly accumulate the engine utilization so the right value is
>>visible after the ioctl return.
>>
>>Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
>>Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
>>Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>---
>>drivers/gpu/drm/xe/xe_exec_queue.c | 8 ++++++++
>>1 file changed, 8 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>index d098d2dd1b2d..b15ca84b2422 100644
>>--- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>@@ -829,6 +829,14 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>>
>>	xe_exec_queue_kill(q);
>>
>>+	/*
>>+	 * After killing and destroying the exec queue, make sure userspace has
>>+	 * an updated view of the run ticks, regardless if this was the last
>>+	 * ref: since the exec queue is removed from xef->exec_queue.xa, a
>>+	 * query to fdinfo after this returns could not account for this load.
>>+	 */
>>+	xe_exec_queue_update_run_ticks(q);
>>+
>
>At this point we may/may-not have the updated LRC timestamp.
>
>fwiu, xe_exec_queue_kill() is an async call. It will queue a work that 
>will disable guc scheduling on the context. Once guc notifies KMD that 
>scheduling is disabled on this context, KMD knows for sure that the 
>context has switched out and the lrc timestamp is updated for this 

ok. Do you know what that notification is? Is it the cleanup we process
in __guc_exec_queue_process_msg_cleanup()? (which is where I'm moving
the io read to in the other patch)

>context. It may work well for contexts that switch frequently and may 
>not work for contexts that seldom switch or never destroy their exec 
>queue.

why does it matter? It's only for contexts going away - for contexts
that are scheduled out, the sampling on fdinfo read covers it. Note that
this call is not replacing the other calls: if the exec queue is not
destroyed, the other 2 places should still cover it.

The problem when the destroy ioctl is called is that q is removed from
xef->exec_queue.xa, so the additional call in the fdinfo read stops
working. Reporting then relies on the last job completing and being
freed before the fdinfo is read. As CI shows, this is racy and fails in
~10% of the cases.

On a remote system I tried, it was failing ~20% of the time.
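
To make the race concrete, the sampling done on the fdinfo read is
essentially a walk over the xarray, roughly like below (simplified
sketch, not the exact driver code):

	/*
	 * Simplified sketch of the fdinfo sampling path: only queues still
	 * present in xef->exec_queue.xa get their run ticks refreshed here.
	 * Once the destroy ioctl erased q from the xarray, this loop can't
	 * see it anymore and we depend on guc_exec_queue_free_job() having
	 * run before the fdinfo read.
	 */
	xa_for_each(&xef->exec_queue.xa, i, q)
		xe_exec_queue_update_run_ticks(q);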

>
>I still believe calling it from job free is the right thing to do. As 
>for the ~120 Hz updates, these are just memory updates, so not sure if 
>it's a huge performance impact.

It seems you are now commenting on the change in the other patch?

I think it's pretty pointless to do the io access when nobody cares
about that number. The user needs to be reading the fdinfo, probably
through gputop or the like, for the number to be meaningful. So why do
we care about sampling it at a high frequency when nobody is looking?
For real, non-CI use cases it should be as good as before. The
difference is that instead of getting 120 updates of a few cycles each,
we get just 1 with the amount of cycles used in the last sampling
period.

Looking at the traces, we do get a few low-msec execution times for
xe_lrc_update_timestamp(). Mind you, there's another race in updating
the u32 in xef in a non-atomic way, and doing it that frequently just
makes it more likely to happen. That still needs a separate fix.
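
For reference, that race is the classic lost update on a plain,
non-atomic accumulator (field names below are just illustrative):

	/* conceptually what happens today: non-atomic read-modify-write */
	xef->run_ticks[class] += new_ts - old_ts;

	/*
	 * Two contexts doing this concurrently can read the same old value
	 * and one addition gets lost. Making the accumulator atomic (or
	 * guarding it with a lock) would close it, e.g. with an atomic64_t:
	 */
	atomic64_add(new_ts - old_ts, &xef->run_ticks[class]);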

Goal of this series is to make this work:

	$ for ((i=0; i < 100; i++)); do echo $i/100 >&2; if ! ./build/tests/xe_drm_fdinfo; then break; fi; done

I was happy when sending this that it passed, but double-checking on
another host, the issue is still there and I get this after 16
iterations:

Starting subtest: utilization-single-full-load-destroy-queue
....
(xe_drm_fdinfo:5997) DEBUG: vcs: spinner ended (timestamp=4818209)
(xe_drm_fdinfo:5997) DEBUG: vcs: sample 1: cycles 9637999, total_cycles 19272820538
(xe_drm_fdinfo:5997) DEBUG: vcs: sample 2: cycles 9637999, total_cycles 19277703269
(xe_drm_fdinfo:5997) DEBUG: vcs: percent: 0.000000
(xe_drm_fdinfo:5997) CRITICAL: Test assertion failure function check_results, file ../tests/intel/xe_drm_fdinfo.c:527:
(xe_drm_fdinfo:5997) CRITICAL: Failed assertion: 95.0 < percent
(xe_drm_fdinfo:5997) CRITICAL: error: 95.000000 >= 0.000000
(xe_drm_fdinfo:5997) igt_core-INFO: Stack trace:
(xe_drm_fdinfo:5997) igt_core-INFO:   #0 ../lib/igt_core.c:2051 __igt_fail_assert()
(xe_drm_fdinfo:5997) igt_core-INFO:   #1 [check_results+0x204]
(xe_drm_fdinfo:5997) igt_core-INFO:   #2 ../tests/intel/xe_drm_fdinfo.c:860 __igt_unique____real_main806()
(xe_drm_fdinfo:5997) igt_core-INFO:   #3 ../tests/intel/xe_drm_fdinfo.c:806 main()
(xe_drm_fdinfo:5997) igt_core-INFO:   #4 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
(xe_drm_fdinfo:5997) igt_core-INFO:   #5 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
(xe_drm_fdinfo:5997) igt_core-INFO:   #6 [_start+0x25]
****  END  ****

which makes me think it's probably related to the kill being async as
you mentioned.

I wonder if we should synchronize the call in the fdinfo read with the
queues that are going away.
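
Something along these lines could do it (not prototyped; the counter
and waitqueue below are made up just to illustrate the idea):

	/* destroy ioctl: note that a final accounting update is pending */
	atomic_inc(&xef->pending_ticks);

	/* GuC cleanup, after the final xe_exec_queue_update_run_ticks() */
	if (atomic_dec_and_test(&xef->pending_ticks))
		wake_up(&xef->pending_ticks_wq);

	/* fdinfo read: wait (with a timeout) for those updates to land */
	wait_event_timeout(xef->pending_ticks_wq,
			   !atomic_read(&xef->pending_ticks), HZ);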

Another thought I had was to use the WA BB (workaround batch buffer),
but afaics we can only execute something on context restore, not on
context save.

>
>If the ftrace is getting filed up, we could throttle that.

oh no, that is definitely not what I want. If we enable the tracepoint,
we want to see it, not artificially drop the events.

Initially (and to get a good measure of function runtime), I was
actually using retsnoop rather than the tracepoint, which didn't exist
yet:

	retsnoop -e xe_lrc_update_timestamp -e xe_lrc_create -e xe_lrc_destroy  -S -A -C args.fmt-max-arg-width=0

Lucas De Marchi

>
>Thanks,
>Umesh
>
>>	trace_xe_exec_queue_close(q);
>>	xe_exec_queue_put(q);
>>
>>-- 
>>2.47.0
>>

