[PATCH 2/3] drm/xe: Accumulate exec queue timestamp on destroy

Tue Oct 29 17:27:28 UTC 2024

On Mon, Oct 28, 2024 at 05:32:03PM -0500, Lucas De Marchi wrote:
>On Mon, Oct 28, 2024 at 01:33:09PM -0700, Umesh Nerlige Ramappa wrote:
>>On Sat, Oct 26, 2024 at 12:08:47PM -0500, Lucas De Marchi wrote:
>>>When the exec queue is destroyed, there's a race between a query to the
>>>fdinfo and the exec queue value being updated: after the destroy ioctl,
>>>if the fdinfo is queried before a call to guc_exec_queue_free_job(),
>>>the wrong utilization is reported: it's not accumulated on the query
>>>since the queue was removed from the array, and the value wasn't updated
>>>yet by the free_job().
>>>
>>>Explicitly accumulate the engine utilization so the right value is
>>>visible after the ioctl return.
>>>
>>>Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
>>>Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
>>>Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>>---
>>>drivers/gpu/drm/xe/xe_exec_queue.c | 8 ++++++++
>>>1 file changed, 8 insertions(+)
>>>
>>>diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>index d098d2dd1b2d..b15ca84b2422 100644
>>>--- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>>++ b/drivers/gpu/drm/xe/xe_exec_quee.c
>>>@@ -829,6 +829,14 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>>>
>>>	xe_exec_queue_kill(q);
>>>
>>>+	/*
>>>+	 * After killing and destroying the exec queue, make sure userspace has
>>>+	 * an updated view of the run ticks, regardless if this was the last
>>>+	 * ref: since the exec queue is removed from xef->exec_queue.xa, a
>>>+	 * query to fdinfo after this returns could not account for this load.
>>>+	 */
>>>+	xe_exec_queue_update_run_ticks(q);
>>>+
>>
>>At this point we may/may-not have the updated LRC timestamp.
>>
>>fwiu, xe_exec_queue_kill() is an async call. It will queue a work 
>>that will disable guc scheduling on the context. Once guc notifies 
>>KMD that scheduling is disabled on this context, KMD knows for sure 
>>that the context has switched out and the lrc timestamp is updated 
>>for this
>
>ok. do you know what's that notification? Is it the cleanup that we
>process at __guc_exec_queue_process_msg_cleanup() ?  (which is where I'm
>moving the io read to in the other patch)
>
>>context. It may work well for contexts that switch frequently and 
>>may not work for contexts that seldom switch or never destroy their 
>>exec queue.
>
>why does it matter? it's only for contexts going away - for context that
>are scheduled out the sampling on fdinfo read covers it. Note that this
>call is not replacing other calls. If the exec queue is not destroyed,
>the other 2 places should still cover it.
>
>The problem with the case when the destroy ioctl is called is that q is
>removed from xef->exec_queue.xa, so the additional call in fdinfo-read stops
>working. It  then relies on the last job completing and being released before
>a read on the fdinfo. As CI shows, this is racy and fails in ~10% of
>the cases.
>
>On a remote system I got it was failing ~20% of the time
>
>>
>>I still believe calling it from job free is the right thing to do. 
>>As for the ~120 Hz updates, these are just memory updates, so not 
>>sure if it's a huge performance impact.
>
>it seems now you are commenting on the change in the other patch?

sorry, I will try to be more clear here.

patch 3/3:

I missed the fact that the updates could be in VRAM, so moving the 
update out of job_free makes sense. 

Also since fdinfo query is expected to be frequent enough, adding the update 
to exec_queue_fini makes sense too.

patch 2/3 (everything below)

it's just the update in destroy_ioctl that I am not convinced about since 
the kill is async. Also you see another instance of failure already below.

>
>I think it's pretty pointless to do the io access when nobody cares
>about that number. The user needs to be reading the fdinfo, probably
>through gputop or the like for the number to be meaningful. So why do we
>care about sampling it at a high frequency when nobody is looking? For
>real, non-CI use-cases it should be as good as before.  The difference
>is that instead of getting 120 updates of few cycles, we are getting
>just 1 with the amount of cycles that got used in the last sampling
>period.
>
>Looking at the traces we do get a few low-msec execution time for
>xe_lrc_update_timestamp(). Mind you, there's another race about updating
>u32 in xef in a non-atomic way and doing it that frequently just makes
>it more likely to happen. That still needs a separate fix.
>
>Goal of this series is to make this work:
>
>	$ for ((i=0; i < 100; i++)); do echo $i/100 >&2; if ! ./build/tests/xe_drm_fdinfo; then break; fi; done
>
>I was happy when sending this that it passed. But just double checking
>on another host, the issue is still there and I get this after 16
>iterations:
>
>Starting subtest: utilization-single-full-load-destroy-queue
>....
>(xe_drm_fdinfo:5997) DEBUG: vcs: spinner ended (timestamp=4818209)
>(xe_drm_fdinfo:5997) DEBUG: vcs: sample 1: cycles 9637999, total_cycles 19272820538
>(xe_drm_fdinfo:5997) DEBUG: vcs: sample 2: cycles 9637999, total_cycles 19277703269
>(xe_drm_fdinfo:5997) DEBUG: vcs: percent: 0.000000
>(xe_drm_fdinfo:5997) CRITICAL: Test assertion failure function check_results, file ../tests/intel/xe_drm_fdinfo.c:527:
>(xe_drm_fdinfo:5997) CRITICAL: Failed assertion: 95.0 < percent
>(xe_drm_fdinfo:5997) CRITICAL: error: 95.000000 >= 0.000000
>(xe_drm_fdinfo:5997) igt_core-INFO: Stack trace:
>(xe_drm_fdinfo:5997) igt_core-INFO:   #0 ../lib/igt_core.c:2051 __igt_fail_assert()
>(xe_drm_fdinfo:5997) igt_core-INFO:   #1 [check_results+0x204]
>(xe_drm_fdinfo:5997) igt_core-INFO:   #2 ../tests/intel/xe_drm_fdinfo.c:860 __igt_unique____real_main806()
>(xe_drm_fdinfo:5997) igt_core-INFO:   #3 ../tests/intel/xe_drm_fdinfo.c:806 main()
>(xe_drm_fdinfo:5997) igt_core-INFO:   #4 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
>(xe_drm_fdinfo:5997) igt_core-INFO:   #5 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
>(xe_drm_fdinfo:5997) igt_core-INFO:   #6 [_start+0x25]
>****  END  ****
>
>which makes me think it's probably related to the kill being async as
>you mentioned.
>
>I wonder if we should synchronize the call in the fdinfo read with the
>queues that are going away.

Hmm, maybe.

I was of the opinion that we should solve it in Xe by adding an update call 
in an additional place (like you are doing), but after digging into it a 
bit, I am not sure if we should resolve this specific issue. Instead, we 
should alter the test to not check for accuracy when the queue is destroyed 
before taking the second sample. We know that the ticks will get 
updated at some reasonable point in future and the user will see it in 
subsequent fdinfo queries anyways. If that "reasonable point in future" is 
unacceptably large, then I think the problem is outside the PCEU domain.

Note that the original reason we added the test was to catch the ref count 
issue with xef object (which is now fixed).

>
>Another thought I had was to use the wabb, but afaics we can only
>execute something on context restore, not on context save.

I am curious what you want to run in context save though and how it's any 
different from what's happening now - CTX_TIMESTAMP is being updated on 
save.

>
>>
>>If the ftrace is getting filed up, we could throttle that.
>
>oh no, that is definitely not what I want. If we enable the tracepoint, we
>want to see it, not artifically drop the events.
>
>Initially (and to get a good measure of function runtime), I was
>actually using retsnoop rather than using the previously non-existent
>tracepoint:
>
>	retsnoop -e xe_lrc_update_timestamp -e xe_lrc_create -e xe_lrc_destroy  -S -A -C args.fmt-max-arg-width=0

Didn't know that. That's ^ useful.

Thanks,
Umesh
>
>Lucas De Marchi
>
>>
>>Thanks,
>>Umesh
>>
>>>	trace_xe_exec_queue_close(q);
>>>	xe_exec_queue_put(q);
>>>
>>>-- 
>>>2.47.0
>>>