[PATCH] drm/xe/client: Better correlate exec_queue and GT timestamps
Lucas De Marchi
lucas.demarchi at intel.com
Wed Jan 8 22:57:40 UTC 2025
On Mon, Jan 06, 2025 at 02:44:27PM -0800, Umesh Nerlige Ramappa wrote:
>On Thu, Dec 12, 2024 at 09:34:32AM -0800, Lucas De Marchi wrote:
>>This partially reverts commit fe4f5d4b6616 ("drm/xe: Clean up VM / exec
>>queue file lock usage."). While it's desirable to have the mutex
>>protect only the reference to the exec queue, taking and dropping the
>>mutex for each queue and only later reading the GPU timestamp doesn't
>>produce a correct result: it introduces multiple opportunities for the
>>task to be scheduled out, wreaking havoc on the deltas reported to
>>userspace.
>>
>>Also, to better correlate the exec queue timestamps with the GPU
>>timestamp, disable preemption so the queues can be updated without the
>>task being scheduled out. Interrupts are left enabled since their
>>disturbance shouldn't be large enough to matter for the deltas reported
>>to userspace.
>
>Assuming
>
>- timestamp from exec queues = xe_exec_queue_update_run_ticks()
>- GPU timestamp = xe_hw_engine_read_timestamp()
>
>shouldn't you also move the xe_hw_engine_read_timestamp() within the
>preempt_disable/preempt_enable section?
This is what I thought I had done, but it seems I messed up.
>
>Something like this ..
>
> preempt_disable();
>
> xa_for_each(&xef->exec_queue.xa, i, q)
> xe_exec_queue_update_run_ticks(q);
>
> gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
I'd move the gpu_timestamp read to before the exec queue loop, so we
don't call it with the exec queue mutex still held. AFAIR this is what I
was doing on the test machine, but I sent the wrong version of the
patch. Let me double check locally here and resend.
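
Something along these lines (untested sketch, assuming the rest of
show_run_ticks() stays as in the patch below):

	/* Read the GT timestamp first, close to the exec queue updates,
	 * but without the exec_queue mutex held */
	gpu_timestamp = xe_hw_engine_read_timestamp(hwe);

	/* Accumulate all the exec queues from this client */
	mutex_lock(&xef->exec_queue.lock);
	preempt_disable();

	xa_for_each(&xef->exec_queue.xa, i, q)
		xe_exec_queue_update_run_ticks(q);

	preempt_enable();
	mutex_unlock(&xef->exec_queue.lock);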
thanks
Lucas De Marchi
>
> preempt_enable();
>
>Thanks,
>Umesh
>
>>
>>Test scenario:
>>
>> * IGT's `xe_drm_fdinfo --r utilization-single-full-load`
>> * Platform: LNL, where CI occasionally reports failures
>> * `stress -c $(nproc)` running in parallel to disturb the
>> system
>>
>>This brings the first failure from "after ~150 executions" to "does
>>not occur in 1000 attempts".
>>
>>Cc: stable at vger.kernel.org # v6.11+
>>Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3512
>>Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>---
>>drivers/gpu/drm/xe/xe_drm_client.c | 9 +++------
>>1 file changed, 3 insertions(+), 6 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
>>index 298a587da7f17..e307b4d6bab5a 100644
>>--- a/drivers/gpu/drm/xe/xe_drm_client.c
>>+++ b/drivers/gpu/drm/xe/xe_drm_client.c
>>@@ -338,15 +338,12 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
>>
>> /* Accumulate all the exec queues from this client */
>> mutex_lock(&xef->exec_queue.lock);
>>- xa_for_each(&xef->exec_queue.xa, i, q) {
>>- xe_exec_queue_get(q);
>>- mutex_unlock(&xef->exec_queue.lock);
>>+ preempt_disable();
>>
>>+ xa_for_each(&xef->exec_queue.xa, i, q)
>> xe_exec_queue_update_run_ticks(q);
>>
>>- mutex_lock(&xef->exec_queue.lock);
>>- xe_exec_queue_put(q);
>>- }
>>+ preempt_enable();
>> mutex_unlock(&xef->exec_queue.lock);
>>
>> gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>>--
>>2.47.0
>>