[PATCH v2] drm/xe/client: Better correlate exec_queue and GT timestamps

Lucas De Marchi lucas.demarchi at intel.com
Tue Jan 14 00:38:11 UTC 2025


On Mon, Jan 13, 2025 at 03:00:55PM -0800, Umesh Nerlige Ramappa wrote:
>On Thu, Jan 09, 2025 at 12:03:40PM -0800, Lucas De Marchi wrote:
>>This partially reverts commit fe4f5d4b6616 ("drm/xe: Clean up VM / exec
>>queue file lock usage."). While it's desired to have the mutex to
>>protect only the reference to the exec queue, getting and dropping each
>>mutex and then later getting the GPU timestamp doesn't produce a
>>correct result: it introduces multiple opportunities for the task to be
>>scheduled out, thus wreaking havoc on the deltas reported to userspace.
>>
>>Also, to better correlate the timestamp from the exec queues with the
>>GPU, disable preemption so they can be updated without allowing the task
>>to be scheduled out. We leave interrupts enabled as that shouldn't be
>>enough disturbance for the deltas to matter to userspace.
>>
>>Test scenario:
>>
>>	* IGT's `xe_drm_fdinfo --r utilization-single-full-load`
>>	* Platform: LNL, where CI occasionally reports failures
>>	* `stress -c $(nproc)` running in parallel to disturb the
>>	  system
>>
>>This brings a first failure from "after ~150 executions" to "never
>>occurs after 1000 attempts".
>>
>>v2: Also keep xe_hw_engine_read_timestamp() call inside the
>>   preemption-disabled section (Umesh)
>>
>>Cc: stable at vger.kernel.org # v6.11+
>>Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>>Cc: Matthew Brost <matthew.brost at intel.com>
>>Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3512
>>Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>---
>>drivers/gpu/drm/xe/xe_drm_client.c | 14 ++++++--------
>>1 file changed, 6 insertions(+), 8 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
>>index 7d55ad846bac5..2220a09bf9751 100644
>>--- a/drivers/gpu/drm/xe/xe_drm_client.c
>>+++ b/drivers/gpu/drm/xe/xe_drm_client.c
>>@@ -337,20 +337,18 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
>>		return;
>>	}
>>
>>+	/* Let both the GPU timestamp and exec queue be updated together */
>>+	preempt_disable();
>>+	gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>>+
>>	/* Accumulate all the exec queues from this client */
>>	mutex_lock(&xef->exec_queue.lock);
>
>mutex_lock() can sleep and you have disabled preemption above, so this is
>not a good idea. I think it will trigger a scheduling-while-atomic BUG if
>the lock is contended.
>
>Earlier you had mutex_lock on the outside, so that was fine.

yeah... saw that in the CI test results.
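
A minimal userspace sketch of the ordering problem pointed out above, for
reference. Everything here is a stand-in: the *_stub() helpers and the
counters are hypothetical and only track nesting, they are not the kernel
primitives or the xe functions. The sketch assumes the only rule being
checked is "do not take a sleeping lock while preemption is disabled".

```c
#include <assert.h>

/* Userspace stand-ins for the kernel primitives; they only track state. */
static int preempt_count;   /* > 0 means preemption is "disabled" */
static int lock_held;       /* 1 while the stub mutex is held */
static int sleep_in_atomic; /* set if a sleeping call happens while atomic */

static void preempt_disable_stub(void) { preempt_count++; }
static void preempt_enable_stub(void)  { preempt_count--; }

/* A mutex may sleep when contended, so flag any attempt made while atomic. */
static void mutex_lock_stub(void)
{
	if (preempt_count > 0)
		sleep_in_atomic = 1;
	lock_held = 1;
}

static void mutex_unlock_stub(void) { lock_held = 0; }

/* Hypothetical placeholders for the timestamp read and run-tick update. */
static unsigned long read_gpu_timestamp_stub(void) { return 1000; }
static void update_run_ticks_stub(void) { }

/* Ordering as in the v2 diff: preemption disabled before taking the mutex. */
static int order_v2(void)
{
	sleep_in_atomic = 0;
	preempt_disable_stub();
	(void)read_gpu_timestamp_stub();
	mutex_lock_stub();		/* flagged: may sleep while atomic */
	update_run_ticks_stub();
	mutex_unlock_stub();
	preempt_enable_stub();
	return sleep_in_atomic;
}

/* Ordering with the mutex on the outside of the atomic section. */
static int order_mutex_outside(void)
{
	sleep_in_atomic = 0;
	mutex_lock_stub();		/* fine: preemption still enabled */
	preempt_disable_stub();
	(void)read_gpu_timestamp_stub();
	update_run_ticks_stub();
	preempt_enable_stub();
	mutex_unlock_stub();
	return sleep_in_atomic;
}
```

In this model order_v2() returns 1 (a sleeping lock was taken while atomic)
and order_mutex_outside() returns 0. Whether the second ordering is the
right fix for the driver is a separate question; the sketch only shows the
nesting rule.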

So far, with the IGT patches, it seems we have eliminated all the issues. I
will come back to this eventually, but the priority is now pretty low.

thanks
Lucas De Marchi

>
>Thanks,
>Umesh
>
>>-	xa_for_each(&xef->exec_queue.xa, i, q) {
>>-		xe_exec_queue_get(q);
>>-		mutex_unlock(&xef->exec_queue.lock);
>>
>>+	xa_for_each(&xef->exec_queue.xa, i, q)
>>		xe_exec_queue_update_run_ticks(q);
>>
>>-		mutex_lock(&xef->exec_queue.lock);
>>-		xe_exec_queue_put(q);
>>-	}
>>	mutex_unlock(&xef->exec_queue.lock);
>>-
>>-	gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>>+	preempt_enable();
>>
>>	xe_force_wake_put(gt_to_fw(hwe->gt), fw_ref);
>>	xe_pm_runtime_put(xe);
>>-- 
>>2.47.0
>>

