[PATCH] drm/xe/client: Better correlate exec_queue and GT timestamps

Lucas De Marchi lucas.demarchi at intel.com
Sat Dec 21 00:42:09 UTC 2024


On Fri, Dec 20, 2024 at 04:32:09PM -0800, Umesh Nerlige Ramappa wrote:
>On Fri, Dec 20, 2024 at 12:32:16PM -0600, Lucas De Marchi wrote:
>>On Thu, Dec 19, 2024 at 12:49:16PM -0800, Umesh Nerlige Ramappa wrote:
>>>On Thu, Dec 12, 2024 at 09:34:32AM -0800, Lucas De Marchi wrote:
>>>>This partially reverts commit fe4f5d4b6616 ("drm/xe: Clean up VM / exec
>>>>queue file lock usage."). While it's desired to have the mutex to
>>>>protect only the reference to the exec queue, getting and dropping each
>>>>mutex and only later reading the GPU timestamp doesn't produce a
>>>>correct result: it introduces multiple opportunities for the task to be
>>>>scheduled out, wreaking havoc on the deltas reported to userspace.
>>>>
>>>>Also, to better correlate the timestamp from the exec queues with the
>>>>GPU, disable preemption so they can be updated without allowing the task
>>>>to be scheduled out. We leave interrupts enabled, as the disturbance
>>>>they introduce shouldn't be enough to matter for the deltas reported
>>>>to userspace.
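
To make that concrete, the pattern being described is roughly the sketch
below. This is illustration only: the structure and helper names
(client_sample, read_queue_ticks(), read_gt_timestamp()) are made up and
are not the actual xe driver code.

	#include <linux/list.h>
	#include <linux/preempt.h>
	#include <linux/types.h>

	/* all names here are made up for illustration, not the xe API */
	struct exec_queue { struct list_head link; /* ... */ };

	struct client_sample {
		struct list_head queues;	/* exec queues of one drm client */
		u64 run_ticks;
		u64 gpu_timestamp;
	};

	u64 read_queue_ticks(struct exec_queue *q);	/* hypothetical helpers */
	u64 read_gt_timestamp(void);

	static void sample_correlated_counters(struct client_sample *s)
	{
		struct exec_queue *q;

		/*
		 * No preemption between the reads: both counters come from
		 * (almost) the same instant, so the deltas stay comparable.
		 */
		preempt_disable();
		s->run_ticks = 0;
		list_for_each_entry(q, &s->queues, link)
			s->run_ticks += read_queue_ticks(q);
		s->gpu_timestamp = read_gt_timestamp();
		preempt_enable();
	}
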
>>>
>>>Like I said in the past, this is not trivial to solve and I would 
>>>hate to add anything in the KMD to do so.
>>
>>I think the best we can do on the kernel side is to try to guarantee
>>that the correlated counters are sampled together... and in my tests
>>that is already very good. Also, it's not only useful from a testing
>>perspective, but for any userspace trying to make sense of the two
>>counters.
>>
>>Note that this is not much different from how e.g. perf samples group
>>events:
>>
>>	The unit of scheduling in perf is not an individual event, but rather an
>>	event group, which may contain one or more events (potentially on
>>	different PMUs). The notion of an event group is useful for ensuring
>>	that a set of mathematically related events are all simultaneously
>>	measured for the same period of time. For example, the number of L1
>>	cache misses should not be larger than the number of L2 cache accesses.
>>	Otherwise, it may happen that the events get multiplexed and their
>>	measurements would no longer be comparable, making the analysis more
>>	difficult.
>>
>>See __perf_event_read(), which calls pmu->read() on all sibling events;
>>its caller, perf_event_read(), keeps preemption disabled around it:
>>
>>	perf_event_read()
>>	{
>>		...
>>		preempt_disable();
>>		event_cpu = __perf_event_read_cpu(event, event_cpu);
>>		...
>>		(void)smp_call_function_single(event_cpu, __perf_event_read, &data, 1);
>>		preempt_enable();
>>		...
>>	}
>>
>>So at least there's prior art for this, and for the same reason:
>>userspace should see the values sampled together.
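
That grouping is also what userspace relies on. A minimal sketch of
opening two events in one group and reading both with a single read()
(generic hardware events here, nothing xe-specific; purely illustrative):

	#include <linux/perf_event.h>
	#include <sys/syscall.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <stdint.h>
	#include <stdio.h>

	static int perf_open(struct perf_event_attr *attr, int group_fd)
	{
		/* pid = 0 (this task), cpu = -1 (any cpu) */
		return syscall(SYS_perf_event_open, attr, 0, -1, group_fd, 0);
	}

	int main(void)
	{
		struct perf_event_attr attr = {
			.size = sizeof(attr),
			.type = PERF_TYPE_HARDWARE,
			.config = PERF_COUNT_HW_CPU_CYCLES,
			.read_format = PERF_FORMAT_GROUP,
			.disabled = 1,
		};
		int leader = perf_open(&attr, -1);	/* group leader */

		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
		attr.disabled = 0;
		int sibling = perf_open(&attr, leader);	/* sibling in same group */

		ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
		/* ... workload under measurement ... */
		ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

		/* one read() returns both values, sampled together by the kernel */
		struct { uint64_t nr; uint64_t value[2]; } group;
		read(leader, &group, sizeof(group));
		printf("cycles=%llu instructions=%llu\n",
		       (unsigned long long)group.value[0],
		       (unsigned long long)group.value[1]);

		close(sibling);
		close(leader);
		return 0;
	}
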
>
>Well, I have used preempt_disable()/preempt_enable() when fixing some
>i915 selftests, but was not happy that there were still some rare
>failures. If reducing the error rate is the intention, then it's fine.
>In my mind the issue still exists, and once in a while we would still
>end up having to assess such a failure. Maybe, in addition, fixing up
>the IGTs like you suggest below is a worthwhile option.

For me this fix is not targeted at tests, even if it improves them a
lot; it's more about consistent userspace behavior.

>
>>
>>>
>>>For IGT, why not just take 4 samples for the measurement (separating
>>>out the 2 counters)?
>>>
>>>1. get gt timestamp in the first sample
>>>2. get run ticks in the second sample
>>>3. get run ticks in the third sample
>>>4. get gt timestamp in the fourth sample
>>>
>>>Rely on 1 and 4 for gt timestamp delta and on 2 and 3 for run 
>>>ticks delta.
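
As a rough sketch, that ordering would be something like this; the
fdinfo helpers and run_workload() are placeholders, not an existing IGT
API:

	#include <stdint.h>
	#include <assert.h>

	/* placeholder helpers: parse one field from the client's fdinfo */
	uint64_t fdinfo_gt_timestamp(int drm_fd);
	uint64_t fdinfo_run_ticks(int drm_fd, int engine_class);
	void run_workload(int drm_fd);

	static void measure(int drm_fd, int engine_class)
	{
		uint64_t ts0 = fdinfo_gt_timestamp(drm_fd);		/* sample 1 */
		uint64_t rt0 = fdinfo_run_ticks(drm_fd, engine_class);	/* sample 2 */

		run_workload(drm_fd);

		uint64_t rt1 = fdinfo_run_ticks(drm_fd, engine_class);	/* sample 3 */
		uint64_t ts1 = fdinfo_gt_timestamp(drm_fd);		/* sample 4 */

		/*
		 * With both counters in the same units, the ticks delta
		 * (samples 2..3) is fully enclosed by the timestamp delta
		 * (samples 1..4), so utilization cannot exceed 100%.
		 */
		assert(rt1 - rt0 <= ts1 - ts0);
	}
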
>>
>>this won't fix the general case: you get rid of the > 100% case, but
>>you make the < 100% case much worse.
>
>yeah, that's quite possible.
>
>>
>>From a testing perspective, I think the non-flaky solution is to stop
>>calculating percentages and instead check that the execution timestamp
>>recorded by the GPU very closely matches (minus GPU scheduling delays)
>>the one we get via fdinfo once the fence signals and we wait for job
>>completion.
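
Roughly, the check would look like this (just a sketch; the helpers are
placeholders, not an existing IGT/xe API):

	#include <stdint.h>
	#include <assert.h>

	#define RUNTIME_SLACK 1000	/* tolerance in ticks, arbitrary for the sketch */

	/* placeholder helpers, not an existing API */
	uint64_t run_job_measuring_gpu_timestamps(int drm_fd);	/* GPU-recorded runtime */
	uint64_t fdinfo_runtime_delta(int drm_fd);		/* fdinfo ticks across the job */

	static void check_runtime(int drm_fd)
	{
		uint64_t gpu_runtime = run_job_measuring_gpu_timestamps(drm_fd);
		/* the fence has signaled here, so the job is complete */
		uint64_t fdinfo_runtime = fdinfo_runtime_delta(drm_fd);
		uint64_t diff = fdinfo_runtime > gpu_runtime ?
				fdinfo_runtime - gpu_runtime :
				gpu_runtime - fdinfo_runtime;

		/* absolute closeness instead of a percentage of wall time */
		assert(diff < RUNTIME_SLACK);
	}
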
>
>Agree, we should change how we validate the counters in IGT.

I have a WIP patch to clean up and submit to IGT; I will send it out soon.

Lucas De Marchi

