[PATCH i-g-t v3 01/10] tests/intel/xe_drm_fdinfo: Extend mercy to the upper end

Wed Aug 28 14:44:07 UTC 2024

On 8/27/2024 6:54 PM, Lucas De Marchi wrote:
> When we are processing the fdinfo of each client, the gpu time is read
> first, and then later all the exec queues are accumulated. It's thus
> possible that the total gpu time is smaller than the time reported in
> the exec queues. A preemption in the middle of the second sample would
> exaggerate the problem:
> 				  total_cycles	      cycles
> 	s1: read exec queues times			*
> 	s1: read gpu time		|		*
> 	.				|		*
> 	.				|		*
> 	.				|		*
> 	-> xe_spin_end()		|		*
> 	s2: read exec queues times	|
> 	s2: read gpu time		|
>
> There's nothing guaranteeing an atomic read between the gpu time and
> exec_queue time in either s1 or s2. Due to the call to xe_spin_end(),
> in which exec_queue tick stops and gpu tick continues, it's much more
> likely delta_total_cycles > cycles. However, if there was any additional
> delay between the readouts in s1, it could also go the other way.
>
> In a more realistic situation, as reported in CI:
>
> 	(xe_drm_fdinfo:1072) DEBUG: rcs: sample 1: cycles 29223333, total_cycles 5801623069
> 	(xe_drm_fdinfo:1072) DEBUG: rcs: sample 2: cycles 38974256, total_cycles 5811276365
> 	(xe_drm_fdinfo:1072) DEBUG: rcs: percent: 101.000000
>
> Extend the same mercy to the upper end as we did to the lower end.
> This also matches the tolerance applied on the i915 side in
> tests/intel/drm_fdinfo.c:__assert_within_epsilon().
>
> v2: Fix the commit message since the problem is actually on sample1, not
>      sample2
>
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>

LGTM, thanks for the detailed description.

Reviewed-by: Nirmoy.das at intel.com

> ---
>   tests/intel/xe_drm_fdinfo.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/intel/xe_drm_fdinfo.c b/tests/intel/xe_drm_fdinfo.c
> index 4696c6495..e3a99a2dc 100644
> --- a/tests/intel/xe_drm_fdinfo.c
> +++ b/tests/intel/xe_drm_fdinfo.c
> @@ -484,7 +484,7 @@ check_results(struct pceu_cycles *s1, struct pceu_cycles *s2,
>   	igt_debug("%s: percent: %f\n", engine_map[class], percent);
>   
>   	if (flags & TEST_BUSY)
> -		igt_assert(percent >= 95 && percent <= 100);
> +		igt_assert(percent >= 95 && percent <= 105);
>   	else
>   		igt_assert(!percent);
>   }
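
For reference, the CI numbers quoted in the commit message can be re-derived by hand. The sketch below is not the code in xe_drm_fdinfo.c: struct sample, busy_percent(), busy_ok_old() and busy_ok_new() are hypothetical names used only for illustration, and the percentage is computed as an unrounded double, which is an assumption (the test logged 101.000000, so its own computation may round differently). Only the sample values and the two bounds come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters the test compares. */
struct sample {
	uint64_t cycles;       /* accumulated exec queue cycles */
	uint64_t total_cycles; /* gpu timestamp cycles */
};

/* Busy percentage between two samples, kept as an unrounded double. */
static double busy_percent(const struct sample *s1, const struct sample *s2)
{
	uint64_t d_cycles = s2->cycles - s1->cycles;
	uint64_t d_total = s2->total_cycles - s1->total_cycles;

	assert(d_total != 0); /* the two samples must be taken at different times */
	return 100.0 * (double)d_cycles / (double)d_total;
}

/* Bounds before the patch: mercy on the lower end only. */
static bool busy_ok_old(double percent)
{
	return percent >= 95 && percent <= 100;
}

/* Bounds after the patch: the same 5% mercy on the upper end. */
static bool busy_ok_new(double percent)
{
	return percent >= 95 && percent <= 105;
}
```

With s1 = { 29223333, 5801623069 } and s2 = { 38974256, 5811276365 }, the deltas are 9750923 cycles over 9653296 total_cycles, so busy_percent() comes out slightly above 101. That value fails busy_ok_old() and passes busy_ok_new(), which matches the failure reported in CI and the effect of the one-line change.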