[PATCH 2/3] drm/xe: Accumulate exec queue timestamp on destroy
Lucas De Marchi
lucas.demarchi at intel.com
Tue Oct 29 19:53:14 UTC 2024
On Tue, Oct 29, 2024 at 12:31:54PM -0700, Umesh Nerlige Ramappa wrote:
>On Tue, Oct 29, 2024 at 02:12:22PM -0500, Lucas De Marchi wrote:
>>On Tue, Oct 29, 2024 at 12:58:35PM -0500, Lucas De Marchi wrote:
>>>>>Starting subtest: utilization-single-full-load-destroy-queue
>>>>>....
>>>>>(xe_drm_fdinfo:5997) DEBUG: vcs: spinner ended (timestamp=4818209)
>>>>>(xe_drm_fdinfo:5997) DEBUG: vcs: sample 1: cycles 9637999, total_cycles 19272820538
>>>>>(xe_drm_fdinfo:5997) DEBUG: vcs: sample 2: cycles 9637999, total_cycles 19277703269
>>>>>(xe_drm_fdinfo:5997) DEBUG: vcs: percent: 0.000000
>>>>>(xe_drm_fdinfo:5997) CRITICAL: Test assertion failure function check_results, file ../tests/intel/xe_drm_fdinfo.c:527:
>>>>>(xe_drm_fdinfo:5997) CRITICAL: Failed assertion: 95.0 < percent
>>>>>(xe_drm_fdinfo:5997) CRITICAL: error: 95.000000 >= 0.000000
>>>>>(xe_drm_fdinfo:5997) igt_core-INFO: Stack trace:
>>>>>(xe_drm_fdinfo:5997) igt_core-INFO: #0 ../lib/igt_core.c:2051 __igt_fail_assert()
>>>>>(xe_drm_fdinfo:5997) igt_core-INFO: #1 [check_results+0x204]
>>>>>(xe_drm_fdinfo:5997) igt_core-INFO: #2 ../tests/intel/xe_drm_fdinfo.c:860 __igt_unique____real_main806()
>>>>>(xe_drm_fdinfo:5997) igt_core-INFO: #3 ../tests/intel/xe_drm_fdinfo.c:806 main()
>>>>>(xe_drm_fdinfo:5997) igt_core-INFO: #4 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
>>>>>(xe_drm_fdinfo:5997) igt_core-INFO: #5 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
>>>>>(xe_drm_fdinfo:5997) igt_core-INFO: #6 [_start+0x25]
>>>>>**** END ****
>>>>>
>>>>>which makes me think it's probably related to the kill being
>>>>>async, as you mentioned.
>>>>>
>>>>>I wonder if we should synchronize the call in the fdinfo read with the
>>>>>queues that are going away.
>>>>
>>>>Hmm, maybe.
>>>
>>>doing that, it passes for me in 62/100 runs of all the
>>>xe_drm_fdinfo@utilization-* tests.
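
For context, the synchronization being tried there could look roughly
like the sketch below. This is a minimal illustration only, not the
actual xe code: the run_ticks_lock, the run_ticks[] array, the
q->client backpointer and accumulate_run_ticks() are all assumed names.
The point is just that the final accumulation on exec queue destroy and
the fdinfo read serialize on the same lock, so a reader cannot race
with a queue that is going away.

	/*
	 * Minimal sketch of the idea -- NOT the actual xe code.  All
	 * names below (run_ticks_lock, run_ticks[], q->client,
	 * accumulate_run_ticks()) are assumptions for illustration.
	 */
	static void exec_queue_destroy_accumulate(struct xe_exec_queue *q)
	{
		struct xe_drm_client *client = q->client;

		mutex_lock(&client->run_ticks_lock);
		client->run_ticks[q->class] += accumulate_run_ticks(q);
		mutex_unlock(&client->run_ticks_lock);
	}

	static u64 client_run_ticks(struct xe_drm_client *client, int class)
	{
		u64 ticks;

		mutex_lock(&client->run_ticks_lock);
		ticks = client->run_ticks[class];	/* queues already destroyed */
		/* ... a walk over the still-live queues would also go here ... */
		mutex_unlock(&client->run_ticks_lock);

		return ticks;
	}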
>>>
>>>The failure on run 63 is different and I think it's another bug or
>>
>>
>>so here is the other failure, which I forgot to paste:
>>
>> Starting subtest: utilization-all-full-load
>> (xe_drm_fdinfo:14864) CRITICAL: Test assertion failure function check_results, file ../tests/intel/xe_drm_fdinfo.c:528:
>> (xe_drm_fdinfo:14864) CRITICAL: Failed assertion: percent < 105.0
>> (xe_drm_fdinfo:14864) CRITICAL: error: 315.453826 >= 105.000000
>> Stack trace:
>> #0 ../lib/igt_core.c:2051 __igt_fail_assert()
>> #1 ../tests/intel/xe_drm_fdinfo.c:520 check_results()
>> #2 ../tests/intel/xe_drm_fdinfo.c:464 __igt_unique____real_main806()
>> #3 ../tests/intel/xe_drm_fdinfo.c:806 main()
>> #4 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
>> #5 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
>> #6 [_start+0x25]
>> Subtest utilization-all-full-load failed.
>> **** DEBUG ****
>> (xe_drm_fdinfo:14864) DEBUG: rcs: spinner started
>> (xe_drm_fdinfo:14864) DEBUG: bcs: spinner started
>> (xe_drm_fdinfo:14864) DEBUG: ccs: spinner started
>> (xe_drm_fdinfo:14864) DEBUG: vcs: spinner started
>> (xe_drm_fdinfo:14864) DEBUG: vecs: spinner started
>> (xe_drm_fdinfo:14864) DEBUG: rcs: spinner ended (timestamp=15218479)
>> (xe_drm_fdinfo:14864) DEBUG: bcs: spinner ended (timestamp=15194339)
>> (xe_drm_fdinfo:14864) DEBUG: vcs: spinner ended (timestamp=4837648)
>> (xe_drm_fdinfo:14864) DEBUG: vecs: spinner ended (timestamp=4816316)
>> (xe_drm_fdinfo:14864) DEBUG: ccs: spinner ended (timestamp=4859494)
>> (xe_drm_fdinfo:14864) DEBUG: rcs: sample 1: cycles 40481368, total_cycles 31104224861
>> (xe_drm_fdinfo:14864) DEBUG: rcs: sample 2: cycles 55700053, total_cycles 31109049238
>> (xe_drm_fdinfo:14864) DEBUG: rcs: percent: 315.453826
>> (xe_drm_fdinfo:14864) CRITICAL: Test assertion failure function check_results, file ../tests/intel/xe_drm_fdinfo.c:528:
>> (xe_drm_fdinfo:14864) CRITICAL: Failed assertion: percent < 105.0
>> (xe_drm_fdinfo:14864) CRITICAL: error: 315.453826 >= 105.000000
>> (xe_drm_fdinfo:14864) igt_core-INFO: Stack trace:
>> (xe_drm_fdinfo:14864) igt_core-INFO: #0 ../lib/igt_core.c:2051 __igt_fail_assert()
>>
>>
>>From the timestamps read by the GPU: rcs timestamp=15218479 and bcs
>>timestamp=15194339... both are indeed much higher than the
>>total_cycles delta available:
>>31109049238 - 31104224861 = 4824377, which is reasonably close to the
>>timestamps for the other engines.
>>
>>my hypothesis is something like this:
>>
>>sample1:
>> accumulate_exec_queue (t = 0)
>> <<<<<<<< preemption
>> read_total_gpu_cycles (t = 200)
>>
>>sample2:
>> accumulate_exec_queue (t = 300)
>> read_total_gpu_cycles (t = 300)
>>
>>i.e. the cycles delta (300 - 0 = 300) ends up much larger than the
>>total_cycles delta (300 - 200 = 100), which would make the computed
>>utilization go way above 100%.
>>
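
To make the arithmetic behind that hypothesis concrete, here is a small
standalone illustration (plain userspace C, not driver code): if
sample 1 reads total_cycles later than it accumulates cycles, the next
delta computation blows past 100%.

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* sample 1: cycles accumulated at t = 0, but the
		 * total_cycles read is delayed to t = 200, e.g. by
		 * preemption in between */
		uint64_t cycles1 = 0, total1 = 200;

		/* sample 2: both values read at t = 300 */
		uint64_t cycles2 = 300, total2 = 300;

		double percent = 100.0 * (cycles2 - cycles1) / (total2 - total1);

		/* prints 300.000000 -- same flavor as the 315% above */
		printf("percent: %f\n", percent);
		return 0;
	}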
>
>It could just as well be the second sample; see my previous email on
>why run ticks can be larger than the GT timestamp delta.
>
>For the sake of narrowing it down, you could capture the value of the
>CTX_TIMESTAMP MMIO register before killing the exec queue in destroy.
>It should be ticking while the context is active. Once the context
>stops, it would have an updated value. That way we know how long it
>took to stop.
In this test there is no kill involved... the failure happens in
utilization-all-full-load.
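
For the destroy-queue subtests where a kill is involved, the experiment
suggested above could look roughly like this sketch.
xe_exec_queue_kill() is the real xe entry point, but
read_ctx_timestamp_mmio() and wait_for_queue_stopped() are assumed
helper names for illustration, not the actual xe API.

	static void debug_kill_latency(struct xe_exec_queue *q)
	{
		u64 before, after;

		/* still ticking here: the context is active */
		before = read_ctx_timestamp_mmio(q);

		xe_exec_queue_kill(q);		/* teardown is asynchronous */
		wait_for_queue_stopped(q);	/* assumed synchronization point */

		/* stable now: the context has stopped */
		after = read_ctx_timestamp_mmio(q);

		pr_debug("CTX_TIMESTAMP before=%llu after=%llu delta=%llu\n",
			 before, after, after - before);
	}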
Lucas De Marchi