[PATCH i-g-t 1/2] Revert "tests/intel/xe_pm_residency: Add an assertion on MI_STORE execution time"

Poosa, Karthik karthik.poosa at intel.com
Tue Oct 8 05:00:26 UTC 2024


On 08-10-2024 10:17, Poosa, Karthik wrote:
>
> On 08-10-2024 09:17, Ghimiray, Himal Prasad wrote:
>>
>>
>> On 08-10-2024 08:52, Gupta, Anshuman wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ghimiray, Himal Prasad <himal.prasad.ghimiray at intel.com>
>>>> Sent: Tuesday, October 8, 2024 8:44 AM
>>>> To: Gupta, Anshuman <anshuman.gupta at intel.com>; igt-
>>>> dev at lists.freedesktop.org
>>>> Cc: Brost, Matthew <matthew.brost at intel.com>; Nilawar, Badal
>>>> <badal.nilawar at intel.com>; Tauro, Riana <riana.tauro at intel.com>; 
>>>> Poosa,
>>>> Karthik <karthik.poosa at intel.com>
>>>> Subject: Re: [PATCH i-g-t 1/2] Revert "tests/intel/xe_pm_residency: 
>>>> Add an
>>>> assertion on MI_STORE execution time"
>>>>
>>>>
>>>>
>>>> On 08-10-2024 07:31, Gupta, Anshuman wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ghimiray, Himal Prasad <himal.prasad.ghimiray at intel.com>
>>>>>> Sent: Monday, October 7, 2024 10:54 PM
>>>>>> To: igt-dev at lists.freedesktop.org
>>>>>> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray at intel.com>; Brost,
>>>>>> Matthew <matthew.brost at intel.com>; Nilawar, Badal
>>>>>> <badal.nilawar at intel.com>; Tauro, Riana <riana.tauro at intel.com>;
>>>>>> Gupta, Anshuman <anshuman.gupta at intel.com>; Poosa, Karthik
>>>>>> <karthik.poosa at intel.com>
>>>>>> Subject: [PATCH i-g-t 1/2] Revert "tests/intel/xe_pm_residency: Add
>>>>>> an assertion on MI_STORE execution time"
>>>>>>
>>>>>> The reported time does not reflect the completion time of
>>>>>> MI_STORE_DWORD; instead, it accounts for the delay in the scheduler.
>>>>>> Therefore, it represents the time taken between xe_exec and 
>>>>>> syncobj_wait.
>>>>>    igt_assert(syncobj_wait(fd, &syncobj, 1, INT64_MAX, 0, NULL));
>>>>> elapsed = igt_nsec_elapsed(&tv); elapsed is taken right after the
>>>>> syncobj_wait() therefore it represent the time taken by xe_exec +
>>>> syncobj_wait, total time taken for completion of job.
>>>>> Thanks,
>>>>> Anshuman.
>>>>
>>>>
>>>> That's true, while writing "time taken between xe_exec and 
>>>> syncobj_wait"
>>>> , I meant to convey in between start of xe_exec and syncobj_wait 
>>>> completion.
>>>> Will rephrase commit message before pushing.
>>> Why do we want to remove assertion ? We don't want to write IGT to 
>>> make CI happy it is to catch the bugs in KMD. Even in this case as 
>>> well this is a bug from Linux Kernel.
>>> I don't agree with removal of assertion.
>>
>> Few issues with this assertion.
>>
>> 1) IGT has --inactivity-timeout of 90 sec, which means you will not 
>> hit this assertion ever and SIGQUIT will be called if time between 
>> start of xe_exec and syncobj completion is ~0.9 sec.
>>
>> 2) Even 0.9 sec of delay is something huge for kernel. So why IGT 
>> assumes anything less than 1.2 sec is safe. Isn't it just to make 99% 
>> of idle time safe.
>>
>> This assertion solves no purpose, if IGT silently passes for anything 
>> less than 1.2 sec, assume 1.1sec (Isn't it huge delay for wq 
>> submission).
>
> This was added to assert on per-test timeout which can occur even 
> without inactivity timeout.
>
> See: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2484
>
> You can reduce it to assert on inactivity also.

2. Also, with this assert, we get below print in log to know submission 
and execution times taken, which would help in debug.

                 igt_debug("Execution took %.3fms (submit %.1fus, wait 
%.1fus)\n",
                           1e-6 * elapsed,
                           1e-3 * submit,
                           1e-3 * (elapsed - submit));

As this issue was sporadic and seen only on upstream CI, having this 
assert helps to know why timeout occurred.

So I'd suggest to keep the assert.


>>
>>
>> BR
>> Himal
>>
>>
>>> Thanks,
>>> Anshuman.
>>>>
>>>> Thanks for pointing this.
>>>>
>>>> Himal
>>>>
>>>>
>>>>>>
>>>>>> This reverts commit 92825ed72be61c5419d95db944fef1c9dda2215a.
>>>>>>
>>>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>>>> Cc: Badal Nilawar <badal.nilawar at intel.com>
>>>>>> Cc: Riana Tauro <riana.tauro at intel.com>
>>>>>> Cc: Anshuman Gupta <anshuman.gupta at intel.com>
>>>>>> Cc: Karthik Poosa <karthik.poosa at intel.com>
>>>>>> Signed-off-by: Himal Prasad Ghimiray
>>>>>> <himal.prasad.ghimiray at intel.com>
>>>>>> ---
>>>>>>    tests/intel/xe_pm_residency.c | 9 ---------
>>>>>>    1 file changed, 9 deletions(-)
>>>>>>
>>>>>> diff --git a/tests/intel/xe_pm_residency.c
>>>>>> b/tests/intel/xe_pm_residency.c index 772fe9b57..f4d05889c 100644
>>>>>> --- a/tests/intel/xe_pm_residency.c
>>>>>> +++ b/tests/intel/xe_pm_residency.c
>>>>>> @@ -144,15 +144,6 @@ static void exec_load(int fd, struct
>>>>>> drm_xe_engine_class_instance *hwe, unsigned
>>>>>>                  1e-3 * submit,
>>>>>>                  1e-3 * (elapsed - submit));
>>>>>>
>>>>>> -        /*
>>>>>> -         * MI_STORE_DWORD generally completes within couple of
>>>>>> ms.
>>>>>> -         * Assert if it takes more than 1.2 seconds, as it will 
>>>>>> cause
>>>>>> -         * IGT test to timeout due to sleep of 120 seconds which is
>>>>>> -         * the current per test timeout. Currently there is no 
>>>>>> way to
>>>>>> -         * read this timeout from IGT test.
>>>>>> -         */
>>>>>> -        igt_assert((uint64_t)elapsed < (uint64_t)(1.2 *
>>>>>> NSEC_PER_SEC));
>>>>>> -
>>>>>>            syncobj_reset(fd, &syncobj, 1);
>>>>>>
>>>>>>            /*
>>>>>> -- 
>>>>>> 2.34.1
>>>>>
>>>
>>


More information about the igt-dev mailing list