[PATCH i-g-t v3] tests/intel/xe_exec_fault_mode: Don't return early

Nirmoy Das nirmoy.das at linux.intel.com
Wed Aug 28 15:26:48 UTC 2024


On 8/28/2024 5:15 PM, Andrzej Hajda wrote:
>
>
> On 28.08.2024 11:55, Nirmoy Das wrote:
>> Tests that cause pagefaults should wait for the exec queue to be banned,
>> otherwise pending engine resets caused by ongoing pagefaults would
>> cause subsequent tests to fail.
>>
>> Set a larger 5 sec timeout; if tests still fail, we can blame the
>> driver in that case.
>
> I am trying to understand what causes such a big delay, any ideas? Btw, if
> the driver is to blame, maybe it should be fixed instead of increasing
> the timeout in the test.


From the perspective of this IGT test, the subtest causes an engine reset
and an exec queue ban, so it should wait for that ban. If that behavior is
not met, then we need to fix the driver, but I think that is a different
topic.
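
To illustrate the "wait for the ban, but give up after a timeout" idea in isolation, here is a minimal standalone sketch. It is not the code from the patch: the patch uses igt_set_timeout()/igt_reset_timeout() around a sched_yield() loop, whereas this sketch swaps in a plain monotonic-clock deadline so it can be built without IGT. The names poll_until(), poll_fn, now_ns() and fake_banned() are hypothetical; fake_banned() only stands in for a real ban query such as get_ban_property().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the condition holds. */
typedef bool (*poll_fn)(void *ctx);

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Poll check(ctx) until it returns true or timeout_ns has elapsed.
 * Returns true if the condition was observed, false on timeout.
 */
static bool poll_until(poll_fn check, void *ctx, long long timeout_ns)
{
	long long deadline = now_ns() + timeout_ns;

	assert(check != NULL);

	do {
		if (check(ctx))
			return true;
		sched_yield(); /* let other threads run between polls */
	} while (now_ns() < deadline);

	/* One last check in case the condition flipped right at the deadline. */
	return check(ctx);
}

/*
 * Stand-in for a ban query (assumption, not a driver API): reports
 * "banned" only after *ctx polls have returned false.
 */
static bool fake_banned(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

With something like this, a caller could write poll_until(fake_banned, &polls, 5 * 1000000000LL) and treat a false return as "the queue was never banned", which keeps the driver-side question separate from the test-side wait.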

>
> In v2 there was one failure on PVC: 
> https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_11646/bat-pvc-2/igt@xe_exec_fault_mode@twice-invalid-userptr-fault.html
> This time it passed flawlessly (as well as in v1), but not due to the
> increased time limit (at least dmesg shows the test took much less
> than 1 second).


Yes, I saw that. It just means the ctx wasn't banned, which is strange.
There is not enough info to debug.

> Let's wait for xeFULL pass, maybe it will show some interesting results.


Regards,

Nirmoy

>
> Regards
> Andrzej
>>
>> v2: specify timeout reason and iterate over exec_queues(Andrzej)
>> v3: increase timeout
>>
>> Cc: Andrzej Hajda <andrzej.hajda at intel.com>
>> Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Tejas Upadhyay <tejas.upadhyay at intel.com>
>> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1630
>> Reviewed-by: Matthew Brost <matthew.brost at intel.com> #v1
>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>>   tests/intel/xe_exec_fault_mode.c | 25 +++++++++++++++++++++++++
>>   1 file changed, 25 insertions(+)
>>
>> diff --git a/tests/intel/xe_exec_fault_mode.c b/tests/intel/xe_exec_fault_mode.c
>> index 1f1f1e50b..e3e6047e7 100644
>> --- a/tests/intel/xe_exec_fault_mode.c
>> +++ b/tests/intel/xe_exec_fault_mode.c
>> @@ -36,6 +36,22 @@
>>   #define INVALID_VA    (0x1 << 8)
>>   #define ENABLE_SCRATCH  (0x1 << 9)
>>
>> +static int get_ban_property(int xe, struct drm_xe_engine_class_instance *eci,
>> +                uint32_t vm, uint32_t exec_queue)
>> +{
>> +    struct drm_xe_exec_queue_get_property args = {
>> +        .value = -1,
>> +        .reserved[0] = 0,
>> +        .property = DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN,
>> +    };
>> +
>> +    args.exec_queue_id = exec_queue;
>> +
>> +    do_ioctl(xe, DRM_IOCTL_XE_EXEC_QUEUE_GET_PROPERTY, &args);
>> +
>> +    return args.value;
>> +}
>> +
>>   /**
>>    * SUBTEST: invalid-va
>>    * Description: Access invalid va and check for EIO through user fence.
>> @@ -324,6 +340,15 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>>       xe_wait_ufence(fd, &data[0].vm_sync, USER_FENCE_VALUE,
>>                  bind_exec_queues[0], NSEC_PER_SEC);
>>
>> +    if ((flags & INVALID_FAULT)) {
>> +        igt_set_timeout(5, "waiting for ban");
>> +        for (i = 0; i < n_exec_queues; i++) {
>> +            while (!get_ban_property(fd, eci, vm, exec_queues[i]))
>> +                sched_yield();
>> +        }
>> +        igt_reset_timeout();
>> +    }
>> +
>>       if (!(flags & INVALID_FAULT) && !(flags & INVALID_VA)) {
>>           for (i = j; i < n_execs; i++)
>>                   igt_assert_eq(data[i].data, 0xc0ffee);
>

