[i-g-t 1/1] tests/intel/xe_exec_fault_mode: wait for the completion

Yang, Fei fei.yang at intel.com
Fri Oct 25 05:15:08 UTC 2024


> On Thu, Oct 24, 2024 at 05:23:22PM -0700, fei.yang at intel.com wrote:
>> From: Fei Yang <fei.yang at intel.com>
>>
>> The execution on GPU is out of order, the completion of the last
>> submission doesn't mean all the jobs are completed. We need to make
>> sure all the jobs are completed before moving on to unbinding the
>> buffer, otherwise the test would run into CAT errors.
>>
>> Signed-off-by: Fei Yang <fei.yang at intel.com>
>> ---
>>  tests/intel/xe_exec_fault_mode.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/intel/xe_exec_fault_mode.c
>> b/tests/intel/xe_exec_fault_mode.c
>> index 9cc51b7d3..d416c773b 100644
>> --- a/tests/intel/xe_exec_fault_mode.c
>> +++ b/tests/intel/xe_exec_fault_mode.c
>> @@ -305,7 +305,15 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>>              }
>>      }
>>      if (!(flags & INVALID_FAULT)) {
>> -            j = flags & INVALIDATE ? n_execs - 1 : 0;
>> +            /*
>> +             * For !RACE cases xe_wait_ufence has already been called in above
>> +             * for-loop, we should only wait for the completion of the last
>> +             * submission here. For RACE cases we need to wait for all submissions
>> +             * to complete because the GuC scheduling can be out of order, the
>> +             * completion of the last submission doesn't mean all submission are
>> +             * completed.
>> +             */
>> +            j = (flags & INVALIDATE && !(flags & RACE)) ? n_execs - 1 : 0;
>
> This makes sense and xe_exec_threads has similar code to this in function test_compute_mode() and given that is threaded test much more likely to hit this race.
>
> I checked xe_exec_compute_mode and I believe we need a similar patch for that test too. While you are here can you fixup that test too?

Sure, I will give it a shot. The code looks a bit different, and there is
a 1 second sleep after the submissions, I guess that's probably why we are
not seeing a problem yet, escpecially on real hardware where 1 second is
sufficient for all submissions to complete.

> This patch LGTM though, with that:
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>

Could you help merge this patch too? I don't have merge right.

>>
>>              for (i = j; i < n_execs; i++) {
>>                      int64_t timeout = NSEC_PER_SEC;
>> --
>> 2.25.1
>>


More information about the igt-dev mailing list