[i-g-t 1/1] tests/intel/xe_exec_fault_mode: wait for the completion

Yang, Fei fei.yang at intel.com
Fri Oct 25 19:58:32 UTC 2024


> On Thu, Oct 24, 2024 at 11:15:08PM -0600, Yang, Fei wrote:
>>> On Thu, Oct 24, 2024 at 05:23:22PM -0700, fei.yang at intel.com wrote:
>>>> From: Fei Yang <fei.yang at intel.com>
>>>>
>>>> The execution on GPU is out of order, the completion of the last
>>>> submission doesn't mean all the jobs are completed. We need to make
>>>> sure all the jobs are completed before moving on to unbinding the
>>>> buffer, otherwise the test would run into CAT errors.
>>>>
>>>> Signed-off-by: Fei Yang <fei.yang at intel.com>
>>>> ---
>>>>  tests/intel/xe_exec_fault_mode.c | 10 +++++++++-
>>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tests/intel/xe_exec_fault_mode.c
>>>> b/tests/intel/xe_exec_fault_mode.c
>>>> index 9cc51b7d3..d416c773b 100644
>>>> --- a/tests/intel/xe_exec_fault_mode.c
>>>> +++ b/tests/intel/xe_exec_fault_mode.c
>>>> @@ -305,7 +305,15 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
>>>>              }
>>>>      }
>>>>      if (!(flags & INVALID_FAULT)) {
>>>> -            j = flags & INVALIDATE ? n_execs - 1 : 0;
>>>> +            /*
>>>> +             * For !RACE cases xe_wait_ufence has already been called in above
>>>> +             * for-loop, we should only wait for the completion of the last
>>>> +             * submission here. For RACE cases we need to wait for all submissions
>>>> +             * to complete because the GuC scheduling can be out of order, the
>>>> +             * completion of the last submission doesn't mean all submission are
>>>> +             * completed.
>>>> +             */
>>>> +            j = (flags & INVALIDATE && !(flags & RACE)) ? n_execs
>>>> + - 1 : 0;
>>>
>>> This makes sense and xe_exec_threads has similar code to this in function test_compute_mode() and given that is threaded test much more likely to hit this race.
>>>
>>> I checked xe_exec_compute_mode and I believe we need a similar patch for that test too. While you are here can you fixup that test too?
>>
>> Sure, I will give it a shot. The code looks a bit different, and there
>> is a 1 second sleep after the submissions, I guess that's probably why
>> we are not seeing a problem yet, escpecially on real hardware where 1
>> second is sufficient for all submissions to complete.
>
> Probably that is a bit hacky - a lot of these tests have hacks from early Xe development that never got bother to review. I guess if it is working, low priority but removing the sleep and replacing with code like this would be good I think,

I just sent a patch for that, but didn't really run much test, I will rely on CI results, will let you know if it's ready for review.

>>
>>> This patch LGTM though, with that:
>>> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
>>
>> Could you help merge this patch too? I don't have merge right.
>>
>
> Pushed.
>
> Matt
>
>>>>
>>>>              for (i = j; i < n_execs; i++) {
>>>>                      int64_t timeout = NSEC_PER_SEC;
>>>> --
>>>> 2.25.1
>>>>


More information about the igt-dev mailing list