[i-g-t 1/1] tests/intel/xe_exec_fault_mode: wait for the completion

Matthew Brost matthew.brost at intel.com
Fri Oct 25 19:04:57 UTC 2024


On Thu, Oct 24, 2024 at 11:15:08PM -0600, Yang, Fei wrote:
> > On Thu, Oct 24, 2024 at 05:23:22PM -0700, fei.yang at intel.com wrote:
> >> From: Fei Yang <fei.yang at intel.com>
> >>
> >> The execution on GPU is out of order, the completion of the last
> >> submission doesn't mean all the jobs are completed. We need to make
> >> sure all the jobs are completed before moving on to unbinding the
> >> buffer, otherwise the test would run into CAT errors.
> >>
> >> Signed-off-by: Fei Yang <fei.yang at intel.com>
> >> ---
> >>  tests/intel/xe_exec_fault_mode.c | 10 +++++++++-
> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tests/intel/xe_exec_fault_mode.c
> >> b/tests/intel/xe_exec_fault_mode.c
> >> index 9cc51b7d3..d416c773b 100644
> >> --- a/tests/intel/xe_exec_fault_mode.c
> >> +++ b/tests/intel/xe_exec_fault_mode.c
> >> @@ -305,7 +305,15 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
> >>              }
> >>      }
> >>      if (!(flags & INVALID_FAULT)) {
> >> -            j = flags & INVALIDATE ? n_execs - 1 : 0;
> >> +            /*
> >> +             * For !RACE cases xe_wait_ufence has already been called in above
> >> +             * for-loop, we should only wait for the completion of the last
> >> +             * submission here. For RACE cases we need to wait for all submissions
> >> +             * to complete because the GuC scheduling can be out of order, the
> >> +             * completion of the last submission doesn't mean all submission are
> >> +             * completed.
> >> +             */
> >> +            j = (flags & INVALIDATE && !(flags & RACE)) ? n_execs - 1 : 0;
> >
> > This makes sense and xe_exec_threads has similar code to this in function test_compute_mode() and given that is threaded test much more likely to hit this race.
> >
> > I checked xe_exec_compute_mode and I believe we need a similar patch for that test too. While you are here can you fixup that test too?
> 
> Sure, I will give it a shot. The code looks a bit different, and there is
> a 1 second sleep after the submissions, I guess that's probably why we are
> not seeing a problem yet, escpecially on real hardware where 1 second is
> sufficient for all submissions to complete.

Probably that is a bit hacky - a lot of these tests have hacks from
early Xe development that never got bother to review. I guess if it is
working, low priority but removing the sleep and replacing with code
like this would be good I think, 

> 
> > This patch LGTM though, with that:
> > Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> 
> Could you help merge this patch too? I don't have merge right.
> 

Pushed.

Matt

> >>
> >>              for (i = j; i < n_execs; i++) {
> >>                      int64_t timeout = NSEC_PER_SEC;
> >> --
> >> 2.25.1
> >>


More information about the igt-dev mailing list