[PATCH i-g-t] tests/intel/xe_exec_reset: Skip syncobj_wait during the gt_reset

Matthew Brost matthew.brost at intel.com
Thu Oct 3 16:08:38 UTC 2024


On Mon, Sep 30, 2024 at 04:18:49AM -0600, Bommu, Krishnaiah wrote:
> 
> 
> > -----Original Message-----
> > From: Auld, Matthew <matthew.auld at intel.com>
> > Sent: Friday, September 27, 2024 3:52 PM
> > To: Bernatowicz, Marcin <marcin.bernatowicz at linux.intel.com>; Bommu,
> > Krishnaiah <krishnaiah.bommu at intel.com>; igt-dev at lists.freedesktop.org
> > Cc: Summers, Stuart <stuart.summers at intel.com>; Brost, Matthew
> > <matthew.brost at intel.com>
> > Subject: Re: [PATCH i-g-t] tests/intel/xe_exec_reset: Skip syncobj_wait during
> > the gt_reset
> > 
> > On 27/09/2024 11:05, Bernatowicz, Marcin wrote:
> > >
> > >
> > > On 9/25/2024 12:31 PM, Bommu Krishnaiah wrote:
> > >> From: "Bommu Krishnaiah" <krishnaiah.bommu at intel.com>
> > >>
> > >> Skip the syncobj_wait for workloads that are submitted before the
> > >> gt reset, since after a gt reset there is no expectation from the
> > >> hardware/GuC/KMD that the workload will then re-execute and complete.
> > >>
> > >> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu at intel.com>
> > >> Cc: Stuart Summers <stuart.summers at intel.com>
> > >> ---
> > >>   tests/intel/xe_exec_reset.c | 8 +++++---
> > >>   1 file changed, 5 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/tests/intel/xe_exec_reset.c b/tests/intel/xe_exec_reset.c
> > >> index b5d5f43ea..b1a7548c6 100644
> > >> --- a/tests/intel/xe_exec_reset.c
> > >> +++ b/tests/intel/xe_exec_reset.c
> > >> @@ -263,8 +263,9 @@ test_balancer(int fd, int gt, int class, int n_exec_queues, int n_execs,
> > >>       }
> > >>       for (i = 0; i < n_exec_queues && n_execs; i++)
> > >> -        igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX, 0,
> > >> -                    NULL));
> > >> +        if (!(flags & GT_RESET))
> > >> +            igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX,
> > >
> > > What happens when the user waits on a syncobj in the GT reset case?
> > > Maybe there is no expectation that the workload will re-execute, but
> > > shouldn't the syncobj be signaled or a timeout hit?
> > 
> > Yeah, this sounds like KMD bug. Expectation is that dma fences should
> > eventually signal no matter what, and in a reasonable amount of time.
> > 
> > Possibly relevant fix (very recently merged):
> > https://patchwork.freedesktop.org/patch/605681/?series=136463&rev=1

Matt Auld is correct here: this is a KMD bug, not a test bug.

I thought the above patch would have fixed this problem.

> 
> I tested with this patch applied
> (https://patchwork.freedesktop.org/patch/605681/?series=136463&rev=1)
> and I still see the failure.
> 

This is unfortunate. It seems we still have a KMD issue here. I
just chatted with Himal about this and gave him a bit of direction.

If this persists and is easy to reproduce, I can jump in and take a
look in a few days if needed. Corner-case submission issues are pretty
difficult to debug, and I am happy to help.
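
For reference while the KMD side is being debugged: a bounded wait would
make a fence that never signals show up as a test failure instead of a
stuck test. Below is a minimal, untested sketch. The helper names
(monotonic_now_ns, abs_deadline_ns) are hypothetical and not part of IGT,
and it assumes the timeout argument of syncobj_wait() is an absolute
CLOCK_MONOTONIC time in nanoseconds, as in the DRM syncobj wait ioctl.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC_LL 1000000000LL

/* Hypothetical helper (not in IGT): current CLOCK_MONOTONIC time in ns. */
static int64_t monotonic_now_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (int64_t)ts.tv_sec * NSEC_PER_SEC_LL + ts.tv_nsec;
}

/*
 * Hypothetical helper (not in IGT): absolute deadline that lies rel_ns
 * nanoseconds from now, saturating at INT64_MAX instead of overflowing.
 */
static int64_t abs_deadline_ns(int64_t rel_ns)
{
	int64_t now = monotonic_now_ns();

	assert(rel_ns >= 0);
	if (rel_ns > INT64_MAX - now)
		return INT64_MAX;
	return now + rel_ns;
}
```

In the test loops, INT64_MAX could then be replaced by something like
abs_deadline_ns(10 * NSEC_PER_SEC_LL). The 10 second value is only an
assumption about what "a reasonable amount of time" means here.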

Matt 

> Regards,
> Krishna.
> 
> > 
> > >
> > >> +                        0, NULL));
> > >>       igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
> > >>       sync[0].flags |= DRM_XE_SYNC_FLAG_SIGNAL;
> > >> @@ -410,7 +411,8 @@ test_legacy_mode(int fd, struct drm_xe_engine_class_instance *eci,
> > >>       }
> > >>       for (i = 0; i < n_exec_queues && n_execs; i++)
> > >> -        igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX, 0,
> > >> +        if (!(flags & GT_RESET))
> > >> +            igt_assert(syncobj_wait(fd, &syncobjs[i], 1, INT64_MAX, 0,
> > >>                       NULL));
> > >>       igt_assert(syncobj_wait(fd, &sync[0].handle, 1, INT64_MAX, 0, NULL));
> > >