[PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware

Upadhyay, Tejas tejas.upadhyay at intel.com
Wed Apr 3 05:35:40 UTC 2024



> -----Original Message-----
> From: De Marchi, Lucas <lucas.demarchi at intel.com>
> Sent: Wednesday, April 3, 2024 2:26 AM
> To: Roper, Matthew D <matthew.d.roper at intel.com>
> Cc: Upadhyay, Tejas <tejas.upadhyay at intel.com>; igt-
> dev at lists.freedesktop.org; intel-xe at lists.freedesktop.org; Brost, Matthew
> <matthew.brost at intel.com>
> Subject: Re: [PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset
> domain aware
> 
> On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
> >On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
> >> RCS/CCS are dependent engines as they are sharing reset domain.
> >> Whenever there is reset from CCS, all the exec queues running on RCS
> >> are victimised mainly on Lunarlake.
> >>
> >> Lets skip parallel execution on CCS with RCS.
> >
> >I haven't really looked at this specific test in detail, but based on
> >your explanation here, you're also going to run into problems with
> >multiple CCS engines since they all share the same reset.  You won't
> >see that on platforms like LNL that only have a single CCS, but
> >platforms
> 
> but it is seen on LNL because of having both RCS and CCS.
> 
> >like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset on
> >one kills anything running on the others.
> >
> >
> >Matt
> >
> >>
> >> It helps in fixing following errors:
> >> 1. Test assertion failure function test_legacy_mode, file, Failed
> >> assertion: data[i].data == 0xc0ffee
> >>
> >> 2.Test assertion failure function xe_exec, file ../lib/xe/xe_ioctl.c,
> >> Failed assertion: __xe_exec(fd, exec) == 0, error: -125 != 0
> >>
> >> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> >> ---
> >>  tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
> >>  1 file changed, 25 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tests/intel/xe_exec_threads.c
> >> b/tests/intel/xe_exec_threads.c index 8083980f9..31af61dc9 100644
> >> --- a/tests/intel/xe_exec_threads.c
> >> +++ b/tests/intel/xe_exec_threads.c
> >> @@ -710,6 +710,17 @@ static void *thread(void *data)
> >>  	return NULL;
> >>  }
> >>
> >> +static bool is_engine_contexts_victimized(int fd, unsigned int
> >> +flags) {
> >> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
> >> +		return false;
> 
> as above, I don't think we should add any platform check here. It's impossible
> to keep it up to date and it's also testing the wrong thing.
> AFAIU you don't want parallel submission on engines that share the same
> reset domain. So, this is actually what should be tested.

Platforms like  PVC, ATS-M, DG2, etc. have some kind of WA/noWA which helps to run things parallelly on engines in same reset domain and apparently BMG/LNL does not have that kind of support so applicable for LNL/BMG with parallel submission on RCS/CCS only.

@Harrison, John C please reply if you have any other input here.

Thanks,
Tejas 
> 
> Lucas De Marchi
> 
> >> +
> >> +	if (flags & HANG)
> >> +		return true;
> >> +
> >> +	return false;
> >> +}
> >> +
> >>  /**
> >>   * SUBTEST: threads-%s
> >>   * Description: Run threads %arg[1] test with multi threads @@
> >> -955,9 +966,13 @@ static void threads(int fd, int flags)
> >>  	bool go = false;
> >>  	int n_threads = 0;
> >>  	int gt;
> >> +	bool has_rcs = false;
> >>
> >> -	xe_for_each_engine(fd, hwe)
> >> +	xe_for_each_engine(fd, hwe) {
> >> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
> >> +			has_rcs = true;
> >>  		++n_engines;
> >> +	}
> >>
> >>  	if (flags & BALANCER) {
> >>  		xe_for_each_gt(fd, gt)
> >> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
> >>  	}
> >>
> >>  	xe_for_each_engine(fd, hwe) {
> >> +		/* RCS/CCS sharing reset domain hence dependent engines.
> >> +		 * When CCS is doing reset, all the contexts of RCS are
> >> +		 * victimized, so skip the compute engine avoiding
> >> +		 * parallel execution with RCS
> >> +		 */
> >> +		if (has_rcs && hwe->engine_class ==
> DRM_XE_ENGINE_CLASS_COMPUTE &&
> >> +		    is_engine_contexts_victimized(fd, flags))
> >> +			continue;
> >> +
> >>  		threads_data[i].mutex = &mutex;
> >>  		threads_data[i].cond = &cond;
> >>  #define ADDRESS_SHIFT	39
> >> --
> >> 2.25.1
> >>
> >
> >--
> >Matt Roper
> >Graphics Software Engineer
> >Linux GPU Platform Enablement
> >Intel Corporation


More information about the igt-dev mailing list