[PATCH V2 i-g-t] tests/xe_exec_threads: Make hang tests reset domain aware

Lucas De Marchi lucas.demarchi at intel.com
Tue Apr 2 20:55:30 UTC 2024


On Tue, Apr 02, 2024 at 12:40:17PM -0700, Matt Roper wrote:
>On Tue, Apr 02, 2024 at 05:52:23PM +0530, Tejas Upadhyay wrote:
>> RCS and CCS are dependent engines because they share a reset
>> domain. Whenever a reset is triggered from CCS, all the exec queues
>> running on RCS are victimised, mainly on Lunarlake.
>>
>> Let's skip parallel execution on CCS with RCS.
>
>I haven't really looked at this specific test in detail, but based on
>your explanation here, you're also going to run into problems with
>multiple CCS engines since they all share the same reset.  You won't see
>that on platforms like LNL that only have a single CCS, but platforms

But it is seen on LNL, because LNL has both RCS and CCS.

>like PVC, ATS-M, DG2, etc. can all have multiple CCS where a reset on
>one kills anything running on the others.
>
>
>Matt
>
>>
>> It helps fix the following errors:
>> 1. Test assertion failure function test_legacy_mode, file, Failed assertion: data[i].data == 0xc0ffee
>>
>> 2. Test assertion failure function xe_exec, file ../lib/xe/xe_ioctl.c, Failed assertion: __xe_exec(fd, exec) == 0, error: -125 != 0
>>
>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
>> ---
>>  tests/intel/xe_exec_threads.c | 26 +++++++++++++++++++++++++-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/intel/xe_exec_threads.c b/tests/intel/xe_exec_threads.c
>> index 8083980f9..31af61dc9 100644
>> --- a/tests/intel/xe_exec_threads.c
>> +++ b/tests/intel/xe_exec_threads.c
>> @@ -710,6 +710,17 @@ static void *thread(void *data)
>>  	return NULL;
>>  }
>>
>> +static bool is_engine_contexts_victimized(int fd, unsigned int flags)
>> +{
>> +	if (!IS_LUNARLAKE(intel_get_drm_devid(fd)))
>> +		return false;

As above, I don't think we should add any platform check here. It's
impossible to keep it up to date and it's also testing the wrong thing.
AFAIU you don't want parallel submission on engines that share the same
reset domain, so that is what the code should actually check.
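
To illustrate, here is a minimal, self-contained sketch of choosing engines
by reset domain instead of by platform. Everything in it is hypothetical and
is not IGT or Xe uAPI: the `hyp_engine` struct, the `same_reset_domain()`
helper, and the hard-coded assumption (taken from this thread) that the
render engine and all compute engines on a GT share one reset domain, while
every other engine resets on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the engine info a test iterates over. */
enum hyp_engine_class {
	HYP_ENGINE_RENDER,
	HYP_ENGINE_COPY,
	HYP_ENGINE_VIDEO_DECODE,
	HYP_ENGINE_VIDEO_ENHANCE,
	HYP_ENGINE_COMPUTE,
};

struct hyp_engine {
	enum hyp_engine_class engine_class;
	int instance;
	int gt_id;
};

/* Assumption: render and compute engines are the ones sharing a reset. */
static bool in_shared_domain(const struct hyp_engine *e)
{
	return e->engine_class == HYP_ENGINE_RENDER ||
	       e->engine_class == HYP_ENGINE_COMPUTE;
}

/*
 * Two engines are in the same reset domain if they sit on the same GT and
 * either both belong to the shared render/compute domain, or they are the
 * very same engine (same class and instance).
 */
static bool same_reset_domain(const struct hyp_engine *a,
			      const struct hyp_engine *b)
{
	if (a->gt_id != b->gt_id)
		return false;
	if (in_shared_domain(a) && in_shared_domain(b))
		return true;
	return a->engine_class == b->engine_class &&
	       a->instance == b->instance;
}

/*
 * Keep the first engine seen from each reset domain. 'out' must have room
 * for 'n' pointers. Returns the number of engines kept.
 */
static size_t pick_one_per_reset_domain(const struct hyp_engine *in, size_t n,
					const struct hyp_engine **out)
{
	size_t kept = 0;

	assert(in != NULL && out != NULL);

	for (size_t i = 0; i < n; i++) {
		bool clash = false;

		for (size_t j = 0; j < kept; j++) {
			if (same_reset_domain(&in[i], out[j])) {
				clash = true;
				break;
			}
		}
		if (!clash)
			out[kept++] = &in[i];
	}

	return kept;
}
```

A hang test could then start threads only on the engines returned by
`pick_one_per_reset_domain()`. That would cover both the RCS + CCS case on
LNL and the multi-CCS case Matt mentions, without any platform check.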

Lucas De Marchi

>> +
>> +	if (flags & HANG)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  /**
>>   * SUBTEST: threads-%s
>>   * Description: Run threads %arg[1] test with multi threads
>> @@ -955,9 +966,13 @@ static void threads(int fd, int flags)
>>  	bool go = false;
>>  	int n_threads = 0;
>>  	int gt;
>> +	bool has_rcs = false;
>>
>> -	xe_for_each_engine(fd, hwe)
>> +	xe_for_each_engine(fd, hwe) {
>> +		if (hwe->engine_class == DRM_XE_ENGINE_CLASS_RENDER)
>> +			has_rcs = true;
>>  		++n_engines;
>> +	}
>>
>>  	if (flags & BALANCER) {
>>  		xe_for_each_gt(fd, gt)
>> @@ -990,6 +1005,15 @@ static void threads(int fd, int flags)
>>  	}
>>
>>  	xe_for_each_engine(fd, hwe) {
>> +		/* RCS/CCS sharing reset domain hence dependent engines.
>> +		 * When CCS is doing reset, all the contexts of RCS are
>> +		 * victimized, so skip the compute engine avoiding
>> +		 * parallel execution with RCS
>> +		 */
>> +		if (has_rcs && hwe->engine_class == DRM_XE_ENGINE_CLASS_COMPUTE &&
>> +		    is_engine_contexts_victimized(fd, flags))
>> +			continue;
>> +
>>  		threads_data[i].mutex = &mutex;
>>  		threads_data[i].cond = &cond;
>>  #define ADDRESS_SHIFT	39
>> --
>> 2.25.1
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

