[PATCH] tests/intel/xe_fault_injection: Ignore *ERROR* dmesg reports

Cavitt, Jonathan jonathan.cavitt at intel.com
Tue Jun 3 18:40:14 UTC 2025


-----Original Message-----
From: Cavitt, Jonathan 
Sent: Tuesday, June 3, 2025 11:31 AM
To: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>; igt-dev at lists.freedesktop.org
Cc: K V P, Satyanarayana <Satyanarayana.K.V.P at intel.com>; Ceraolo Spurio, Daniele <daniele.ceraolospurio at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dugast, Francois <Francois.Dugast at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>; Harrison, John C <john.c.harrison at intel.com>
Subject: RE: [PATCH] tests/intel/xe_fault_injection: Ignore *ERROR* dmesg reports
> 
> -----Original Message-----
> From: Wajdeczko, Michal <Michal.Wajdeczko at intel.com> 
> Sent: Tuesday, June 3, 2025 10:28 AM
> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; igt-dev at lists.freedesktop.org
> Cc: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Ceraolo Spurio, Daniele <daniele.ceraolospurio at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dugast, Francois <francois.dugast at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>; Harrison, John C <john.c.harrison at intel.com>
> Subject: Re: [PATCH] tests/intel/xe_fault_injection: Ignore *ERROR* dmesg reports
> > 
> > Hi,
> > 
> > On 03.06.2025 18:18, Jonathan Cavitt wrote:
> > > From: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> > 
> > since this looks like a follow-up rev of the original patch, there was
> > no need to change subject and there should be a change log at the end
> > 
> > > 
> > > Currently, numerous fault messages have been included in the dmesg
> > > ignore list, and this list continues to expand.  Each time a new fault
> > > injection point is introduced or a new feature is activated, additional
> > > fault messages appear, making it cumbersome to manage the dmesg ignore
> > > list.
> > > 
> > > However, we can safely assert that all dmesg reports that contain
> > > *ERROR* in their message can be ignored, so add them to the dmesg ignore
> > > list.  This unfortunately does not include the device probe error
> > > itself, so that must be added separately.
> > > 
> > > While we're here, we should also assert that any errors we see are only
> > > coming from the target PCI device.
> > > 
> > > Fixes: 5dcf915415ee ("tests/intel/xe_fault_injection: Ignore expected errors")
> > 
> > I'm not sure it needs Fixes: tag as it wasn't badly broken
> > 
> > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > > Suggested-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > > Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> > > Suggested-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > Cc: Francois Dugast <francois.dugast at intel.com>
> > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > Cc: John Harrison <john.c.harrison at intel.com>
> > > 
> > > ---
> > >  tests/intel/xe_fault_injection.c | 39 ++++++++++++--------------------
> > >  1 file changed, 15 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/tests/intel/xe_fault_injection.c b/tests/intel/xe_fault_injection.c
> > > index 9fe6bfe351..a4a08af5f7 100644
> > > --- a/tests/intel/xe_fault_injection.c
> > > +++ b/tests/intel/xe_fault_injection.c
> > > @@ -64,28 +64,19 @@ static int fail_function_open(void)
> > >  	return debugfs_fail_function_dir_fd;
> > >  }
> > >  
> > > -static bool function_is_part_of_guc(const char function_name[])
> > > +static void ignore_faults_in_dmesg(int fd)
> > 
> > nit: this could be
> > 
> > 	ignore_dmesg_errors_from_dut(fd)
> > 
> > > and since it's quite generic maybe it could be moved to lib/ near
> > 
> > 	igt_emit_ignore_dmesg_regex(regex)
> > as
> > 	igt_ignore_dmesg_errors_from_dut(fd)
> 
> I can update the function name, but this function is only being used here, so
> I don't see a reason to move it to the lib folder.

It just occurred to me that, in addition to ignoring all error-level reports from dmesg
(or, at least, the reports that contain the "*ERROR*" substring), this function also
ignores any device probe errors.  While I can imagine a general case for ignoring all
error-level dmesg reports, I don't think a general-purpose helper should additionally
ignore device probe errors, which is one more reason to keep this function local to the test.
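
To make the point concrete, here is a minimal standalone sketch of what the
combined pattern ends up matching.  It is not the IGT code: it uses POSIX
regcomp()/regexec() rather than whatever engine igt_runner applies, the helper
names build_ignore_regex() and line_is_ignored() are made up for illustration,
and the PCI slot and dmesg lines in the usage note are hypothetical.  The
pattern string and the "%s:.*(%s)" format are the ones from the patch, with the
"static const" and sizeof() suggestions applied.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Same alternation as in the patch: the probe failure, or any *ERROR* report. */
static const char store[] = "probe with driver xe failed with error|\\*ERROR\\*";

/*
 * Build "<pci_slot>:.*(<store>)" into out.
 * Returns false if snprintf() failed or the result was truncated.
 * (Hypothetical helper, for illustration only.)
 */
static bool build_ignore_regex(char *out, size_t out_len, const char *pci_slot)
{
	int n;

	assert(out && out_len > 0 && pci_slot);
	n = snprintf(out, out_len, "%s:.*(%s)", pci_slot, store);
	return n >= 0 && (size_t)n < out_len;
}

/*
 * Returns true if line matches pattern, treated as a POSIX extended regex.
 * Assumption: igt_runner may use a different regex engine; this pattern
 * only relies on syntax common to both.
 */
static bool line_is_ignored(const char *pattern, const char *line)
{
	regex_t re;
	bool match;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return false;
	match = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

With a regex built for the hypothetical slot "0000:03:00.0", both
"xe 0000:03:00.0: [drm] *ERROR* GT0: GuC init failed with -ENOMEM" and
"xe 0000:03:00.0: probe with driver xe failed with error -12" are ignored,
while the same *ERROR* line reported for "0000:04:00.0" is not.  The second
match is the probe-error case I would not want in a generic lib helper.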
-Jonathan Cavitt

> 
> > 
> > 
> > >  {
> > > -	return strstr(function_name, "_guc_") != NULL ||
> > > -	       strstr(function_name, "_uc_") != NULL ||
> > > -	       strstr(function_name, "_wopcm_") != NULL;
> > > -}
> > > -
> > > -static void ignore_faults_in_dmesg(const char function_name[])
> > > -{
> > > -	/* Driver probe is expected to fail in all cases, so ignore in igt_runner */
> > > -	char regex[1024] = "probe with driver xe failed with error -12";
> > > -
> > >  	/*
> > > -	 * If GuC module fault is injected, GuC is expected to fail,
> > > -	 * so also ignore GuC init failures in igt_runner.
> > > +	 * Driver probe is expected to fail in all cases, so ignore in igt_runner.
> > > +	 * Additionally, all error-level reports are expected, so ignore those as well.
> > >  	 */
> > > -	if (function_is_part_of_guc(function_name)) {
> > > -		strcat(regex, "|GT[0-9a-fA-F]*: GuC init failed with -ENOMEM");
> > > -		strcat(regex, "|GT[0-9a-fA-F]*: Failed to initialize uC .-ENOMEM");
> > > -		strcat(regex, "|GT[0-9a-fA-F]*: Failed to enable GuC CT .-ENOMEM");
> > > -		strcat(regex, "|GT[0-9a-fA-F]*: GuC PC query task state failed: -ENOMEM");
> > > -	}
> > > +	char store[1024] = "probe with driver xe failed with error|\\*ERROR\\*";
> > 
> > 	static const char store[] = ..
> > 
> > > +	char pci_slot[NAME_MAX];
> > > +	char regex[1024];
> > > +
> > > +	/* All dmesg reports should only target the pci slot of the given fd */
> > > +	igt_device_get_pci_slot_name(fd, pci_slot);
> > > +	snprintf(regex, 1024, "%s:.*(%s)", pci_slot, store);
> > 
> > 	snprintf(regex, sizeof(regex), ...
> > 
> > >  
> > >  	igt_emit_ignore_dmesg_regex(regex);
> > >  }
> > > @@ -234,7 +225,7 @@ inject_fault_probe(int fd, char pci_slot[], const char function_name[])
> > >  	igt_info("Injecting error \"%s\" (%d) in function \"%s\"\n",
> > >  		 strerror(-INJECT_ERRNO), INJECT_ERRNO, function_name);
> > >  
> > > -	ignore_faults_in_dmesg(function_name);
> > > +	ignore_faults_in_dmesg(fd);
> > 
> > not sure how this filtering really works, but maybe it will be
> > sufficient to call that once in the fixup ?
> 
> The dmesg ignore list is cleared after each test, so we need to reinitialize it before
> every test run.
> - Jonathan Cavitt
> 
> > 
> > >  	injection_list_add(function_name);
> > >  	set_retval(function_name, INJECT_ERRNO);
> > >  
> > > @@ -299,7 +290,7 @@ exec_queue_create_fail(int fd, struct drm_xe_engine_class_instance *instance,
> > >  	igt_assert_eq(__xe_exec_queue_create(fd, vm, 1, 1, instance, 0, &exec_queue_id), 0);
> > >  	xe_exec_queue_destroy(fd, exec_queue_id);
> > >  
> > > -	ignore_faults_in_dmesg(function_name);
> > > +	ignore_faults_in_dmesg(fd);
> > >  	injection_list_add(function_name);
> > >  	set_retval(function_name, INJECT_ERRNO);
> > >  	igt_assert(__xe_exec_queue_create(fd, vm, 1, 1, instance, 0, &exec_queue_id) != 0);
> > > @@ -334,7 +325,7 @@ vm_create_fail(int fd, const char function_name[], unsigned int flags)
> > >  {
> > >  	igt_assert_eq(simple_vm_create(fd, flags), 0);
> > >  
> > > -	ignore_faults_in_dmesg(function_name);
> > > +	ignore_faults_in_dmesg(fd);
> > >  	injection_list_add(function_name);
> > >  	set_retval(function_name, INJECT_ERRNO);
> > >  	igt_assert(simple_vm_create(fd, flags) != 0);
> > > @@ -397,7 +388,7 @@ vm_bind_fail(int fd, const char function_name[])
> > >  
> > >  	igt_assert_eq(simple_vm_bind(fd, vm), 0);
> > >  
> > > -	ignore_faults_in_dmesg(function_name);
> > > +	ignore_faults_in_dmesg(fd);
> > >  	injection_list_add(function_name);
> > >  	set_retval(function_name, INJECT_ERRNO);
> > >  	igt_assert(simple_vm_bind(fd, vm) != 0);
> > > @@ -445,7 +436,7 @@ oa_add_config_fail(int fd, int sysfs, int devid, const char function_name[])
> > >  	igt_assert(igt_sysfs_scanf(sysfs, path, "%" PRIu64, &config_id) == 1);
> > >  	igt_assert_eq(intel_xe_perf_ioctl(fd, DRM_XE_OBSERVATION_OP_REMOVE_CONFIG, &config_id), 0);
> > >  
> > > -	ignore_faults_in_dmesg(function_name);
> > > +	ignore_faults_in_dmesg(fd);
> > >  	injection_list_add(function_name);
> > >  	set_retval(function_name, INJECT_ERRNO);
> > >  	igt_assert_lt(intel_xe_perf_ioctl(fd, DRM_XE_OBSERVATION_OP_ADD_CONFIG, &config), 0);
> > 
> > 
> 

