[PATCH v3] tests/intel/xe_fault_injection: Ignore all errors while injecting fault
Cavitt, Jonathan
jonathan.cavitt at intel.com
Wed Jun 4 20:06:21 UTC 2025
-----Original Message-----
From: Cavitt, Jonathan <jonathan.cavitt at intel.com>
Sent: Wednesday, June 4, 2025 10:06 AM
To: igt-dev at lists.freedesktop.org
Cc: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Wajdeczko, Michal <Michal.Wajdeczko at intel.com>; Ceraolo Spurio, Daniele <daniele.ceraolospurio at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Dugast, Francois <francois.dugast at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>; Harrison, John C <john.c.harrison at intel.com>; kamil.konieczny at linux.intel.com
Subject: [PATCH v3] tests/intel/xe_fault_injection: Ignore all errors while injecting fault
>
> From: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
>
> Currently, numerous fault messages have been included in the dmesg
> ignore list, and this list continues to expand. Each time a new fault
> injection point is introduced or a new feature is activated, additional
> fault messages appear, making it cumbersome to manage the dmesg ignore
> list.
>
> However, we can safely assert that all dmesg reports that contain
> *ERROR* in their message can be ignored, so add them to the dmesg ignore
> list. This unfortunately does not include the device probe error
> itself, so that must be added separately.
You know, I just thought of something...
Aren't we specifically injecting ENOMEM as a part of these tests?
If we get anything other than ENOMEM as the errno return value, then
that should be unexpected. So, are we certain that we can safely ignore
all error-level dmesg reports here?
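To make the concern concrete, here is a rough sketch of a narrower pattern that
only matches reports carrying the injected errno. The enomem_only string and the
matches_enomem_only() helper are hypothetical names. POSIX regcomp() stands in
for whatever regex engine the runner actually uses, and the "-ENOMEM" suffix is
an assumption based on the GuC messages that the old ignore list matched:

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stricter alternation: the probe failure with errno -12, or an
 * *ERROR* report that mentions -ENOMEM somewhere after the marker.
 */
static const char *enomem_only =
	"probe with driver xe failed with error -12|\\*ERROR\\*.*-ENOMEM";

/* Return true if the dmesg line would be ignored for the given PCI slot. */
static bool matches_enomem_only(const char *pci_slot, const char *line)
{
	char pattern[512];
	regex_t re;
	bool hit;
	int n;

	/* Same "<slot>:.*(<alternatives>)" layout as the patch uses. */
	n = snprintf(pattern, sizeof(pattern), "%s:.*(%s)", pci_slot, enomem_only);
	if (n < 0 || (size_t)n >= sizeof(pattern))
		return false;

	/* POSIX extended syntax is an assumption, not the runner's documented dialect. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

With something like this, an *ERROR* line ending in -EIO from the same device
would still be reported, whereas the patch as posted would hide it. Not every
error message prints its errno, though, so this may be too strict in practice.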
-Jonathan Cavitt
>
> While we're here, we should also assert that any errors we see are only
> coming from the target PCI device.
>
> v2:
> - Only ignore error-level dmesg reports (or, at least, reports with
> *ERROR* in them), and device probe failures
> - Add PCI data to regex (Michal)
>
> v3: (Michal)
> - Revert name change
> - Add change log
> - Remove fixes tag from commit
> - Rename ignore_faults_in_dmesg to igt_ignore_dmesg_errors_from_dut, and
> move to lib/igt_core.c
> - Minor code fixes
>
> v4:
> - Return ignore_faults_in_dmesg to tests/intel/xe_fault_injection.c, but
> keep it renamed to ignore_dmesg_errors_from_dut (Kamil)
>
> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> Suggested-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Suggested-by: Lucas De Marchi <lucas.demarchi at intel.com>
> Cc: Francois Dugast <francois.dugast at intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: John Harrison <john.c.harrison at intel.com>
> Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
> ---
> tests/intel/xe_fault_injection.c | 39 ++++++++++++--------------------
> 1 file changed, 15 insertions(+), 24 deletions(-)
>
> diff --git a/tests/intel/xe_fault_injection.c b/tests/intel/xe_fault_injection.c
> index 9fe6bfe351..14aaeebf5e 100644
> --- a/tests/intel/xe_fault_injection.c
> +++ b/tests/intel/xe_fault_injection.c
> @@ -64,28 +64,19 @@ static int fail_function_open(void)
>  	return debugfs_fail_function_dir_fd;
>  }
>
> -static bool function_is_part_of_guc(const char function_name[])
> +static void ignore_dmesg_errors_from_dut(int fd)
>  {
> -	return strstr(function_name, "_guc_") != NULL ||
> -	       strstr(function_name, "_uc_") != NULL ||
> -	       strstr(function_name, "_wopcm_") != NULL;
> -}
> -
> -static void ignore_faults_in_dmesg(const char function_name[])
> -{
> -	/* Driver probe is expected to fail in all cases, so ignore in igt_runner */
> -	char regex[1024] = "probe with driver xe failed with error -12";
> -
>  	/*
> -	 * If GuC module fault is injected, GuC is expected to fail,
> -	 * so also ignore GuC init failures in igt_runner.
> +	 * Driver probe is expected to fail in all cases, so ignore in igt_runner.
> +	 * Additionally, all error-level reports are expected, so ignore those as well.
>  	 */
> -	if (function_is_part_of_guc(function_name)) {
> -		strcat(regex, "|GT[0-9a-fA-F]*: GuC init failed with -ENOMEM");
> -		strcat(regex, "|GT[0-9a-fA-F]*: Failed to initialize uC .-ENOMEM");
> -		strcat(regex, "|GT[0-9a-fA-F]*: Failed to enable GuC CT .-ENOMEM");
> -		strcat(regex, "|GT[0-9a-fA-F]*: GuC PC query task state failed: -ENOMEM");
> -	}
> +	static const char *store = "probe with driver xe failed with error|\\*ERROR\\*";
> +	char pci_slot[NAME_MAX];
> +	char regex[1024];
> +
> +	/* Only block dmesg reports that target the pci slot of the given fd */
> +	igt_device_get_pci_slot_name(fd, pci_slot);
> +	snprintf(regex, sizeof(regex), "%s:.*(%s)", pci_slot, store);
>
>  	igt_emit_ignore_dmesg_regex(regex);
>  }
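For anyone who wants to check what the combined pattern above ends up matching,
here is a minimal standalone sketch. build_ignore_regex() and line_is_ignored()
are hypothetical helpers, not IGT API. The PCI slot is passed in as a plain
string instead of coming from igt_device_get_pci_slot_name(), and POSIX extended
regex is only an assumption about how the runner interprets the pattern:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Same alternation as the patch: the probe failure line or any *ERROR* report. */
static const char *store = "probe with driver xe failed with error|\\*ERROR\\*";

/* Build "<pci_slot>:.*(<store>)"; returns 0 on success, -1 on truncation. */
static int build_ignore_regex(char *buf, size_t len, const char *pci_slot)
{
	int n = snprintf(buf, len, "%s:.*(%s)", pci_slot, store);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return true if the line matches the pattern (POSIX ERE, assumed dialect). */
static bool line_is_ignored(const char *pattern, const char *line)
{
	regex_t re;
	bool hit;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

Feeding it a slot such as "0000:03:00.0" and a few made-up dmesg lines shows that
an *ERROR* report from that slot is ignored regardless of which errno it carries,
while the same report from a different slot is not.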
> @@ -234,7 +225,7 @@ inject_fault_probe(int fd, char pci_slot[], const char function_name[])
>  	igt_info("Injecting error \"%s\" (%d) in function \"%s\"\n",
>  		 strerror(-INJECT_ERRNO), INJECT_ERRNO, function_name);
>
> -	ignore_faults_in_dmesg(function_name);
> +	ignore_dmesg_errors_from_dut(fd);
>  	injection_list_add(function_name);
>  	set_retval(function_name, INJECT_ERRNO);
>
> @@ -299,7 +290,7 @@ exec_queue_create_fail(int fd, struct drm_xe_engine_class_instance *instance,
>  	igt_assert_eq(__xe_exec_queue_create(fd, vm, 1, 1, instance, 0, &exec_queue_id), 0);
>  	xe_exec_queue_destroy(fd, exec_queue_id);
>
> -	ignore_faults_in_dmesg(function_name);
> +	ignore_dmesg_errors_from_dut(fd);
>  	injection_list_add(function_name);
>  	set_retval(function_name, INJECT_ERRNO);
>  	igt_assert(__xe_exec_queue_create(fd, vm, 1, 1, instance, 0, &exec_queue_id) != 0);
> @@ -334,7 +325,7 @@ vm_create_fail(int fd, const char function_name[], unsigned int flags)
>  {
>  	igt_assert_eq(simple_vm_create(fd, flags), 0);
>
> -	ignore_faults_in_dmesg(function_name);
> +	ignore_dmesg_errors_from_dut(fd);
>  	injection_list_add(function_name);
>  	set_retval(function_name, INJECT_ERRNO);
>  	igt_assert(simple_vm_create(fd, flags) != 0);
> @@ -397,7 +388,7 @@ vm_bind_fail(int fd, const char function_name[])
>
>  	igt_assert_eq(simple_vm_bind(fd, vm), 0);
>
> -	ignore_faults_in_dmesg(function_name);
> +	ignore_dmesg_errors_from_dut(fd);
>  	injection_list_add(function_name);
>  	set_retval(function_name, INJECT_ERRNO);
>  	igt_assert(simple_vm_bind(fd, vm) != 0);
> @@ -445,7 +436,7 @@ oa_add_config_fail(int fd, int sysfs, int devid, const char function_name[])
>  	igt_assert(igt_sysfs_scanf(sysfs, path, "%" PRIu64, &config_id) == 1);
>  	igt_assert_eq(intel_xe_perf_ioctl(fd, DRM_XE_OBSERVATION_OP_REMOVE_CONFIG, &config_id), 0);
>
> -	ignore_faults_in_dmesg(function_name);
> +	ignore_dmesg_errors_from_dut(fd);
>  	injection_list_add(function_name);
>  	set_retval(function_name, INJECT_ERRNO);
>  	igt_assert_lt(intel_xe_perf_ioctl(fd, DRM_XE_OBSERVATION_OP_ADD_CONFIG, &config), 0);
> --
> 2.43.0
>
>