[PATCH i-g-t] tests/intel/xe_fault_injection: Suppress Guc CT dumps during fault injection
K V P, Satyanarayana
satyanarayana.k.v.p at intel.com
Thu Jun 12 07:52:02 UTC 2025
Hi.
> -----Original Message-----
> From: Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Sent: Tuesday, June 10, 2025 9:18 PM
> To: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; igt-
> dev at lists.freedesktop.org
> Cc: Harrison, John C <john.c.harrison at intel.com>; Dugast, Francois
> <francois.dugast at intel.com>; Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Subject: RE: [PATCH i-g-t] tests/intel/xe_fault_injection: Suppress Guc CT
> dumps during fault injection
>
> -----Original Message-----
> From: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>
> Sent: Tuesday, June 10, 2025 8:22 AM
> To: igt-dev at lists.freedesktop.org
> Cc: K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; Harrison, John C
> <john.c.harrison at intel.com>; Cavitt, Jonathan <jonathan.cavitt at intel.com>;
> Dugast, Francois <francois.dugast at intel.com>
> Subject: [PATCH i-g-t] tests/intel/xe_fault_injection: Suppress Guc CT dumps
> during fault injection
> >
> > When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv()
> > functions, the CI test systems are going out of space and crashing. To
> > avoid this issue, a new helper function is created and when fault is injected
> > into this xe_should_fail_ct_dead_capture() helper function, ct dead capture
> is
> > avoided which suppresses ct dumps in the log.
> >
> > Inject fault into xe_should_fail_ct_dead_capture() function along with
> > xe_guc_ct_send_recv() & xe_guc_mmio_send_recv() to suppress GUC ct
> dumps.
> >
> > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> > Suggested-by: John Harrison <John.C.Harrison at Intel.com>
> > Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > Cc: Francois Dugast <francois.dugast at intel.com>
> > ---
> > Same as https://patchwork.freedesktop.org/series/148416/ which was
> > reverted due to change from XE was still in review.
> >
> > Test-with: 20250524144613.11970-1-satyanarayana.k.v.p at intel.com
> > ---
> > tests/intel/xe_fault_injection.c | 28 ++++++++++++++++++++++++++++
> > 1 file changed, 28 insertions(+)
> >
> > diff --git a/tests/intel/xe_fault_injection.c b/tests/intel/xe_fault_injection.c
> > index 9fe6bfe35..bbc8e71d6 100644
> > --- a/tests/intel/xe_fault_injection.c
> > +++ b/tests/intel/xe_fault_injection.c
> > @@ -123,6 +123,22 @@ static void injection_list_add(const char
> function_name[])
> > close(dir);
> > }
> >
> > +static void injection_list_append(const char function_name[])
> > +{
> > + int dir, fd, ret;
> > +
> > + dir = fail_function_open();
> > + igt_assert_lte(0, dir);
> > +
> > + fd = openat(dir, "inject", O_WRONLY | O_APPEND);
> > + ret = write(fd, function_name, strlen(function_name));
>
> On one hand, we should probably guard against openat failing by
> implementing
> some error catching here, I.E.:
>
> """
> fd = openat(dir, "inject", O_WRONLY | O_APPEND);
> igt_assert_lte(0, fd);
> ret = write(fd, function_name, strlen(function_name));
> """
>
> On the other hand, if we fail to open the file with openat and attempt to pass
> the broken fd to write, then the write function would end up failing instead,
> so we'd end up catching the error later in this function either way. So, I
> suppose it's not strictly necessary to implement a guard here, though I think
> it would be preferred for stylistic reasons.
>
Fixed in new version.
> > +
> > + close(fd);
> > + close(dir);
> > +
> > + igt_assert_lte(0, ret);
> > +}
> > +
> > static void injection_list_remove(const char function_name[])
> > {
> > int dir;
> > @@ -202,6 +218,15 @@ static void set_retval(const char function_name[],
> long long retval)
> > close(dir);
> > }
> >
> > +static void ignore_fail_dump_in_dmesg(const char function_name[])
> > +{
> > + if (!strcmp(function_name, "xe_guc_ct_send_recv") ||
> > + !strcmp(function_name, "xe_guc_mmio_send_recv")) {
>
> I think this if statement can be shortened. Try:
>
> """
> if (!strstr(function_name, "send_recv")) {
> injection_list_append("xe_is_injection_active");
> set_retval("xe_is_injection_active", INJECT_ERRNO);
> }
> """
>
Fixed in new version.
> > + injection_list_append("xe_is_injection_active");
> > + set_retval("xe_is_injection_active", INJECT_ERRNO);
> > + }
> > +}
> > +
> > /**
> > * SUBTEST: inject-fault-probe-function-%s
> > * Description: inject an error in the injectable function %arg[1] then
> > @@ -236,6 +261,7 @@ inject_fault_probe(int fd, char pci_slot[], const char
> function_name[])
> >
> > ignore_faults_in_dmesg(function_name);
> > injection_list_add(function_name);
> > + ignore_fail_dump_in_dmesg(function_name);
> > set_retval(function_name, INJECT_ERRNO);
> >
> > igt_kmod_bind("xe", pci_slot);
> > @@ -273,6 +299,8 @@ static void probe_fail_guc(int fd, char pci_slot[],
> const char function_name[],
> > for (int i = iter_start; i < iter_end; i++) {
> > fault_params->space = i;
> > setup_injection_fault(fault_params);
> > + /* Clear the injection list */
> > + injection_list_clear();
>
> Hmm... I think it might be better to perform a targeted clear, instead of just
> wiping the list clean after each run. To do this, maybe we can have
> ignore_fail_dump_in_dmesg return some value. Perhaps:
>
> """
> static bool ignore_fail_dump_in_dmesg(const char function_name[])
> {
> return !strstr(function_name, "send_recv");
> }
> """
>
> Then, in inject_fault_probe:
>
> """
> static int
> inject_fault_probe(int fd, const char pci_slot[], const char function_name[])
> {
> int err = 0;
> bool ignore_fail_dump =
> ignore_fail_dump_in_dmesg(function_name);
> igt_info("Injecting error \"%s\" (%d) in function \"%s\"\n",
> strerror(-INJECT_ERRNO), INJECT_ERRNO, function_name);
>
> ignore_dmesg_errors_from_dut(pci_slot);
> injection_list_add(function_name);
> if (ignore_fail_dump) {
> injection_list_append("xe_is_injection_active");
> set_retval("xe_is_injection_active", INJECT_ERRNO);
> }
>
> set_retval(function_name, INJECT_ERRNO);
>
> igt_kmod_bind("xe", pci_slot);
>
> err = -errno;
> injection_list_remove(function_name);
> if (ignore_fail_dump)
> injection_list_remove("xe_is_injection_active");
>
> return err;
> }
> """
>
> We could alternatively keep the injection_list_append and set_retval as a part
> of
> ignore_fail_dump_in_dmesg, and just check the return value at the end of the
> inject_fault_probe function when performing the injection_list_remove. Or
> we
> can just keep it the way it is currently. I'll leave that to your discretion.
> -Jonathan Cavitt
Fixed in new version.
-Satya.
>
> > inject_fault_probe(fd, pci_slot, function_name);
> > igt_kmod_unbind("xe", pci_slot);
> > }
> > --
> > 2.43.0
> >
> >
More information about the igt-dev
mailing list