[PATCH i-g-t] tests/intel/xe_fault_injection: Ignore all errors while injecting fault

Mon Jun 2 21:31:07 UTC 2025

On Mon, Jun 02, 2025 at 01:31:40PM -0700, Daniele Ceraolo Spurio wrote:
>
>
>On 6/2/2025 11:30 AM, Cavitt, Jonathan wrote:
>>-----Original Message-----
>>From: Ceraolo Spurio, Daniele <daniele.ceraolospurio at intel.com>
>>Sent: Monday, June 2, 2025 11:26 AM
>>To: Wajdeczko, Michal <Michal.Wajdeczko at intel.com>; K V P, Satyanarayana <satyanarayana.k.v.p at intel.com>; igt-dev at lists.freedesktop.org; De Marchi, Lucas <lucas.demarchi at intel.com>
>>Cc: Dugast, Francois <francois.dugast at intel.com>; Cavitt, Jonathan <jonathan.cavitt at intel.com>; Harrison, John C <john.c.harrison at intel.com>
>>Subject: Re: [PATCH i-g-t] tests/intel/xe_fault_injection: Ignore all errors while injecting fault
>>>On 5/29/2025 1:29 PM, Michal Wajdeczko wrote:
>>>>On 29.05.2025 18:23, Daniele Ceraolo Spurio wrote:
>>>>>On 5/29/2025 6:31 AM, Satyanarayana K V P wrote:
>>>>>>Currently, numerous fault messages have been included in the dmesg
>>>>>>ignore list,
>>>>>>and this list continues to expand. Each time a new fault injection
>>>>>>point is
>>>>>>introduced or a new feature is activated, additional fault messages
>>>>>>appear,
>>>>>>making it cumbersome to manage the dmesg ignore list.
>>>>>>
>>>>>>This new patch automatically ignores all error messages from dmesg,
>>>>>>eliminating
>>>>>>the need to add or maintain a dmesg ignore message list.
>>>>>This would make the test almost meaningless. If the test finds an actual
>>>>>bug (i.e., an error we didn't expect), how would CI detect and report it
>>>>but how can you tell upfront, without actually running a test, which
>>>>error is expected and which is not?
>>>>
>>>>>if all errors are ignored? The only situation we would still fail on is
>>>>>when the kernel just dies.
>>>>and that's perfectly fine, since we should look only for BUGs and WARNs, as
>>>>it's quite natural and expected that once we inject an error, the driver
>>>>will likely fail to load or proceed, and/or may report some error
>>>>messages, or even try to silently recover, *but* it shouldn't ever crash
>>>>
>>>>and that should be taken as a test goal, not that we look for specific
>>>>error messages that could be changed, omitted, or replaced by a different
>>>>driver release, or when running on a different platform or function
>>>The patch does not look for WARNs though, it ignores all errors with a
>>>"*" filter, even WARNs. I'm still not fully convinced about ignoring
>>>anything, but I can understand the POV of ignoring just messages with
>>>the "ERROR" tag, as suggested in the other replies. I'd be happy with
>>>that kind of solution.
>>Huh?  I thought the alignment was that we were to ignore all messages
>>that *don't* have the ERROR tag, not the other way around?
>>-Jonathan Cavitt
>
>We are injecting an error, so some messages with the ERROR tag are 
>expected and should be ignored. The question is whether we should 
>ignore all of them or keep a list of expected ones and ignore just 
>those ones.
>
>We definitely can't ignore warnings or asserts, as those signal that 
>something is going very wrong.

Agreed. I think ignoring all **errors** would be ok: we don't want to
babysit the error messages and draw conclusions from them. It leads to a
lot of maintenance and even the odd situation of a typo fix in an error
message being flagged as a regression.

The warnings we can't have already taint the kernel. So checking for
kernel taint should be good enough and I think we already do that,
don't we?
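
For illustration, a taint check of the kind mentioned above could look
roughly like the sketch below. The bit positions (7 for an OOPS/BUG, 9 for
a WARN) follow the kernel's tainted-kernels documentation; the helper names
taint_is_fatal() and read_taint_mask() are hypothetical and are not IGT API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit positions as listed in the kernel's tainted-kernels documentation. */
#define TAINT_DIE_BIT	7	/* 'D': the kernel hit an OOPS or BUG */
#define TAINT_WARN_BIT	9	/* 'W': a WARN was issued */

/*
 * Hypothetical helper: report whether a taint mask contains a flag that a
 * fault-injection test should treat as a failure (BUG/OOPS or WARN).
 * Other taint flags, such as out-of-tree modules, are deliberately ignored.
 */
static bool taint_is_fatal(unsigned long mask)
{
	unsigned long fatal = (1UL << TAINT_DIE_BIT) | (1UL << TAINT_WARN_BIT);

	return (mask & fatal) != 0;
}

/*
 * Hypothetical helper: read the decimal taint mask from a file such as
 * /proc/sys/kernel/tainted. Returns false if the file cannot be read
 * or does not contain a number.
 */
static bool read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;

	ok = fscanf(f, "%lu", mask) == 1;
	fclose(f);

	return ok;
}
```

With something like this, a test could read /proc/sys/kernel/tainted after
the injection and fail only when taint_is_fatal() reports a D or W flag,
regardless of which error messages showed up in dmesg.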

Lucas De Marchi

>
>Daniele
>
>>
>>>Daniele
>>>
>>>>Michal
>>>>
>>>>>Daniele
>>>>>
>>>
>