[PATCH i-g-t 5/5] tests/intel/xe_configfs: Add test to validate survivability mode

Thu Apr 24 04:17:33 UTC 2025

On Tue, Apr 22, 2025 at 03:29:55PM -0400, Rodrigo Vivi wrote:
>On Tue, Apr 22, 2025 at 08:57:35AM -0500, Lucas De Marchi wrote:
>> On Tue, Apr 22, 2025 at 03:26:01PM +0530, Riana Tauro wrote:
>> > The test validates if survivability mode is enabled on supported
>> > platforms when configured using configfs attribute.
>> >
>> > Signed-off-by: Riana Tauro <riana.tauro at intel.com>
>> > ---
>> > tests/intel/xe_configfs.c | 112 ++++++++++++++++++++++++++++++++++++++
>> > tests/meson.build         |   1 +
>> > 2 files changed, 113 insertions(+)
>> > create mode 100644 tests/intel/xe_configfs.c
>> >
>> > diff --git a/tests/intel/xe_configfs.c b/tests/intel/xe_configfs.c
>> > new file mode 100644
>> > index 000000000..414af4a86
>> > --- /dev/null
>> > +++ b/tests/intel/xe_configfs.c
>>
>>
>> humn... does it make sense to test survivability mode in a xe_configfs
>> test? configfs is just the way to trigger it. For completely different
>> areas of the driver I don't think we should bundle the tests into a
>> configfs test: we don't test if xe can be loaded without display in a
>> xe_param.c test, or if we can inject faults in a xe_debugfs.c test, etc.
>>
>> My suggestion is to have a dedicated test for survivability in which
>> configfs is part of it.
>
>Well, that would work for survivability itself. But perhaps it is good
>to have dedicated entry points for the knobs we expose, like we have
>a single place to toggle all sysfs and debugfs. So we don't forget to
>add new cases and we have a single entry point to quickly exercise
>the knobs.

humn... dunno. The problem I see here is that the answer for "does it
work?" is quite different for each configfs file we implement. Some may
even be honored on probe only vs others that can be set at runtime. If
we had a generic way to test the configfs like:

1) write XYZ to file
2) read file
3) make sure it's XYZ

then it'd make sense. But for these tests, checking that is not testing
much.
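
To make the point concrete, the generic check above would be roughly the
helper below. This is only a sketch in plain C, not IGT code:
attr_roundtrip() is a made-up name, and the path is whatever attribute
the caller passes in. It shows how little such a test verifies, since a
successful round trip says nothing about whether the driver honored the
value.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `value` to the attribute at `path`, read
 * it back and compare. Returns 0 when the value read back matches,
 * -1 on any I/O error or mismatch. A trailing newline in the value
 * read back is ignored.
 */
static int attr_roundtrip(const char *path, const char *value)
{
	char buf[256];
	FILE *f;

	/* 1) write XYZ to file */
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)
		return -1;

	/* 2) read file */
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* 3) make sure it's XYZ */
	buf[strcspn(buf, "\n")] = '\0';
	return strcmp(buf, value) == 0 ? 0 : -1;
}
```

A caller would treat a return of 0 as "the attribute stored the value",
which is exactly the part that is not interesting for these tests.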

For survivability we should test:

	1) bind the module in survivability mode
	2) read something to make sure it is in that mode
	3) flash the same firmware... possible?
	4) unbind the module and remove configfs
	5) bind the module

For possible other things coming to configfs:

A) Extra Workarounds

	1) write a {gt/engine/lrc} register-save-restore
	2) bind the module
	3) check for each of them, via <debugfs>/register-save-restore that 
	   the value is correctly set.
	4) repeat test for write types like rmw, write, set bit, etc
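
For step 4, the value to compare against could be computed with
something like the sketch below. It assumes each entry can be reduced to
a pair of "bits to clear" and "bits to set" masks; that is an assumption
for illustration, the names are made up, and the real configfs format
does not exist yet.

```c
#include <stdint.h>

/*
 * Hypothetical model of a save-restore entry: clear `clr`, then set
 * `set`. Under this assumption the write types map to:
 *   write:   clr = ~0u,  set = value
 *   set bit: clr = 0,    set = bit
 *   rmw:     clr = mask, set = value (value must fit within mask)
 */
static uint32_t sr_expected(uint32_t old, uint32_t clr, uint32_t set)
{
	return (old & ~clr) | set;
}
```

The test would read the register state via debugfs and compare it with
the result of sr_expected() for each write type.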

B) Fuse off engines in software

	1) write a file with the possible engines that we should export
	2) bind the module
	3) check via debugfs that the exposed engines are at most those
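
The check in step 3 is a subset test rather than an equality test, since
the hardware may already have some of the requested engines fused off. A
minimal sketch, assuming both sets can be represented as bitmasks
(engines_within_allowed() is a made-up name, not an existing helper):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when every engine bit set in `exposed`
 * is also set in `allowed`, i.e. exposed is a subset of allowed.
 */
static bool engines_within_allowed(uint64_t exposed, uint64_t allowed)
{
	return (exposed & ~allowed) == 0;
}
```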

C) Do not attempt enabling display (i.e. a substitute for the module
param)

... etc

Are we going to shove all of them in a xe_configfs test even if the
tests are totally different? I think it will be harder to maintain, but
we can always move to something else later if it becomes overwhelming.
So.. I'm not sure. Any additional thoughts?

Lucas De Marchi
