[PATCH i-g-t 5/5] tests/intel/xe_configfs: Add test to validate survivability mode

Thu Apr 24 13:00:16 UTC 2025

On Wed, Apr 23, 2025 at 11:17:33PM -0500, Lucas De Marchi wrote:
> On Tue, Apr 22, 2025 at 03:29:55PM -0400, Rodrigo Vivi wrote:
> > On Tue, Apr 22, 2025 at 08:57:35AM -0500, Lucas De Marchi wrote:
> > > On Tue, Apr 22, 2025 at 03:26:01PM +0530, Riana Tauro wrote:
> > > > The test validates if survivability mode is enabled on supported
> > > > platforms when configured using configfs attribute.
> > > >
> > > > Signed-off-by: Riana Tauro <riana.tauro at intel.com>
> > > > ---
> > > > tests/intel/xe_configfs.c | 112 ++++++++++++++++++++++++++++++++++++++
> > > > tests/meson.build         |   1 +
> > > > 2 files changed, 113 insertions(+)
> > > > create mode 100644 tests/intel/xe_configfs.c
> > > >
> > > > diff --git a/tests/intel/xe_configfs.c b/tests/intel/xe_configfs.c
> > > > new file mode 100644
> > > > index 000000000..414af4a86
> > > > --- /dev/null
> > > > +++ b/tests/intel/xe_configfs.c
> > > 
> > > 
> > > humn... does it make sense to test survivability mode in a xe_configfs
> > > test? configfs is just the way to trigger it. For completely different
> > > areas of the driver I don't think we should bundle the tests into a
> > > configfs test: we don't test if xe can be loaded without display in a
> > > xe_param.c test, or if we can inject faults in a xe_debugfs.c test, etc.
> > > 
> > > My suggestion is to have a dedicated test for survivability in which
> > > configfs is part of it.
> > 
> > Well, that would work for survivability itself. But perhaps it is good
> > to have dedicated entry points for the knobs we expose, like we have
> > a single place to toggle all sysfs and debugfs files, so we don't forget
> > to add new cases and we have a single entry point to quickly exercise
> > the knobs.
> 
> humn... dunno. The problem I see here is that the answer to "does it
> work?" is quite different for each configfs file we implement. Some may
> only be honored at probe time, while others can be set at runtime. If
> we had a generic way to test the configfs like:
> 
> 1) write XYZ to file
> 2) read file
> 3) make sure it's XYZ
> 
> then it'd make sense. But for these tests, checking that is not testing
> much.
> 
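The generic check described above (write a value, read it back, compare) could be sketched in plain C roughly as follows. This is a hypothetical helper, not IGT API: `configfs_roundtrip()` is an illustrative name, and the path is assumed to point at a configfs attribute such as one under `/sys/kernel/config/xe/<bdf>/`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write @value to the attribute at @path, read it
 * back and report whether the same string comes back. A single trailing
 * newline on read-back is ignored, since attribute show() functions
 * commonly append one.
 */
static bool configfs_roundtrip(const char *path, const char *value)
{
	char buf[256];
	size_t len;
	bool ok;
	FILE *f;

	assert(path && value);

	/* 1) write XYZ to file */
	f = fopen(path, "w");
	if (!f)
		return false;
	ok = fputs(value, f) != EOF;
	/* stdio buffers the write, so a rejected store shows up at fclose() */
	if (fclose(f) != 0)
		ok = false;
	if (!ok)
		return false;

	/* 2) read file */
	f = fopen(path, "r");
	if (!f)
		return false;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return false;
	}
	fclose(f);

	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';

	/* 3) make sure it's XYZ */
	return strcmp(buf, value) == 0;
}
```

As the thread points out, passing this only shows that the attribute stores the value, not that the driver honors it.
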
> For survivability we should test:
> 
> 	1) bind the module in survivability mode
> 	2) read something to make sure it is in that mode
> 	3) flash the same firmware... possible?

I don't believe it is a good idea to make our validation here depend
on many other external components. Our role is to put the device in
survivability mode, so validating that we put the device in that mode
when requested should be enough.

We could check that the sysfs file is in place and that drm/card is
not. That would be enough imho...

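The check suggested above could look roughly like this. It is a minimal sketch under assumptions not confirmed in this thread: the survivability attribute is taken to live at `<sysfs>/bus/pci/devices/<bdf>/survivability_mode`, and a registered DRM card is taken to show up as `<sysfs>/bus/pci/devices/<bdf>/drm`. `device_in_survivability_mode()` is a hypothetical name; the sysfs root is a parameter so the check can be run against a fake tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return true if @path exists (file or directory). */
static bool path_exists(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0;
}

/*
 * Assumed layout (hypothetical, see lead-in):
 *   <sysfs>/bus/pci/devices/<bdf>/survivability_mode  must exist
 *   <sysfs>/bus/pci/devices/<bdf>/drm                 must NOT exist
 * @sysfs is normally "/sys".
 */
static bool device_in_survivability_mode(const char *sysfs, const char *bdf)
{
	char path[512];

	assert(sysfs && bdf);

	snprintf(path, sizeof(path), "%s/bus/pci/devices/%s/survivability_mode",
		 sysfs, bdf);
	if (!path_exists(path))
		return false;

	/* no DRM card should have been registered for this device */
	snprintf(path, sizeof(path), "%s/bus/pci/devices/%s/drm", sysfs, bdf);
	return !path_exists(path);
}
```

Called with `"/sys"` and the BDF of the device under test, this checks only what the driver itself controls, without relying on firmware flashing or other external components.
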
> 	4) unbind the module and remove configfs
> 	5) bind the module
> 
> For possible other things coming to configfs:
> 
> A) Extra Workarounds
> 
> 	1) write a {gt/engine/lrc} register-save-restore entry
> 	2) bind the module
> 	3) check for each of them, via <debugfs>/register-save-restore, that
> 	   the value is correctly set.
> 	4) repeat the test for write types like rmw, write, set bit, etc.
> 
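The write types in step 4 differ only in which bits they touch, so the expected register value can be derived from one formula. A minimal sketch, assuming each entry is described by a clear mask, a set value and a read mask. The struct and helper names here are hypothetical and may not match xe's actual definitions; parsing the debugfs output is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one save-restore entry. */
struct sr_entry {
	uint32_t clr_bits;	/* bits cleared before applying set_bits */
	uint32_t set_bits;	/* bits set afterwards */
	uint32_t read_mask;	/* bits that can be verified on read-back */
};

/* Plain write: every bit is replaced. */
static struct sr_entry sr_write(uint32_t val)
{
	return (struct sr_entry){ .clr_bits = ~0u, .set_bits = val,
				  .read_mask = ~0u };
}

/* Read-modify-write of the field described by @mask. */
static struct sr_entry sr_rmw(uint32_t mask, uint32_t val)
{
	return (struct sr_entry){ .clr_bits = mask, .set_bits = val & mask,
				  .read_mask = mask };
}

/* Set bits without touching the rest of the register. */
static struct sr_entry sr_setbit(uint32_t bits)
{
	return (struct sr_entry){ .clr_bits = 0, .set_bits = bits,
				  .read_mask = bits };
}

/* Value the register should hold after applying @e on top of @old. */
static uint32_t sr_apply(uint32_t old, const struct sr_entry *e)
{
	assert(e);
	return (old & ~e->clr_bits) | e->set_bits;
}

/* Compare only the bits the entry says are readable. */
static bool sr_matches(uint32_t readback, uint32_t old,
		       const struct sr_entry *e)
{
	return (readback & e->read_mask) == (sr_apply(old, e) & e->read_mask);
}
```

Masking with `read_mask` keeps the check from failing on bits the entry never touched, which matters most for rmw and set-bit entries.
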
> B) Fuse off engines in software
> 
> 	1) write a file with the possible engines that we should export
> 	2) bind the module
> 	3) check via debugfs that the exposed engines are at most those
> 
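"At most those" is a subset check: every engine the driver exposes must appear in the list written to configfs. A minimal sketch, assuming both lists are comma-separated engine names such as `"rcs0,bcs0"`. That format and the function names are hypothetical, not the actual configfs or debugfs layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the comma-separated @list contains the @len-byte token
 * @name as a whole entry ("vcs1" does not match "vcs10").
 */
static bool list_has(const char *list, const char *name, size_t len)
{
	const char *p = list;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		if (n == len && strncmp(p, name, len) == 0)
			return true;
		if (!end)
			break;
		p = end + 1;
	}
	return false;
}

/* True if every engine in @exposed also appears in @allowed. */
static bool engines_within(const char *exposed, const char *allowed)
{
	const char *p = exposed;

	assert(exposed && allowed);

	while (*p) {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		/* skip empty tokens such as the one in "rcs0,,bcs0" */
		if (n && !list_has(allowed, p, n))
			return false;
		if (!end)
			break;
		p = end + 1;
	}
	return true;
}
```

An empty exposed list passes trivially, which matches "at most those".
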
> C) Do not attempt enabling display (i.e. a substitute for the module
> param)
> 
> ... etc
> 
> Are we going to shove all of them in a xe_configfs test even if the
> tests are totally different? I think it will be harder to maintain, but
> we can always move to something else later if it becomes overwhelming.
> So... I'm not sure. Any additional thoughts?
> 
> Lucas De Marchi
> 
> > 
> > > 
> > > Lucas De Marchi