[PATCH 0/2] Add configfs support for survivability mode
Riana Tauro
riana.tauro at intel.com
Tue Apr 1 05:55:52 UTC 2025
Hi Rodrigo
On 4/1/2025 1:49 AM, Rodrigo Vivi wrote:
> On Thu, Mar 27, 2025 at 09:40:39AM -0500, Lucas De Marchi wrote:
>> On Thu, Mar 27, 2025 at 12:12:00PM +0530, Riana Tauro wrote:
>>> This series proposes to expose attributes via xe configfs
>>> subsystem. Xe registers a configfs subsystem named 'xe'.
>>> Userspace can then create directories for the devices they
>>> want to configure and set appropriate attributes
>>>
>>> This is done by
>>>
>>> mount -t configfs none /config
>>> mkdir /config/xe/0000:03:00.0
>>>
>>
>> If we need a new version or to document anywhere in our docs, I'd add a
>> comment here:
>>
>> # If driver is already bound, unbind it as this configuration
>> # applies only when probing it
>>
>>> echo 0000:03:00.0 > /sys/bus/pci/drivers/xe/unbind
>>> echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode
>>> echo 0000:03:00.0 > /sys/bus/pci/drivers/xe/bind
>>>
>>> This is an alternative to introducing module param that causes
>>> all the connected and supported GPU cards to enter survivability mode.
>>> Manually entering survivability mode is useful when pcode does not
>>> report failure, in field repairs and validation
>>>
>>> Rev2: use config_groups (Lucas)
>>
>> Awesome. I have some other work pending that will make use of
>> it. I will play with these patches soon.
>
> I really liked this new flow and I was giving it a try here right now.
>
> However it didn't work. It didn't take me to the survivability mode,

This works only on BMG and after writing 1 to survivability_mode.

> but also, I cannot unload the xe after creating this configfs file:

Only rmdir is supported for removing configfs entries; rm didn't work.

>
> sudo remove /sys/kernel/config/xe/0000\:0*
> rm: cannot remove '0000:00:02.0/survivability_mode': Operation not permitted
> rm: cannot remove '0000:03:00.0/survivability_mode': Operation not permitted
>
> Tried to unbind and had the same failure.

Unbind worked for me. rmmod without removing the directories does not
work, as configfs takes a reference on the module.

But shouldn't the user be responsible for running rmdir on the
directories they created before unloading the module? I haven't tried
without the owner attribute, but according to the code, registering the
subsystem might fail. Will try.
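
For reference, the teardown order described above could be sketched as a
small shell helper. This is only a sketch under assumptions: the function
name remove_xe_config_dirs is hypothetical, and the default path assumes
configfs is mounted at /sys/kernel/config (the cover letter mounts it at
/config instead). It only runs rmdir on each device directory, since that
is the operation configfs supports for removing them.

```shell
#!/bin/sh
# Sketch: remove every per-device directory under the xe configfs
# subsystem so that configfs can drop its reference on the module.
# remove_xe_config_dirs is a hypothetical helper name; the default path
# assumes configfs is mounted at /sys/kernel/config.
remove_xe_config_dirs() {
    xe_root="${1:-/sys/kernel/config/xe}"
    for dev_dir in "$xe_root"/*; do
        # Skip the unexpanded glob (nothing to remove) and non-directories.
        [ -d "$dev_dir" ] || continue
        # The attribute files are owned by configfs; rm on them fails with
        # "Operation not permitted", so only rmdir is used here.
        rmdir "$dev_dir" || return 1
    done
    return 0
}
```

With the directories gone, something like
"remove_xe_config_dirs && rmmod xe" (run as root) should then be able to
unload the driver, assuming nothing else is still using the module.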

Thanks
Riana

>
> then with the configfs there we cannot remove the module:
> $ sudo rmmod xe
> rmmod: ERROR: Module xe is in use
>
>
> So, it looks like we have some stuff to adjust here before we can move further,
> but so far things are looking promising indeed
>
>>
>> thanks
>> Lucas De Marchi