[PATCH v4 6/9] drm/xe/doc: Document device wedged and runtime survivability

Riana Tauro riana.tauro at intel.com
Mon Jul 14 09:04:31 UTC 2025



On 7/12/2025 11:15 AM, Raag Jadav wrote:
> On Fri, Jul 11, 2025 at 11:39:22AM +0530, Riana Tauro wrote:
>> On 7/11/2025 11:09 AM, Raag Jadav wrote:
>>> On Wed, Jul 09, 2025 at 04:50:18PM +0530, Riana Tauro wrote:
>>>> Add documentation for vendor specific device wedged recovery method
>>>> and runtime survivability.
>>>
>>> ...
>>>
>>>> + * Runtime Survivability
>>>> + * =====================
>>>> + *
>>>> + * Certain runtime firmware errors can cause the device to enter a wedged state
>>>> + * (:ref:`xe-device-wedging`) requiring a firmware flash to restore normal operation.
>>>> + * Runtime Survivability Mode indicates that a firmware flash is necessary to recover the device and
>>>> + * is indicated by the presence of survivability mode sysfs::
>>>> + *
>>>> + *	/sys/bus/pci/devices/<device>/survivability_mode
>>>> + *
>>>> + * Survivability mode sysfs provides information about the type of survivability mode.
>>>> + *
>>>> + * When such errors occur, userspace is notified with the drm device wedged uevent and runtime
>>>> + * survivability mode. User can then initiate a firmware flash to restore device to normal
>>>> + * operation.
>>>
>>> Do we have a definition of the actual procedure? Can we add a reference to it?
>>> Otherwise it's telling me to do something I have no idea about.
>>
>> That is a userspace tool. I don't see any kernel code referring to userspace
>> documentation.
> 
> How are we expecting users to know about it?

There is no documentation in the kernel for the fwupd or xpu-manager
userspace tools. The required procedure should be documented by the
userspace tools themselves.

I'll mention 'firmware update tools like fwupd' so that users can
refer to the respective documentation.
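
For illustration only, here is a minimal userspace sketch of the check
described in the quoted documentation: survivability mode is indicated by
the presence of the survivability_mode sysfs attribute under the PCI
device. This is not part of the patch. The function name, the example
device address, and the decision to return the raw attribute text are
assumptions; the format of the attribute's contents is not specified in
this thread, so the sketch does not try to parse it.

```python
from pathlib import Path
from typing import Optional


def read_survivability_mode(
    device: str, sysfs_root: str = "/sys/bus/pci/devices"
) -> Optional[str]:
    """Return the raw survivability_mode text, or None if the attribute is absent.

    `device` is a PCI address such as "0000:03:00.0" (hypothetical example).
    Absence of the attribute is treated as "not in survivability mode".
    """
    attr = Path(sysfs_root) / device / "survivability_mode"
    try:
        return attr.read_text().strip()
    except (FileNotFoundError, NotADirectoryError):
        # The attribute (or the device directory) does not exist.
        return None


if __name__ == "__main__":
    info = read_survivability_mode("0000:03:00.0")  # hypothetical address
    if info is None:
        print("survivability_mode not present")
    else:
        print("device is in survivability mode:")
        print(info)
```

A tool would typically run such a check after receiving the drm device
wedged uevent, and then hand off to a firmware update tool such as fwupd.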

Thanks
Riana

> 
> Raag


