[PATCH v5 4/9] drm/xe/xe_survivability: Refactor survivability mode
Riana Tauro
riana.tauro at intel.com
Wed Jul 23 14:52:18 UTC 2025
On 7/23/2025 7:30 PM, Raag Jadav wrote:
> On Tue, Jul 15, 2025 at 04:17:24PM +0530, Riana Tauro wrote:
>> The patches in these series refactor the boot survivability code to
>> allow adding runtime survivability
>> Refactor existing code to separate both the modes
>
> Punctuation, please!
>
>> This patch renames the functions and separates init and enable
>
> ...
>
>> static ssize_t survivability_mode_show(struct device *dev,
>> struct device_attribute *attr, char *buff)
>> {
>> @@ -130,6 +138,11 @@ static ssize_t survivability_mode_show(struct device *dev,
>> struct xe_survivability_info *info = survivability->info;
>> int index = 0, count = 0;
>>
>> + count += sysfs_emit_at(buff, count, "Survivability mode type: Boot\n");
>
> Although I'm okay with this, should we make it something more parseable
> from userspace?
Suggestions?
The rest of the information is already in <name>: <value> pairs.
Dumping the scratch registers is not useful for runtime survivability, so
I added a line instead of leaving the file empty.
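Here is a minimal userspace sketch of the layout under discussion. It assumes every line follows the <name>: <value> convention described above. `emit_at()` only approximates the kernel's `sysfs_emit_at()`. `format_survivability()`, `parse_pair()`, `struct scratch_reg` and any register names used with them are hypothetical. The sketch shows that the new type line already parses like the other pairs, and that a read with no boot failure simply ends after that line.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096 /* plays the role of the PAGE_SIZE sysfs buffer */

/* Hypothetical breadcrumb: one scratch register name and its value. */
struct scratch_reg {
	const char *name;
	unsigned int value;
};

/*
 * Rough userspace analogue of sysfs_emit_at(): append at offset 'at' and
 * return the number of bytes actually written, truncating like scnprintf().
 */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
	int room = BUF_SIZE - at;
	va_list args;
	int len;

	assert(at >= 0 && room > 0);
	va_start(args, fmt);
	len = vsnprintf(buf + at, (size_t)room, fmt, args);
	va_end(args);
	if (len < 0)
		return 0;
	return len < room ? len : room - 1;
}

/* Type line first; scratch registers only when the boot actually failed. */
static int format_survivability(char *buf, const char *mode_type,
				bool boot_failure,
				const struct scratch_reg *regs, int nregs)
{
	int count = 0;

	count += emit_at(buf, count, "Survivability mode type: %s\n", mode_type);
	if (!boot_failure)
		return count;

	for (int i = 0; i < nregs; i++)
		count += emit_at(buf, count, "%s: 0x%x\n",
				 regs[i].name, regs[i].value);
	return count;
}

/*
 * Userspace side (hypothetical): split one line at the first ": " so that
 * names containing spaces, such as "Survivability mode type", stay intact.
 * Modifies 'line' in place and strips a trailing newline from the value.
 */
static bool parse_pair(char *line, char **name, char **value)
{
	char *sep = strstr(line, ": ");
	char *nl;

	if (!sep)
		return false;
	*sep = '\0';
	*name = line;
	*value = sep + 2;
	nl = strchr(*value, '\n');
	if (nl)
		*nl = '\0';
	return true;
}
```

With `boot_failure` false the buffer holds only `Survivability mode type: Boot\n`, which `parse_pair()` splits into the name `Survivability mode type` and the value `Boot`. A consumer reading line by line therefore needs no special case for the type line.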
>
>> + if (!check_boot_failure(xe))
>> + return count;
>> +
>
> ...
>
>> +int xe_survivability_mode_boot_enable(struct xe_device *xe)
>> {
>> struct xe_survivability *survivability = &xe->survivability;
>> - struct xe_survivability_info *info;
>> struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>> + int ret;
>>
>> if (!xe_survivability_mode_is_requested(xe))
>> return 0;
>>
>> - survivability->size = MAX_SCRATCH_MMIO;
>> -
>> - info = devm_kcalloc(xe->drm.dev, survivability->size, sizeof(*info),
>> - GFP_KERNEL);
>> - if (!info)
>> - return -ENOMEM;
>> -
>> - survivability->info = info;
>> -
>> - populate_survivability_info(xe);
>> + ret = init_survivability_mode(xe);
>> + if (ret)
>> + return ret;
>>
>> - /* Only log debug information and exit if it is a critical failure */
>> + /* Log breadcrumbs but do not enter survivability mode for Critical boot errors */
>> if (survivability->boot_status == CRITICAL_FAILURE) {
>> log_survivability_info(pdev);
>
> I'm not very familiar with the history here, but should we be logging the
> scratch registers if we consider them sensitive?
For non-critical failures, survivability mode is enabled and a firmware
flash can be triggered to recover. For critical failures there is no
sysfs interface, so the scratch registers are dumped to the log to give
more information about the failure. That helps an admin diagnose it.
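The init/enable split and the critical-failure branch can be modeled in a few lines of userspace C. This is a sketch, not the xe code. `struct survivability` here is a simplified hypothetical and `MAX_SCRATCH_MMIO` is given an arbitrary value. The scratch values are passed in rather than read over MMIO. Returning `-ENXIO` for a critical failure is also an assumption, because the quoted hunk ends before the function's return statements.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_SCRATCH_MMIO 8 /* arbitrary value for this sketch */

enum boot_status { NO_FAILURE, NON_CRITICAL_FAILURE, CRITICAL_FAILURE };

/* Simplified, hypothetical stand-in for struct xe_survivability. */
struct survivability {
	enum boot_status boot_status;
	unsigned int *info;  /* one slot per scratch register */
	size_t size;
	bool sysfs_enabled;  /* recovery interface exposed (non-critical) */
	bool logged;         /* breadcrumbs dumped to the log (critical) */
};

/* Init: allocate and fill the breadcrumbs; no policy decisions here. */
static int init_survivability_mode(struct survivability *s,
				   const unsigned int *scratch)
{
	s->size = MAX_SCRATCH_MMIO;
	s->info = calloc(s->size, sizeof(*s->info));
	if (!s->info)
		return -ENOMEM;
	for (size_t i = 0; i < s->size; i++)
		s->info[i] = scratch[i];
	return 0;
}

/* Enable: reuse init, then choose between logging and exposing sysfs. */
static int survivability_mode_boot_enable(struct survivability *s,
					  bool requested,
					  const unsigned int *scratch)
{
	int ret;

	if (!requested)
		return 0;

	ret = init_survivability_mode(s, scratch);
	if (ret)
		return ret;

	if (s->boot_status == CRITICAL_FAILURE) {
		/* No sysfs in this case, so the log is the only record. */
		for (size_t i = 0; i < s->size; i++)
			fprintf(stderr, "scratch[%zu]: 0x%x\n", i, s->info[i]);
		s->logged = true;
		return -ENXIO; /* assumed error code */
	}

	/* Non-critical: expose the interface so a firmware flash can recover. */
	s->sysfs_enabled = true;
	return 0;
}
```

Because allocation lives only in `init_survivability_mode()`, a later runtime entry point could presumably reuse it without inheriting the boot-only critical/non-critical policy. That matches the stated goal of separating init and enable.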
Thanks
Riana
>
> Raag
More information about the dri-devel mailing list