[PATCH 2/2] RFC drm/xe: Enable Boot Survivability mode
Jani Nikula
jani.nikula at linux.intel.com
Mon Dec 16 10:42:11 UTC 2024
On Thu, 12 Dec 2024, Riana Tauro <riana.tauro at intel.com> wrote:
> Enable boot survivability mode if pcode initialization fails and
> if boot status indicates a failure. In this mode, drm card is not
> exposed and driver probe returns success after loading the bare minimum
> to allow firmware to be flashed via mei.
>
> Signed-off-by: Riana Tauro <riana.tauro at intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.c | 9 +++++++--
> drivers/gpu/drm/xe/xe_pci.c | 13 +++++++++++++
> drivers/gpu/drm/xe/xe_survivability_mode.c | 3 +++
> 3 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 56d4ffb650da..50ed980e1db9 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -51,6 +51,7 @@
> #include "xe_pm.h"
> #include "xe_query.h"
> #include "xe_sriov.h"
> +#include "xe_survivability_mode.h"
> #include "xe_tile.h"
> #include "xe_ttm_stolen_mgr.h"
> #include "xe_ttm_sys_mgr.h"
> @@ -585,8 +586,12 @@ int xe_device_probe_early(struct xe_device *xe)
> update_device_info(xe);
>
> err = xe_pcode_probe_early(xe);
> - if (err)
> - return err;
> + if (err) {
> + if (xe->info.platform == XE_BATTLEMAGE && xe_survivability_mode_required(xe))
Why the platform check here? Doesn't this stuff belong abstracted inside
the survivability mode?
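
As a rough sketch of what I mean, the platform condition could be folded
into xe_survivability_mode_required() itself so the caller never sees it.
The types below are simplified stand-ins for illustration only; the
boot_failed field and survivability_mode_supported() are hypothetical and
not part of the patch:

```c
#include <stdbool.h>

/* Simplified stand-ins for the real xe types, for illustration only. */
enum xe_platform { XE_PLATFORM_OTHER, XE_BATTLEMAGE };

struct xe_device {
	struct {
		enum xe_platform platform;
	} info;
	bool boot_failed;	/* hypothetical stand-in for the boot status check */
};

/* Hypothetical helper: the platform policy lives next to the feature. */
static bool survivability_mode_supported(struct xe_device *xe)
{
	return xe->info.platform == XE_BATTLEMAGE;
}

/* Callers ask one question and get one answer. */
bool xe_survivability_mode_required(struct xe_device *xe)
{
	if (!survivability_mode_supported(xe))
		return false;

	return xe->boot_failed;
}
```

With that, the probe path would only need to call
xe_survivability_mode_required(xe), with no platform test of its own.
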
> + xe_survivability_mode_init(xe);
> +
> + return xe->survivability.mode ? 0 : err;
Is it a good idea to start looking at survivability guts from all over
the place? I mean xe->survivability.mode. Even its value should be an
implementation detail, and this is using it to decide whether the
previous call succeeded.

I think this would benefit from hiding stuff better and providing
interfaces. This is one of the things i915 sucks at, and it's really
hard and tedious work to fix afterwards.

Just imagine xe->survivability is an opaque pointer (even if it isn't)
and implement stuff based on that. It will make a world of difference in
future maintainability.
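
To illustrate, here is a minimal sketch of the kind of interface I have in
mind, again with mocked-up types. xe_survivability_mode_is_enabled() and
probe_early_sketch() are hypothetical names, and boot_failed is a made-up
stand-in for the real boot status check:

```c
#include <stdbool.h>

/* Treated as private to xe_survivability_mode.c in this sketch. */
struct xe_survivability {
	bool mode;
};

/* Simplified stand-in for the real struct xe_device. */
struct xe_device {
	struct xe_survivability survivability;
	bool boot_failed;	/* hypothetical stand-in for the boot status check */
};

bool xe_survivability_mode_required(struct xe_device *xe)
{
	return xe->boot_failed;
}

void xe_survivability_mode_init(struct xe_device *xe)
{
	xe->survivability.mode = true;
}

/* Hypothetical accessor: the only way outside code learns the state. */
bool xe_survivability_mode_is_enabled(const struct xe_device *xe)
{
	return xe->survivability.mode;
}

/* Mocked caller, shaped like the error path in xe_device_probe_early(). */
int probe_early_sketch(struct xe_device *xe, int pcode_err)
{
	if (pcode_err) {
		if (xe_survivability_mode_required(xe))
			xe_survivability_mode_init(xe);

		/* No peeking at xe->survivability.mode from here. */
		return xe_survivability_mode_is_enabled(xe) ? 0 : pcode_err;
	}

	return 0;
}
```

The same accessor would cover the checks in xe_pci_probe() and
xe_pci_remove(), so the struct layout could change later without touching
the callers.
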
BR,
Jani.
> + }
>
> err = wait_for_lmem_ready(xe);
> if (err)
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index 7d146e3e8e21..b9dcd36de06d 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -30,6 +30,7 @@
> #include "xe_pm.h"
> #include "xe_sriov.h"
> #include "xe_step.h"
> +#include "xe_survivability_mode.h"
> #include "xe_tile.h"
>
> enum toggle_d3cold {
> @@ -768,6 +769,9 @@ static void xe_pci_remove(struct pci_dev *pdev)
> if (IS_SRIOV_PF(xe))
> xe_pci_sriov_configure(pdev, 0);
>
> + if (xe->survivability.mode)
> + return xe_survivability_mode_remove(xe);
> +
> xe_device_remove(xe);
> xe_pm_runtime_fini(xe);
> pci_set_drvdata(pdev, NULL);
> @@ -840,6 +844,15 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> return err;
>
> err = xe_device_probe_early(xe);
> +
> + /*
> + * In Boot Survivability mode, no drm card is exposed
> + * and driver is loaded with bare minimum to allow
> + * for firmware to be flashed through mei
> + */
> + if (!err && xe->survivability.mode)
> + return 0;
> +
> if (err)
> return err;
>
> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
> index 7e36989efd68..6c1e79b5c15f 100644
> --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
> @@ -176,7 +176,10 @@ bool xe_survivability_mode_required(struct xe_device *xe)
> */
> void xe_survivability_mode_remove(struct xe_device *xe)
> {
> + struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> +
> sysfs_remove_files(&xe->drm.dev->kobj, survivability_attrs);
> + pci_set_drvdata(pdev, NULL);
> }
>
> /**
--
Jani Nikula, Intel