[PATCH] drm/xe/pf: Restore TotalVFs

Lucas De Marchi lucas.demarchi at intel.com
Mon Jul 15 17:09:43 UTC 2024


On Mon, Jul 15, 2024 at 01:40:52PM GMT, Michal Wajdeczko wrote:
>If we update TotalVFs to a new value and then fail the .probe(), the
>PCI subsystem will not restore the original value, which might impact
>the next .probe() attempt. As a backup plan, use a managed action to
>restore it ourselves.
>
>Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>Cc: Piotr Piórkowski <piotr.piorkowski at intel.com>
>---
>An attempt to fix this at the PCI layer is here [1], but until that
>is available to us, we can still fix it on our side.
>
>[1] https://lore.kernel.org/linux-pci/20240715102835.1286-1-michal.wajdeczko@intel.com/

I don't think we should merge this while that patch is pending with no
reply. If it's not accepted, then we can think about doing this.

>---
> drivers/gpu/drm/xe/xe_sriov_pf.c | 48 +++++++++++++++++++++++++++-----
> 1 file changed, 41 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_sriov_pf.c b/drivers/gpu/drm/xe/xe_sriov_pf.c
>index 0f721ae17b26..d61933770f1c 100644
>--- a/drivers/gpu/drm/xe/xe_sriov_pf.c
>+++ b/drivers/gpu/drm/xe/xe_sriov_pf.c
>@@ -17,12 +17,40 @@ static unsigned int wanted_max_vfs(struct xe_device *xe)
> 	return xe_modparam.max_vfs;
> }
>
>-static int pf_reduce_totalvfs(struct xe_device *xe, int limit)
>+static void restore_totalvfs(void *arg)
>+{
>+	struct xe_device *xe = arg;
>+	struct device *dev = xe->drm.dev;
>+	struct pci_dev *pdev = to_pci_dev(dev);
>+	int totalvfs = xe->sriov.pf.device_total_vfs;
>+
>+	xe_sriov_dbg_verbose(xe, "restoring totalvfs to %d\n", totalvfs);
>+	pci_sriov_set_totalvfs(pdev, totalvfs);
>+}
>+
>+static int pf_reduce_totalvfs(struct xe_device *xe, u16 limit)

		^^^^^^

so the current logic assumes the maximum comes from the BIOS, but it
could instead come from a previous (failed) driver load? once we set
the value, is the information lost from the device or can we recover it
from somewhere? somewhere != where we saved it in this driver.
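
fwiw, if the SR-IOV extended capability is still intact, the raw
hardware value could presumably be re-read from config space,
something like this (untested sketch, read_hw_totalvfs is a made-up
name, assumes <linux/pci.h>):

	static u16 read_hw_totalvfs(struct pci_dev *pdev)
	{
		u16 total_vfs = 0;
		int pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);

		/* TotalVFs is a read-only field in the SR-IOV capability */
		if (pos)
			pci_read_config_word(pdev, pos + PCI_SRIOV_TOTAL_VF,
					     &total_vfs);

		return total_vfs;
	}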


> {
> 	struct device *dev = xe->drm.dev;
> 	struct pci_dev *pdev = to_pci_dev(dev);
> 	int err;
>
>+	xe->sriov.pf.device_total_vfs = pci_sriov_get_totalvfs(pdev);
>+	xe->sriov.pf.driver_max_vfs = limit;
>+
>+	xe_sriov_dbg_verbose(xe, "reducing totalvfs from %u to %u\n",
>+			     xe->sriov.pf.device_total_vfs,
>+			     xe->sriov.pf.driver_max_vfs);
>+
>+	/*
>+	 * XXX: If we update TotalVFs to a new value and fail the .probe()

why XXX?

>+	 * then the PCI subsystem might not restore original TotalVFs value,
>+	 * which might impact next .probe() attempt. As a backup plan, use
>+	 * managed action to restore it ourselves.
>+	 *
>+	 * Note that it's not critical if registering that action fails.
>+	 */
>+	devm_add_action(xe->drm.dev, restore_totalvfs, xe);
>+
> 	err = pci_sriov_set_totalvfs(pdev, limit);
> 	if (err)
> 		xe_sriov_notice(xe, "Failed to set number of VFs to %d (%pe)\n",
>@@ -37,6 +65,14 @@ static bool pf_continue_as_native(struct xe_device *xe, const char *why)
> 	return false;
> }
>
>+static bool pf_continue_ready(struct xe_device *xe, u16 max_vfs)
>+{
>+	xe_assert(xe, max_vfs > 0);
>+	xe_sriov_dbg(xe, "preparing for %u VF%s\n", max_vfs, str_plural(max_vfs));
>+	pf_reduce_totalvfs(xe, max_vfs);
>+	return true;

what's the point of splitting out this function when it always returns
true? I'd rather leave that code where it was
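
i.e. keep it inline in xe_sriov_pf_readiness(), roughly (sketch based
on the code this patch removes plus the new debug message):

	xe_sriov_dbg(xe, "preparing for %u VF%s\n", newlimit,
		     str_plural(newlimit));
	pf_reduce_totalvfs(xe, newlimit);

	return true;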

Lucas De Marchi

>+}
>+
> /**
>  * xe_sriov_pf_readiness - Check if PF functionality can be enabled.
>  * @xe: the &xe_device to check
>@@ -58,18 +94,16 @@ bool xe_sriov_pf_readiness(struct xe_device *xe)
> 	if (!dev_is_pf(dev))
> 		return false;
>
>+	if (!totalvfs)
>+		return pf_continue_as_native(xe, "no VFs reported");
>+
> 	if (!xe_device_uc_enabled(xe))
> 		return pf_continue_as_native(xe, "Guc submission disabled");
>
> 	if (!newlimit)
> 		return pf_continue_as_native(xe, "all VFs disabled");
>
>-	pf_reduce_totalvfs(xe, newlimit);
>-
>-	xe->sriov.pf.device_total_vfs = totalvfs;
>-	xe->sriov.pf.driver_max_vfs = newlimit;
>-
>-	return true;
>+	return pf_continue_ready(xe, newlimit);
> }
>
> /**
>-- 
>2.43.0
>

