[Intel-xe] [PATCH] drm/doc/rfc: SR-IOV support on the new Xe driver

Francois Dugast francois.dugast at intel.com
Tue Nov 14 10:08:23 UTC 2023


On Fri, Nov 10, 2023 at 07:22:31PM +0100, Michal Wajdeczko wrote:
> The Single Root I/O Virtualization (SR-IOV) extension to the PCI
> Express (PCIe) specification suite is supported starting from 12th
> generation of Intel Graphics processors.
> 
> This RFC aims to explain how do we want to add support for SR-IOV
> to the new Xe driver and to propose related additions to the sysfs.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Oded Gabbay <ogabbay at kernel.org>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> ---
>  Documentation/gpu/rfc/index.rst             |   5 +
>  Documentation/gpu/rfc/sysfs-driver-xe-sriov | 501 ++++++++++++++++++++
>  Documentation/gpu/rfc/xe_sriov.rst          | 192 ++++++++
>  3 files changed, 698 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/sysfs-driver-xe-sriov
>  create mode 100644 Documentation/gpu/rfc/xe_sriov.rst
> 
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index e4f7b005138d..fc5bc447f30d 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -35,3 +35,8 @@ host such documentation:
>  .. toctree::
>  
>     xe.rst
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   xe_sriov.rst
> diff --git a/Documentation/gpu/rfc/sysfs-driver-xe-sriov b/Documentation/gpu/rfc/sysfs-driver-xe-sriov
> new file mode 100644
> index 000000000000..77748204dd83
> --- /dev/null
> +++ b/Documentation/gpu/rfc/sysfs-driver-xe-sriov
> @@ -0,0 +1,501 @@
> +.. Documentation/ABI/testing/sysfs-driver-xe-sriov
> +..
> +.. Intel Xe driver ABI (SR-IOV extensions)
> +..
> +    The Single Root I/O Virtualization (SR-IOV) extension to
> +    the PCI Express (PCIe) specification suite is supported
> +    starting from 12th generation of Intel Graphics processors.
> +
> +    This document describes Xe driver specific additions.
> +
> +    For description of generic SR-IOV sysfs attributes see
> +    "Documentation/ABI/testing/sysfs-bus-pci" document.
> +
> +    /sys/bus/pci/drivers/xe/BDF/
> +    ├── sriov_auto_provisioning
> +    │   ├── admin_mode
> +    │   ├── enabled
> +    │   ├── reset_defaults
> +    │   ├── resources
> +    │   │   ├── default_contexts_quota
> +    │   │   ├── default_doorbells_quota
> +    │   │   ├── default_ggtt_quota
> +    │   │   └── default_lmem_quota
> +    │   ├── scheduling
> +    │   │   ├── default_exec_quantum_ms
> +    │   │   └── default_preempt_timeout_us
> +    │   └── monitoring
> +    │       ├── default_cat_error_count
> +    │       ├── default_doorbell_time_us
> +    │       ├── default_engine_reset_count
> +    │       ├── default_h2g_time_us
> +    │       ├── default_irq_time_us
> +    │       └── default_page_fault_count
> +
> +    /sys/bus/pci/drivers/xe/BDF/
> +    ├── sriov_extensions
> +    │   ├── monitoring_period_ms
> +    │   ├── strict_scheduling_enabled
> +    │   ├── pf
> +    │   │   ├── device -> ../../../BDF
> +    │   │   ├── priority
> +    │   │   ├── tile0
> +    │   │   │   ├── gt0
> +    │   │   │   │   ├── exec_quantum_ms
> +    │   │   │   │   ├── preempt_timeout_us
> +    │   │   │   │   └── thresholds
> +    │   │   │   │       ├── cat_error_count
> +    │   │   │   │       ├── doorbell_time_us
> +    │   │   │   │       ├── engine_reset_count
> +    │   │   │   │       ├── h2g_time_us
> +    │   │   │   │       ├── irq_time_us
> +    │   │   │   │       └── page_fault_count
> +    │   │   │   └── gtX
> +    │   │   └── tileT
> +    │   ├── vf1
> +    │   │   ├── device -> ../../../BDF+1
> +    │   │   ├── stop
> +    │   │   ├── tile0
> +    │   │   │   ├── ggtt_quota
> +    │   │   │   ├── lmem_quota
> +    │   │   │   ├── gt0
> +    │   │   │   │   ├── contexts_quota
> +    │   │   │   │   ├── doorbells_quota
> +    │   │   │   │   ├── exec_quantum_ms
> +    │   │   │   │   ├── preempt_timeout_us
> +    │   │   │   │   └── thresholds
> +    │   │   │   │       ├── cat_error_count
> +    │   │   │   │       ├── doorbell_time_us
> +    │   │   │   │       ├── engine_reset_count
> +    │   │   │   │       ├── h2g_time_us
> +    │   │   │   │       ├── irq_time_us
> +    │   │   │   │       └── page_fault_count
> +    │   │   │   └── gtX
> +    │   │   └── tileT
> +    │   └── vfN
> +..
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		This directory appears on the device when:
> +
> +		 - device supports SR-IOV, and
> +		 - device is a Physical Function (PF), and
> +		 - xe driver supports SR-IOV PF on given device, and
> +		 - xe driver supports automatic VFs provisioning.
> +
> +		This directory is used as a root for all attributes related to
> +		automatic provisioning of SR-IOV Physical Function (PF) and/or
> +		Virtual Functions (VFs).
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/enabled
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		(RW) bool (0, 1)
> +
> +		This file represents configuration flag for the automatic VFs
> +		(un)provisioning that could be performed by the PF.
> +
> +		The default value is 1 (true).
> +
> +		This flag can be set to false, unless manual provisioning is not
> +		applicable for given platform or it is not supported by current
> +		PF implementation. In such cases -EPERM will be returned.
> +
> +		This flag will be automatically set to false when there will be
> +		other attempts to change any of VF's resource provisioning.
> +		See "sriov_extensions" section for details.
> +
> +		This flag can be set back to true if and only if all VFs are
> +		fully unprovisioned, otherwise -EEXIST error will be returned.
> +
> +		false = "disabled"
> +			When disabled, then PF will not attempt to do automatic
> +			VFs provisioning when VFs are being enabled and will not
> +			perform automatic unprovisioning of the VFs when VFs will
> +			be disabled.
> +
> +		true = "enabled"
> +			When enabled, then on VFs enabling PF will do automatic
> +			VFs provisioning based on the default settings described
> +			below.
> +
> +			If automatic VFs provisioning fails due to some reasons,
> +			then VFs will not be enabled.
> +
> +			If enabled, all resources allocated during VFs enabling
> +			will be released during VFs disabling (automatic unprovisioning).
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/admin_mode
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		(RW) bool (0, 1)
> +
> +		This file represents configuration flag for the automatic VFs
> +		provisioning that could be performed by the PF.
> +
> +		The default value depends on the platform type.
> +
> +		This flag can be changed any time, but will have no effect if
> +		VFs are already provisioned.
> +
> +		If enabled (default on discrete platforms) then the PF will
> +		retain only minimum hardcoded resources for its own use when
> +		doing VFs automatic provisioning and will not use any default
> +		values described below for its own configuration.
> +
> +		If disabled (default on integrated platforms) then the PF will
> +		treat itself like yet another additional VF in all fair resource
> +		allocations and will also try to apply default provisioning
> +		values described below for its own configuration.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/reset_defaults
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		(WO) bool (1)
> +
> +		Writing to this file will reset all default provisioning parameters
> +		listed below to the default values.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_contexts_quota
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_doorbells_quota
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_ggtt_quota
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_lmem_quota
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/scheduling/default_exec_quantum_ms
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/scheduling/default_preempt_timeout_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_cat_error_count
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_doorbell_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_engine_reset_count
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_h2g_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_irq_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_page_fault_count
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		These files represent default provisioning that should be used
> +		for VFs automatic provisioning.
> +
> +		These values can be changed any time, but will have no effect if
> +		VFs are already provisioned.
> +
> +		default_contexts_quota: (RW) integer 0..U32_MAX
> +			The number of GuC context IDs to provide to the VF.
> +			The default value is 0 (use fair allocations).
> +			See "sriov_extensions/vfN/tileT/gtX/contexts_quota" for details.
> +
> +		default_doorbells_quota: (RW) integer 0..U32_MAX
> +			The number of GuC doorbells to provide to the VF.
> +			The default value is 0 (use fair allocations).
> +			See "sriov_extensions/vfN/tileT/gtX/doorbells_quota" for details.
> +
> +		default_ggtt_quota: (RW) integer 0..U32_MAX
> +			The size of the GGTT address space (in bytes) to provide to the VF.
> +			The default value is 0 (use fair allocations).
> +			See "sriov_extensions/vfN/tileT/ggtt_quota" for details.
> +
> +		default_lmem_quota: (RW) integer 0..U32_MAX
> +			The size of the LMEM (in bytes) to provide to the VF.
> +			The default value is 0 (use fair allocations).
> +			See "sriov_extensions/vfN/tileT/lmem_quota" for details.
> +
> +		default_exec_quantum_ms: (RW) integer 0..U32_MAX
> +			The GT execution quantum (in millisecs) assigned to the function.
> +			The default value is 0 (infinify).
> +			See "sriov_extensions/vfN/tileT/gtX/exec_quantum_ms" for details.
> +
> +		default_preempt_timeout_us: (RW) integer 0..U32_MAX
> +			The GT preemption timeout (in microsecs) assigned to the function.
> +			The default value is 0 (infinity).
> +			See "sriov_extensions/vfN/tileT/gtX/preempt_timeout_us" for details.
> +
> +		default_cat_error_count: (RW) integer 0..U32_MAX
> +		default_doorbell_time_us: (RW) integer 0..U32_MAX
> +		default_engine_reset_count: (RW) integer 0..U32_MAX
> +		default_h2g_time_us: (RW) integer 0..U32_MAX
> +		default_irq_time_us: (RW) integer 0..U32_MAX
> +		default_page_fault_count: (RW) integer 0..U32_MAX
> +			The monitoring threshold to be set for the function.
> +			The default value is 0 (don't monitor).
> +			See "sriov_extensions/vfN/tileT/gtX/thresholds" for details.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		This directory appears on Xe device when:
> +
> +		 - device supports SR-IOV, and
> +		 - device is a Physical Function (PF), and
> +		 - driver is enabled to support SR-IOV PF on given device.
> +
> +		This directory is used as a root for all attributes required to
> +		manage both Physical Function (PF) and Virtual Functions (VFs).
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/strict_scheduling_enabled
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		(RW) bool
> +
> +		This file represents a flag used to determine if scheduling
> +		parameters should be respected even if there is no active
> +		workloads submitted by the PF or VFs.
> +
> +		This flag is disabled by default, unless strict scheduling is
> +		not applicable on given platform. In such case this file will
> +		be read-only.
> +
> +		The change to this file may have no effect if VFs are not yet enabled.
> +		If strict scheduling can't be enabled in GuC then write will fail with -EIO.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/monitoring_period_ms
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		(RW) integer
> +
> +		This file represents the configuration knob used by adverse event
> +		monitoring. A value here is the period in millisecs during which
> +		events are counted and the total is checked against a threshold.
> +		See "sriov_extensions/vfN/tileT/gtX/thresholds" for more details.
> +
> +		Default is 0 (monitoring is disabled).
> +
> +		If monitoring capability is not available, then attempt to enable
> +		will fail with -EPERM error. If monitoring can't be enabled in
> +		GuC then write will fail with -EIO.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		This directory holds all attributes related to the SR-IOV
> +		Physical Function (PF).
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		This directory holds all attributes related to the SR-IOV
> +		Virtual Function (VF).
> +
> +		Note that VF numbers (N) are 1-based as described in PCI SR-IOV specification.
> +		The Xe driver implementaton follows that naming schema.
> +
> +		There will be "vf1", "vf2" up to "vfN" directories, where N matches
> +		value of the PCI "sriov_totalvfs" attribute.
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		This directory holds all SR-IOV attributes related to the device tile.
> +		The tile numbers (T) start from 0.
> +
> +		There is at least one "tile0/" directory present.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		This directory holds all SR-IOV attributes related to the device GT.
> +		The GT numbers (X) start from 0.
> +
> +		There is at least one "gt0/" directory present.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/device
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/device
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		(symbolic link)
> +
> +		Backlink to the PCI device entry representing given function.
> +		For PF this link is always present.
> +		For VF this link is present only for currently enabled VFs.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/priority
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		(RW) string
> +
> +		This file represents a GuC Scheduler knob to override the default
> +		round-robin or FIFO scheduler policies implemented by the GuC.
> +
> +		The default value is "peer".
> +
> +		This flag can be changed, unless such change is not applicable
> +		for given platform or is not supported by current GuC firmware.
> +		In such case this file could be read-only or will return -EPERM
> +		on write attempt.
> +
> +		"immediate"
> +			GuC will Schedule PF workloads immediately and PF
> +			workloads only until the PF's work queues in GuC
> +			are empty.
> +
> +		"lazy"
> +			GuC will Schedule PF workloads at the next opportune
> +			moment and PF workloads only until the PF work queues
> +			in GuC are empty.
> +
> +		"peer"
> +			GuC Scheduler will treat PF and VFs with equal priority.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/stop
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		(WO) bool (1)
> +
> +		Write to this file will force GuC to stop handle any requests from
> +		this VF, but without triggering a FLR.
> +		To recover, the full FLR must be issued using generic "device/reset".
> +
> +		This file allows to implement custom policy mechanism when VF is
> +		misbehaving and triggering adverse events above defined thresholds.
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/exec_quantum_ms
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/preempt_timeout_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/exec_quantum_ms
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/preempt_timeout_us
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		These files represent scheduling parameters of the functions.
> +
> +		These scheduling parameters can be changed even if VFs are enabled
> +		and running, unless such change is not applicable on given platform
> +		due to fixed hardware or firmware assignment.
> +
> +		exec_quantum_ms: (RW) integer 0..U32_MAX
> +			The GT execution quantum in [ms] assigned to the function.
> +			Requested quantum might be aligned per HW/FW requirements.
> +
> +			Default is 0 (unlimited).
> +
> +		preempt_timeout_us: (RW) integer 0..U32_MAX
> +			The GT preemption timeout in [us] assigned to the function.
> +			Requested timeout might be aligned per HW/FW requirements.
> +
> +			Default is 0 (unlimited).
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/ggtt_quota
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/lmem_quota
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/contexts_quota
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/doorbells_quota
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		These files represent shared resource assigned to the functions.
> +
> +		These resource parameters can be changed, unless VF is already running,
> +		or such change is not applicable on given platform due to fixed hardware
> +		or firmware assignment.
> +
> +		Writes to these attributes may fail with:
> +			-EPERM if change is not applicable on give HW/FW.
> +			-E2BIG if value larger that HW/FW limit.
> +			-EDQUOT if value is larger than maximum quota defined by the PF.
> +			-ENOSPC if PF can't allocate required quota.
> +			-EBUSY if the resource is currently in use by the VF.
> +			-EIO if GuC refuses to change provisioning.
> +
> +		ggtt_quota: (RW) integer 0..U64_MAX
> +			The size of the GGTT address space (in bytes) assigned to the VF.
> +			The value might be aligned per HW/FW requirements.
> +
> +			Default is 0 (unprovisioned).
> +
> +		lmem_quota: (RW) integer 0..U64_MAX
> +			The size of the Local Memory (in bytes) assigned to the VF.
> +			The value might be aligned per HW/FW requirements.
> +
> +			This attribute is only available on discrete platforms.
> +
> +			Default is 0 (unprovisioned).
> +
> +		contexts_quota: (RW) 0..U16_MAX
> +			The number of GuC submission contexts assigned to the VF.
> +			This value might be aligned per HW/FW requirements.
> +
> +			Default is 0 (unprovisioned).
> +
> +		doorbells_quota: (RW) 0..U16_MAX
> +			The number of GuC doorbells assigned to the VF.
> +			This value might be aligned per HW/FW requirements.
> +
> +			Default is 0 (unprovisioned).
> +
> +
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/cat_error_count
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/doorbell_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/engine_reset_count
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/h2g_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/irq_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/page_fault_count
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/cat_error_count
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/doorbell_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/engine_reset_count
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/h2g_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/irq_time_us
> +What:		/sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/page_fault_count
> +Date:		2024
> +KernelVersion:	TBD
> +Contact:	intel-xe at lists.freedesktop.org
> +Description:
> +		These files represent threshold values used by the GuC to trigger
> +		security events if adverse event monitoring is enabled.
> +
> +		These thresholds are checked every "monitoring_period_ms".
> +		Refer to GuC ABI for details about each threshold category.
> +
> +		Default value for all thresholds is 0 (disabled).
> +
> +		cat_error_count: (RW) integer
> +		doorbell_time_us: (RW) integer
> +		engine_reset_count: (RW) integer
> +		h2g_time_us: (RW) integer
> +		irq_time_us: (RW) integer
> +		page_fault_count: (RW) integer
> diff --git a/Documentation/gpu/rfc/xe_sriov.rst b/Documentation/gpu/rfc/xe_sriov.rst
> new file mode 100644
> index 000000000000..574f6414eabb
> --- /dev/null
> +++ b/Documentation/gpu/rfc/xe_sriov.rst
> @@ -0,0 +1,192 @@
> +.. SPDX-License-Identifier: MIT
> +
> +========================
> +Xe – SR-IOV Support Plan
> +========================
> +
> +The Single Root I/O Virtualization (SR-IOV) extension to the PCI Express (PCIe)
> +specification suite is supported starting from 12th generation of Intel Graphics
> +processors.
> +
> +This document describes planned ABI of the new Xe driver (see xe.rst) that will
> +provide flexible configuration and management options related to the SR-IOV.
> +It will also highlight few most important changes to the Xe driver
> +implementation to deal with Intel GPU SR-IOV specific requirements.
> +
> +
> +SR-IOV Capability
> +=================
> +
> +Due to SR-IOV complexity and required co-operation between hardware, firmware
> +and kernel drivers, not all Xe architecture platforms might have SR-IOV enabled
> +or fully functional.
> +
> +To control at the driver level which platform will provide support for SR-IOV,
> +as we can't just rely on the PCI configuration data exposed by the hardware,
> +we will introduce "has_sriov" flag to the struct xe_device_desc that describes
> +a device capabilities that driver checks during the probe.
> +
> +Initially this flag will be set to disabled even on platforms that we plan to
> +support. We will enable this flag only once we finish merging all required
> +changes to the driver and related validated firmwares are also made available.
> +
> +
> +SR-IOV Platforms
> +================
> +
> +Initially we plan to add SR-IOV functionality to the following SDV platforms
> +already supported by the Xe driver:
> +
> + - TGL (up to 7 VFs)
> + - ADL (up to 7 VFs)
> + - MTL (up to 7 VFs)
> + - ATSM (up to 31 VFs)
> + - PVC (up to 63 VFs)
> +
> +Newer platforms will be supported later, but we hope that enabling will be
> +much faster, as majority of the driver changes are either platform agnostic
> +or are similar between earlier platforms (hence we start with SDVs).
> +
> +
> +PF Mode
> +=======
> +
> +Support in the driver for acting in Physical Function (PF) mode, i.e. mode
> +that allows configuration of VFs, depends on the CONFIG_PCI_IOV and will be
> +enabled by default.
> +
> +However, due to potentially conflicting requirements for SR-IOV and other mega
> +features, we might want to have an option to disable SR-IOV PF mode support at
> +the driver load time.

What about making SR-IOV support in Xe dependent on a separate build option, such
as CONFIG_DRM_XE_SRIOV? This would allow users to enable SR-IOV with CONFIG_PCI_IOV
to virtualize other devices, let's say a network adapter, but to keep this feature
compiled out of Xe.

Francois

> +
> +Thus, we plan to use additional modparam named "sriov_totalvfs" which if set to
> +0 will force the driver to operate in the native (non-virtualized) mode.
> +The same modparam could be used to limit number of supported Virtual Functions
> +(VFs) by the driver compared to the hardware limit exposed in PCI configuration.
> +
> +The name of this modparam corresponds to the existing PCI sysfs attribute, that
> +by default exposes hardware capability.
> +
> +The default value of this param will allow to support all possible VFs as
> +claimed by the hardware.
> +
> +This modparam will have no effect if driver is running on the VF device.
> +
> +
> +VFs Enabling
> +============
> +
> +To enable or disable VFs we plan to rely on existing sysfs attribute exposed by
> +the PCI subsystem named "sriov_numvfs". We will provide all necessary tweaks to
> +provision VFs in our custom implementation of the "sriov_configure" hook from
> +the struct pci_driver.
> +
> +If for some reason, including explicit request to disable SR-IOV PF mode using
> +modparam, we will not be able to correctly support any VFs, driver will change
> +number of supported VFs, exposed to the userspace by "sriov_totalvfs" attribute,
> +to 0, thus preventing configuration of the VFs.
> +
> +
> +VF Mode
> +=======
> +
> +When driver is running on the VF device, then due to hardware enforcements,
> +access to the privileged registers is not possible. To avoid relying on these
> +registers, we plan to perform early detection if we are running on the VF
> +device using dedicated VF_CAP(0x1901f8) register and then use global macro
> +IS_SRIOV_VF(xe) to control the driver logic.
> +
> +To speed up merging of the required changes, we might first introduce dummy
> +macro that is always set to false, to prepare driver to avoid some code paths
> +before we finalize our VF mode detection and other VFs enabling changes.
> +
> +
> +Resources
> +=========
> +
> +Most of the hardware (or firmware) resources available on the Xe architecture,
> +like GGTT, LMEM, GuC context IDs, GuC doorbells, will be shared between PF and
> +VFs and will require some provisioning steps to assign those resources for use
> +by the VF.
> +
> +Until VFs are provisioned with resources, the PF driver will be able to use all
> +resources, in the same way as it would be running in non-virtualized mode.
> +
> +If some resource (of part or region of it) is assigned to specific VF, then PF
> +is not allowed to use that part or region of the resource, but can continue to
> +use whatever is left available.
> +
> +Those resources are usually fully virtualized, so they will not require any
> +special handling when used by the VF driver, except that VF driver must know
> +the assigned quota.
> +
> +The most notable exception is the GGTT address space, as on some platforms,
> +the VF driver must additionally know the real range that it can access.
> +
> +Once the resources were assigned to the VF use and the VF driver has started,
> +then it is not allowed to change such provisioning, as that would break the
> +VF driver. To make changes the VF driver, which was using these resources,
> +must be unloaded (or the VM is terminated) and the VF device must be reset
> +using the FLR.
> +
> +
> +Scheduling
> +==========
> +
> +The workloads from PF driver and VF drivers must be submitted to the hardware
> +always by using the GuC submission mechanism. Unless VF has exclusive access
> +to the GT then submissions from different VFs are time-sliced and controlled
> +with additional "execution_quantum" and "preemption_timeout" parameters.
> +
> +In contrast to the resource provisioning, those scheduling parameters can be
> +changed even if VF drivers are already running and are active.
> +
> +
> +Automatic VFs Provisioning
> +==========================
> +
> +To provide out-of-the box experience when user will be enabling VFs using
> +generic "sriov_numvfs" attribute without requiring complex provisioning steps,
> +the SR-IOV PF driver will implement automatic VFs resource provisioning.
> +
> +By default, all VFs will be allocated with the fair amount of the mandatory
> +resources (like GGTT, GuC IDs) and with unrestricted scheduling parameters.
> +Such provisioning should be sufficient for most of the normal usages, when
> +no strict SLA is required.
> +
> +The PF driver will also expose some additional sysfs files to allow adjusting
> +this automatic VFs provisioning, like default values for most of the
> +provisioning parameters that PF will then apply for each enabled VF.
> +
> +    Details about those extension can be found in
> +    :download:`Preliminary Xe driver ABI <sysfs-driver-xe-sriov>`.
> +
> +
> +Manual VFs Provisioning
> +=======================
> +
> +If automatic VFs provisioning, which applies same configuration to every VF,
> +is not sufficient or there is a need for advanced customization of some VF,
> +the PF driver will also provide extended sysfs interface which will allow
> +control every provisioning attribute to the lowest feasible level.
> +
> +It is expected that these low-level attributes will be mostly used by the
> +advanced users or by the custom tools that will setup configurations that
> +meet predefined and validated SLA as required by the customers.
> +
> +    Details about those extension can be found in
> +    :download:`Preliminary Xe driver ABI <sysfs-driver-xe-sriov>`.
> +
> +
> +VFs Monitoring
> +==============
> +
> +In addition to the resource provisioning or changing scheduling parameters,
> +the PF driver might also allow configure some monitoring parameters, like
> +thresholds of adverse events or sample period, to track undesired behavior
> +of the VFs that could impact the whole system.
> +
> +Once those thresholds are setup and sampling period is defined, the GuC will
> +notify the PF driver about which VF is excessing the threshold and then PF is
> +able to trigger the uevent to notify the administrator (or VMM) that could
> +take some action against the VF.
> -- 
> 2.25.1
> 


More information about the Intel-xe mailing list