[PATCH] drm/doc/rfc: SR-IOV support on the new Xe driver

Vivi, Rodrigo rodrigo.vivi at intel.com
Tue Nov 14 13:22:48 UTC 2023


On Tue, 2023-11-14 at 12:37 +0000, Tvrtko Ursulin wrote:
> 
> On 10/11/2023 18:22, Michal Wajdeczko wrote:
> > The Single Root I/O Virtualization (SR-IOV) extension to the PCI
> > Express (PCIe) specification suite is supported starting from 12th
> > generation of Intel Graphics processors.
> > 
> > This RFC aims to explain how we want to add support for SR-IOV
> > to the new Xe driver and to propose related additions to the sysfs.
> > 
> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > Cc: Oded Gabbay <ogabbay at kernel.org>
> > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> > Cc: Daniel Vetter <daniel at ffwll.ch>
> > ---
> >   Documentation/gpu/rfc/index.rst             |   5 +
> >   Documentation/gpu/rfc/sysfs-driver-xe-sriov | 501 ++++++++++++++++++++
> >   Documentation/gpu/rfc/xe_sriov.rst          | 192 ++++++++
> >   3 files changed, 698 insertions(+)
> >   create mode 100644 Documentation/gpu/rfc/sysfs-driver-xe-sriov
> >   create mode 100644 Documentation/gpu/rfc/xe_sriov.rst
> > 
> > diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> > index e4f7b005138d..fc5bc447f30d 100644
> > --- a/Documentation/gpu/rfc/index.rst
> > +++ b/Documentation/gpu/rfc/index.rst
> > @@ -35,3 +35,8 @@ host such documentation:
> >   .. toctree::
> >   
> >      xe.rst
> > +
> > +.. toctree::
> > +   :maxdepth: 1
> > +
> > +   xe_sriov.rst
> > diff --git a/Documentation/gpu/rfc/sysfs-driver-xe-sriov b/Documentation/gpu/rfc/sysfs-driver-xe-sriov
> > new file mode 100644
> > index 000000000000..77748204dd83
> > --- /dev/null
> > +++ b/Documentation/gpu/rfc/sysfs-driver-xe-sriov
> > @@ -0,0 +1,501 @@
> > +.. Documentation/ABI/testing/sysfs-driver-xe-sriov
> > +..
> > +.. Intel Xe driver ABI (SR-IOV extensions)
> > +..
> > +    The Single Root I/O Virtualization (SR-IOV) extension to
> > +    the PCI Express (PCIe) specification suite is supported
> > +    starting from 12th generation of Intel Graphics processors.
> > +
> > +    This document describes Xe driver specific additions.
> > +
> > +    For description of generic SR-IOV sysfs attributes see
> > +    "Documentation/ABI/testing/sysfs-bus-pci" document.
> > +
> > +    /sys/bus/pci/drivers/xe/BDF/
> > +    ├── sriov_auto_provisioning
> > +    │   ├── admin_mode
> > +    │   ├── enabled
> > +    │   ├── reset_defaults
> > +    │   ├── resources
> > +    │   │   ├── default_contexts_quota
> > +    │   │   ├── default_doorbells_quota
> > +    │   │   ├── default_ggtt_quota
> > +    │   │   └── default_lmem_quota
> > +    │   ├── scheduling
> > +    │   │   ├── default_exec_quantum_ms
> > +    │   │   └── default_preempt_timeout_us
> > +    │   └── monitoring
> > +    │       ├── default_cat_error_count
> > +    │       ├── default_doorbell_time_us
> > +    │       ├── default_engine_reset_count
> > +    │       ├── default_h2g_time_us
> > +    │       ├── default_irq_time_us
> > +    │       └── default_page_fault_count
> 
> From the department of bike-shedding, one alternative could be to have
> a directory called defaults which avoids having to have the default_
> prefix on everything under it.

Good idea. Probably with a '.' prefix to make it hidden, like we already
have for other stuff.

'.defaults'
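
To make the renaming concrete, here is a rough sketch of how the flat
default_* names from the tree above could map onto such a directory. The
placement of '.defaults' directly under sriov_auto_provisioning and the
helper name are assumptions for illustration only; nothing like this
exists in the patch.

```python
def to_defaults_layout(path):
    """Map 'resources/default_ggtt_quota' to '.defaults/resources/ggtt_quota'.

    Hypothetical helper: the '.defaults' location is an assumption, not
    something the RFC defines.
    """
    parent, _, name = path.rpartition("/")
    prefix = "default_"
    if name.startswith(prefix):
        name = name[len(prefix):]  # drop the now redundant prefix
    # Skip the parent component when the attribute has no subdirectory.
    return "/".join(part for part in (".defaults", parent, name) if part)


for old in ("resources/default_ggtt_quota",
            "scheduling/default_exec_quantum_ms",
            "monitoring/default_irq_time_us"):
    print(old, "->", to_defaults_layout(old))
```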

> 
> > +
> > +    /sys/bus/pci/drivers/xe/BDF/
> > +    ├── sriov_extensions
> 
> Should this be xe_sriov_extensions? If not, doesn't it need agreement
> to reserve the keyword in Documentation/ABI/testing/sysfs-bus-pci?
> sriov_auto_provisioning too, I guess.
> 
> > +    │   ├── monitoring_period_ms
> > +    │   ├── strict_scheduling_enabled
> > +    │   ├── pf
> > +    │   │   ├── device -> ../../../BDF
> > +    │   │   ├── priority
> > +    │   │   ├── tile0
> > +    │   │   │   ├── gt0
> > +    │   │   │   │   ├── exec_quantum_ms
> > +    │   │   │   │   ├── preempt_timeout_us
> > +    │   │   │   │   └── thresholds
> > +    │   │   │   │       ├── cat_error_count
> > +    │   │   │   │       ├── doorbell_time_us
> > +    │   │   │   │       ├── engine_reset_count
> > +    │   │   │   │       ├── h2g_time_us
> > +    │   │   │   │       ├── irq_time_us
> > +    │   │   │   │       └── page_fault_count
> > +    │   │   │   └── gtX
> > +    │   │   └── tileT
> > +    │   ├── vf1
> > +    │   │   ├── device -> ../../../BDF+1
> > +    │   │   ├── stop
> > +    │   │   ├── tile0
> > +    │   │   │   ├── ggtt_quota
> > +    │   │   │   ├── lmem_quota
> > +    │   │   │   ├── gt0
> > +    │   │   │   │   ├── contexts_quota
> > +    │   │   │   │   ├── doorbells_quota
> > +    │   │   │   │   ├── exec_quantum_ms
> > +    │   │   │   │   ├── preempt_timeout_us
> > +    │   │   │   │   └── thresholds
> > +    │   │   │   │       ├── cat_error_count
> > +    │   │   │   │       ├── doorbell_time_us
> > +    │   │   │   │       ├── engine_reset_count
> > +    │   │   │   │       ├── h2g_time_us
> > +    │   │   │   │       ├── irq_time_us
> > +    │   │   │   │       └── page_fault_count
> > +    │   │   │   └── gtX
> > +    │   │   └── tileT
> > +    │   └── vfN
> > +..
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               This directory appears on the device when:
> > +
> > +                - device supports SR-IOV, and
> > +                - device is a Physical Function (PF), and
> > +                - xe driver supports SR-IOV PF on given device,
> > and
> > +                - xe driver supports automatic VFs provisioning.
> > +
> > +               This directory is used as a root for all attributes
> > related to
> > +               automatic provisioning of SR-IOV Physical Function
> > (PF) and/or
> > +               Virtual Functions (VFs).
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/enabled
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               (RW) bool (0, 1)
> > +
> > +               This file represents configuration flag for the
> > automatic VFs
> > +               (un)provisioning that could be performed by the PF.
> > +
> > +               The default value is 1 (true).
> > +
> > +               This flag can be set to false, unless manual
> > provisioning is not
> > +               applicable for given platform or it is not
> > supported by current
> > +               PF implementation. In such cases -EPERM will be
> > returned.
> > +
> > +               This flag will be automatically set to false when
> > there will be
> > +               other attempts to change any of VF's resource
> > provisioning.
> > +               See "sriov_extensions" section for details.
> > +
> > +               This flag can be set back to true if and only if
> > all VFs are
> > +               fully unprovisioned, otherwise -EEXIST error will
> > be returned.
> > +
> > +               false = "disabled"
> > +                       When disabled, then PF will not attempt to
> > do automatic
> > +                       VFs provisioning when VFs are being enabled
> > and will not
> > +                       perform automatic unprovisioning of the VFs
> > when VFs will
> > +                       be disabled.
> > +
> > +               true = "enabled"
> > +                       When enabled, then on VFs enabling PF will
> > do automatic
> > +                       VFs provisioning based on the default
> > settings described
> > +                       below.
> > +
> > +                       If automatic VFs provisioning fails due to
> > some reasons,
> > +                       then VFs will not be enabled.
> > +
> > +                       If enabled, all resources allocated during
> > VFs enabling
> > +                       will be released during VFs disabling
> > (automatic unprovisioning).
> > +
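
The transitions described for "enabled" can be read as a small state
machine. The following is only a toy model of the text above, not driver
code; the class and method names are made up for illustration, and the
model assumes that any manual change leaves the VF counted as provisioned
until it is explicitly unprovisioned.

```python
import errno


class AutoProvisioningFlag:
    """Toy model of the proposed sriov_auto_provisioning/enabled flag."""

    def __init__(self, manual_supported=True):
        self.enabled = True               # documented default is 1 (true)
        self.manual_supported = manual_supported
        self.provisioned_vfs = set()      # VF numbers that hold resources

    def write_enabled(self, value):
        """Emulate a write to 'enabled'; return 0 or a negative errno."""
        if not value:
            if not self.manual_supported:
                return -errno.EPERM       # manual provisioning not available
            self.enabled = False
            return 0
        if self.provisioned_vfs:
            return -errno.EEXIST          # some VF is still provisioned
        self.enabled = True
        return 0

    def manual_provision(self, vf):
        """A manual change to a VF's resources clears the flag."""
        self.provisioned_vfs.add(vf)
        self.enabled = False

    def unprovision(self, vf):
        self.provisioned_vfs.discard(vf)
```

With this model, re-enabling after a manual change to vf1 fails with
-EEXIST until vf1 is unprovisioned again.
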
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/admin_mode
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               (RW) bool (0, 1)
> > +
> > +               This file represents configuration flag for the
> > automatic VFs
> > +               provisioning that could be performed by the PF.
> > +
> > +               The default value depends on the platform type.
> > +
> > +               This flag can be changed any time, but will have no
> > effect if
> > +               VFs are already provisioned.
> > +
> > +               If enabled (default on discrete platforms) then the
> > PF will
> > +               retain only minimum hardcoded resources for its own
> > use when
> > +               doing VFs automatic provisioning and will not use
> > any default
> > +               values described below for its own configuration.
> > +
> > +               If disabled (default on integrated platforms) then
> > the PF will
> > +               treat itself like yet another additional VF in all
> > fair resource
> > +               allocations and will also try to apply default
> > provisioning
> > +               values described below for its own configuration.
> > +
> 
> One alternative could be to expose two sets of defaults, the PF and VF
> ones. With the advantage of allowing the "admin mode" / "minimal PF" to
> be explicitly configurable instead of hardcoded. Should be more flexible.
> 
> If the discrete vs integrated distinction is wanted it could simply be
> made by initially populating (driver init) the respective defaults based
> on the platform type.
> 
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/reset_defaults
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               (WO) bool (1)
> > +
> > +               Writing to this file will reset all default
> > provisioning parameters
> > +               listed below to the default values.
> 
> Maybe this isn't required if you can say it is the responsibility of
> whoever changes the defaults to either know what they are doing, or to
> save and restore the values themselves. It is not a major concern, but if
> writing kernel code can be avoided perhaps it can be considered.
> 
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_contexts_quota
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_doorbells_quota
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_ggtt_quota
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/resources/default_lmem_quota
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/scheduling/default_exec_quantum_ms
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/scheduling/default_preempt_timeout_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_cat_error_count
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_doorbell_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_engine_reset_count
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_h2g_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_irq_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_auto_provisioning/monitoring/default_page_fault_count
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               These files represent default provisioning that
> > should be used
> > +               for VFs automatic provisioning.
> > +
> > +               These values can be changed any time, but will have
> > no effect if
> > +               VFs are already provisioned.
> > +
> > +               default_contexts_quota: (RW) integer 0..U32_MAX
> > +                       The number of GuC context IDs to provide to
> > the VF.
> > +                       The default value is 0 (use fair
> > allocations).
> > +                       See
> > "sriov_extensions/vfN/tileT/gtX/contexts_quota" for details.
> > +
> > +               default_doorbells_quota: (RW) integer 0..U32_MAX
> > +                       The number of GuC doorbells to provide to
> > the VF.
> > +                       The default value is 0 (use fair
> > allocations).
> > +                       See
> > "sriov_extensions/vfN/tileT/gtX/doorbells_quota" for details.
> > +
> > +               default_ggtt_quota: (RW) integer 0..U32_MAX
> > +                       The size of the GGTT address space (in
> > bytes) to provide to the VF.
> > +                       The default value is 0 (use fair
> > allocations).
> > +                       See "sriov_extensions/vfN/tileT/ggtt_quota"
> > for details.
> > +
> > +               default_lmem_quota: (RW) integer 0..U32_MAX
> > +                       The size of the LMEM (in bytes) to provide
> > to the VF.
> > +                       The default value is 0 (use fair
> > allocations).
> > +                       See "sriov_extensions/vfN/tileT/lmem_quota"
> > for details.
> > +
> > +               default_exec_quantum_ms: (RW) integer 0..U32_MAX
> > +                       The GT execution quantum (in millisecs)
> > assigned to the function.
> > +                       The default value is 0 (infinity).
> > +                       See
> > "sriov_extensions/vfN/tileT/gtX/exec_quantum_ms" for details.
> > +
> > +               default_preempt_timeout_us: (RW) integer 0..U32_MAX
> > +                       The GT preemption timeout (in microsecs)
> > assigned to the function.
> > +                       The default value is 0 (infinity).
> > +                       See
> > "sriov_extensions/vfN/tileT/gtX/preempt_timeout_us" for details.
> 
> I have a slight concern here on the usability of GuC specific tunables.
> 
> Whereas one can imagine an external entity (some admin, somewhere) to
> probably pretty much understand what it means to partition the local
> memory, address space, and set the scheduling timeouts (all are intuitive
> and obvious concepts), how are they supposed to approach the GuC
> doorbells and contexts?
> 
> It could be a matter of adding more documentation for those two, or it
> could even make sense to shove them under a guc prefix (or
> subdirectory?) to signify the fact they are implementation details and
> not a fundamental concept.
> 
> > +
> > +               default_cat_error_count: (RW) integer 0..U32_MAX
> > +               default_doorbell_time_us: (RW) integer 0..U32_MAX
> > +               default_engine_reset_count: (RW) integer 0..U32_MAX
> > +               default_h2g_time_us: (RW) integer 0..U32_MAX
> > +               default_irq_time_us: (RW) integer 0..U32_MAX
> > +               default_page_fault_count: (RW) integer 0..U32_MAX
> > +                       The monitoring threshold to be set for the
> > function.
> > +                       The default value is 0 (don't monitor).
> > +                       See
> > "sriov_extensions/vfN/tileT/gtX/thresholds" for details.
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               This directory appears on Xe device when:
> > +
> > +                - device supports SR-IOV, and
> > +                - device is a Physical Function (PF), and
> > +                - driver is enabled to support SR-IOV PF on given
> > device.
> > +
> > +               This directory is used as a root for all attributes
> > required to
> > +               manage both Physical Function (PF) and Virtual
> > Functions (VFs).
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/strict_scheduling_enabled
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               (RW) bool
> > +
> > +               This file represents a flag used to determine if scheduling
> > +               parameters should be respected even if there are no active
> > +               workloads submitted by the PF or VFs.
> > +
> > +               This flag is disabled by default, unless strict
> > scheduling is
> > +               not applicable on given platform. In such case this
> > file will
> > +               be read-only.
> > +
> > +               The change to this file may have no effect if VFs
> > are not yet enabled.
> > +               If strict scheduling can't be enabled in GuC then
> > write will fail with -EIO.
> 
> I think the semantics of this need to be documented ie. how it
> interacts 
> with exec_quantum_ms. If it does? I am guessing that it has to
> otherwise 
> I don't know what it would mean - presumably unused timeslices are
> not 
> given to other entities but time just goes wasted? But it is also a 
> question on over what time interval. Or that too is purely defined by
> the number of PF+VFs and their respective allocated quanta.
> 
> Also, would there be benefit, assuming it is possible with GuC, to
> allow 
> configuring it per PF/VF?
> 
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/monitoring_period_ms
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               (RW) integer
> > +
> > +               This file represents the configuration knob used by
> > adverse event
> > +               monitoring. A value here is the period in millisecs
> > during which
> > +               events are counted and the total is checked against
> > a threshold.
> > +               See "sriov_extensions/vfN/tileT/gtX/thresholds" for
> > more details.
> > +
> > +               Default is 0 (monitoring is disabled).
> > +
> > +               If monitoring capability is not available, then
> > attempt to enable
> > +               will fail with -EPERM error. If monitoring can't be
> > enabled in
> > +               GuC then write will fail with -EIO.
> 
> Could the docs explain if there is a downside to enabling it, which
> is 
> probably why it isn't enabled by default? Because it does sound
> natural 
> that adverse events should be noticed.
> 
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               This directory holds all attributes related to the
> > SR-IOV
> > +               Physical Function (PF).
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               This directory holds all attributes related to the
> > SR-IOV
> > +               Virtual Function (VF).
> > +
> > +               Note that VF numbers (N) are 1-based as described
> > in PCI SR-IOV specification.
> > +               The Xe driver implementation follows that naming schema.
> > +
> > +               There will be "vf1", "vf2" up to "vfN" directories,
> > where N matches
> > +               value of the PCI "sriov_totalvfs" attribute.
> > +
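
As a quick illustration of the 1-based naming, the function directories
under sriov_extensions could be enumerated as below. This is just a
sketch; the helper name is made up, and reading the real "sriov_totalvfs"
attribute from sysfs is left out.

```python
def function_dir_names(total_vfs):
    """Return 'pf' followed by 'vf1'..'vfN' for N == total_vfs.

    total_vfs stands in for the value of the PCI "sriov_totalvfs"
    attribute; VF numbers are 1-based, so there is never a 'vf0'.
    """
    return ["pf"] + [f"vf{n}" for n in range(1, total_vfs + 1)]


print(function_dir_names(7))
```
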
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               This directory holds all SR-IOV attributes related
> > to the device tile.
> > +               The tile numbers (T) start from 0.
> > +
> > +               There is at least one "tile0/" directory present.
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               This directory holds all SR-IOV attributes related
> > to the device GT.
> > +               The GT numbers (X) start from 0.
> > +
> > +               There is at least one "gt0/" directory present.
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/device
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/device
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               (symbolic link)
> > +
> > +               Backlink to the PCI device entry representing given
> > function.
> > +               For PF this link is always present.
> > +               For VF this link is present only for currently
> > enabled VFs.
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/priority
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               (RW) string
> > +
> > +               This file represents a GuC Scheduler knob to
> > override the default
> > +               round-robin or FIFO scheduler policies implemented
> > by the GuC.
> > +
> > +               The default value is "peer".
> > +
> > +               This flag can be changed, unless such change is not
> > applicable
> > +               for given platform or is not supported by current
> > GuC firmware.
> > +               In such case this file could be read-only or will
> > return -EPERM
> > +               on write attempt.
> > +
> > +               "immediate"
> > +                       GuC will Schedule PF workloads immediately
> > and PF
> > +                       workloads only until the PF's work queues
> > in GuC
> > +                       are empty.
> > +
> > +               "lazy"
> > +                       GuC will Schedule PF workloads at the next
> > opportune
> > +                       moment and PF workloads only until the PF
> > work queues
> > +                       in GuC are empty.
> > +
> > +               "peer"
> > +                       GuC Scheduler will treat PF and VFs with
> > equal priority.
> 
> Hmmm, this too is very GuC specific and I wonder what the use case for
> lazy is? Lazy = "don't care when it runs, but when it runs it will run
> everything queued so far", right? Feels a bit odd at first.
> 
> "Immediate" may also not be immediate, depending on preemption
> granularity and workloads, right?
> 
> Are there any ideas to express the knobs in a more generic fashion?
> 
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/stop
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               (WO) bool (1)
> > +
> > +               Writing to this file will force GuC to stop handling any requests from
> > +               this VF, but without triggering an FLR.
> > +               To recover, the full FLR must be issued using
> > generic "device/reset".
> > +
> > +               This file allows implementing a custom policy mechanism when a VF is
> > +               misbehaving and triggering adverse events above defined thresholds.
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/exec_quantum_ms
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/preempt_timeout_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/exec_quantum_ms
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/preempt_timeout_us
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               These files represent scheduling parameters of the
> > functions.
> > +
> > +               These scheduling parameters can be changed even if
> > VFs are enabled
> > +               and running, unless such change is not applicable
> > on given platform
> > +               due to fixed hardware or firmware assignment.
> > +
> > +               exec_quantum_ms: (RW) integer 0..U32_MAX
> > +                       The GT execution quantum in [ms] assigned
> > to the function.
> > +                       Requested quantum might be aligned per
> > HW/FW requirements.
> > +
> > +                       Default is 0 (unlimited).
> > +
> > +               preempt_timeout_us: (RW) integer 0..U32_MAX
> > +                       The GT preemption timeout in [us] assigned
> > to the function.
> > +                       Requested timeout might be aligned per
> > HW/FW requirements.
> > +
> > +                       Default is 0 (unlimited).
> 
> Alignment for the above two will be visible after read-back?
> 
> > +
> > +
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/ggtt_quota
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/lmem_quota
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/contexts_quota
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/doorbells_quota
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               These files represent shared resource assigned to
> > the functions.
> > +
> > +               These resource parameters can be changed, unless VF
> > is already running,
> > +               or such change is not applicable on given platform
> > due to fixed hardware
> > +               or firmware assignment.
> > +
> > +               Writes to these attributes may fail with:
> > +                       -EPERM if change is not applicable on given HW/FW.
> > +                       -E2BIG if value is larger than HW/FW limit.
> > +                       -EDQUOT if value is larger than maximum
> > quota defined by the PF.
> > +                       -ENOSPC if PF can't allocate required
> > quota.
> > +                       -EBUSY if the resource is currently in use
> > by the VF.
> > +                       -EIO if GuC refuses to change provisioning.
> 
> Why would it refuse if input is valid? In other words, what is the
> user/admin supposed to do on -EIO?
> 
> > +
> > +               ggtt_quota: (RW) integer 0..U64_MAX
> > +                       The size of the GGTT address space (in
> > bytes) assigned to the VF.
> > +                       The value might be aligned per HW/FW
> > requirements.
> > +
> > +                       Default is 0 (unprovisioned).
> > +
> > +               lmem_quota: (RW) integer 0..U64_MAX
> > +                       The size of the Local Memory (in bytes)
> > assigned to the VF.
> > +                       The value might be aligned per HW/FW
> > requirements.
> > +
> > +                       This attribute is only available on
> > discrete platforms.
> > +
> > +                       Default is 0 (unprovisioned).
> > +
> > +               contexts_quota: (RW) 0..U16_MAX
> > +                       The number of GuC submission contexts
> > assigned to the VF.
> > +                       This value might be aligned per HW/FW
> > requirements.
> > +
> > +                       Default is 0 (unprovisioned).
> > +
> > +               doorbells_quota: (RW) 0..U16_MAX
> > +                       The number of GuC doorbells assigned to the
> > VF.
> > +                       This value might be aligned per HW/FW
> > requirements.
> > +
> > +                       Default is 0 (unprovisioned).
> > +
> > +
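
From the userspace side, a provisioning tool would probably want to turn
the error codes listed above into something readable. A minimal sketch,
assuming the errno values end up exactly as proposed in this RFC; the
helper names are hypothetical and the path passed in would be one of the
quota attributes above.

```python
import errno
import os

# Meanings taken from the proposed ABI text; nothing here is final.
QUOTA_WRITE_ERRORS = {
    errno.EPERM: "change is not applicable on the given HW/FW",
    errno.E2BIG: "value is larger than the HW/FW limit",
    errno.EDQUOT: "value is larger than the maximum quota defined by the PF",
    errno.ENOSPC: "PF can't allocate the required quota",
    errno.EBUSY: "resource is currently in use by the VF",
    errno.EIO: "GuC refused to change provisioning",
}


def describe_quota_error(err):
    """Translate a (possibly negative) errno into the documented meaning."""
    code = abs(err)
    return QUOTA_WRITE_ERRORS.get(code, os.strerror(code))


def write_quota(path, value):
    """Write a quota attribute; return None on success or an error text."""
    try:
        with open(path, "w") as attr:
            attr.write(str(value))
    except OSError as exc:
        return describe_quota_error(exc.errno)
    return None
```

Codes outside the documented list fall back to the generic os.strerror()
text, so an unexpected failure is still reported.
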
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/cat_error_count
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/doorbell_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/engine_reset_count
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/h2g_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/irq_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/pf/tileT/gtX/thresholds/page_fault_count
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/cat_error_count
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/doorbell_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/engine_reset_count
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/h2g_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/irq_time_us
> > +What:          /sys/bus/pci/drivers/xe/.../sriov_extensions/vfN/tileT/gtX/thresholds/page_fault_count
> > +Date:          2024
> > +KernelVersion: TBD
> > +Contact:       intel-xe at lists.freedesktop.org
> > +Description:
> > +               These files represent threshold values used by the
> > GuC to trigger
> > +               security events if adverse event monitoring is
> > enabled.
> 
> How are the security events delivered? There is mention of uevents in a
> later paragraph - are they already defined, or should they be defined
> together with this so the link can be placed here?
> 
> > +
> > +               These thresholds are checked every
> > "monitoring_period_ms".
> > +               Refer to GuC ABI for details about each threshold
> > category.
> 
> Is it possible to have a link here to GuC ABI?
> 
> Regards,
> 
> Tvrtko
> 
> > +
> > +               Default value for all thresholds is 0 (disabled).
> > +
> > +               cat_error_count: (RW) integer
> > +               doorbell_time_us: (RW) integer
> > +               engine_reset_count: (RW) integer
> > +               h2g_time_us: (RW) integer
> > +               irq_time_us: (RW) integer
> > +               page_fault_count: (RW) integer
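
For the count-type thresholds, the interaction with
"monitoring_period_ms" can be pictured roughly as below. This is only a
sketch of the description, not of what GuC actually does: it assumes
fixed, back-to-back periods and that an event fires when the count goes
above the threshold. Both of these are assumptions.

```python
def exceeded_periods(event_times_ms, period_ms, threshold):
    """Return start times (ms) of periods whose event count is above threshold.

    A period or threshold of 0 means "disabled", matching the documented
    defaults, so nothing is ever reported in that case.
    """
    if period_ms == 0 or threshold == 0:
        return []
    counts = {}
    for t in event_times_ms:
        start = (t // period_ms) * period_ms   # period this event falls into
        counts[start] = counts.get(start, 0) + 1
    return sorted(s for s, n in counts.items() if n > threshold)


# Three engine resets in the first second, one in the second.
print(exceeded_periods([0, 10, 20, 1500], period_ms=1000, threshold=2))
```
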
> > diff --git a/Documentation/gpu/rfc/xe_sriov.rst b/Documentation/gpu/rfc/xe_sriov.rst
> > new file mode 100644
> > index 000000000000..574f6414eabb
> > --- /dev/null
> > +++ b/Documentation/gpu/rfc/xe_sriov.rst
> > @@ -0,0 +1,192 @@
> > +.. SPDX-License-Identifier: MIT
> > +
> > +========================
> > +Xe – SR-IOV Support Plan
> > +========================
> > +
> > +The Single Root I/O Virtualization (SR-IOV) extension to the PCI
> > Express (PCIe)
> > +specification suite is supported starting from 12th generation of
> > Intel Graphics
> > +processors.
> > +
> > +This document describes planned ABI of the new Xe driver (see
> > xe.rst) that will
> > +provide flexible configuration and management options related to
> > the SR-IOV.
> > +It will also highlight a few of the most important changes to the Xe driver
> > +implementation to deal with Intel GPU SR-IOV specific requirements.
> > +
> > +
> > +SR-IOV Capability
> > +=================
> > +
> > +Due to SR-IOV complexity and required co-operation between
> > hardware, firmware
> > +and kernel drivers, not all Xe architecture platforms might have
> > SR-IOV enabled
> > +or fully functional.
> > +
> > +To control at the driver level which platforms will provide support for SR-IOV,
> > +as we can't just rely on the PCI configuration data exposed by the hardware,
> > +we will introduce a "has_sriov" flag in struct xe_device_desc, which describes
> > +the device capabilities that the driver checks during probe.
> > +
> > +Initially this flag will be set to disabled even on platforms that we plan to
> > +support. We will enable this flag only once we finish merging all required
> > +changes to the driver and the related validated firmware is also available.
> > +
> > +
> > +SR-IOV Platforms
> > +================
> > +
> > +Initially we plan to add SR-IOV functionality to the following SDV platforms
> > +already supported by the Xe driver:
> > +
> > + - TGL (up to 7 VFs)
> > + - ADL (up to 7 VFs)
> > + - MTL (up to 7 VFs)
> > + - ATSM (up to 31 VFs)
> > + - PVC (up to 63 VFs)
> > +
> > +Newer platforms will be supported later, but we hope that enabling them will be
> > +much faster, as the majority of the driver changes are either platform agnostic
> > +or are similar between earlier platforms (hence we start with SDVs).
> > +
> > +
> > +PF Mode
> > +=======
> > +
> > +Support in the driver for acting in Physical Function (PF) mode, i.e. the mode
> > +that allows configuration of VFs, depends on CONFIG_PCI_IOV and will be
> > +enabled by default.
> > +
> > +However, due to potentially conflicting requirements for SR-IOV and other mega
> > +features, we might want to have an option to disable SR-IOV PF mode support at
> > +driver load time.
> > +
> > +Thus, we plan to use an additional modparam named "sriov_totalvfs" which, if set
> > +to 0, will force the driver to operate in the native (non-virtualized) mode.
> > +The same modparam could be used to limit the number of Virtual Functions (VFs)
> > +supported by the driver to fewer than the hardware limit exposed in PCI
> > +configuration.
> > +
> > +The name of this modparam corresponds to the existing PCI sysfs attribute, which
> > +by default exposes the hardware capability.
> > +
> > +The default value of this param will allow all VFs claimed by the hardware to
> > +be supported.
> > +
> > +This modparam will have no effect if the driver is running on a VF device.
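
For illustration only, the modparam semantics described above could be
sketched as below. Everything here is hypothetical and not taken from the
patch: the helper name, the struct, and in particular the use of a negative
value to mean "modparam not set".

```c
#include <stdbool.h>

/* Hypothetical result type, not part of the proposed driver. */
struct vf_limit {
	unsigned int totalvfs;	/* number of VFs the driver will support */
	bool pf_mode;		/* false: operate in native (non-virtualized) mode */
};

/*
 * hw_totalvfs: limit exposed by the hardware in PCI configuration.
 * param:       value of the "sriov_totalvfs" modparam; a negative value
 *              stands for "not set" (ASSUMPTION made for this sketch).
 */
static struct vf_limit sketch_resolve_totalvfs(unsigned int hw_totalvfs, int param)
{
	struct vf_limit l = { .totalvfs = hw_totalvfs, .pf_mode = hw_totalvfs > 0 };

	if (param < 0)
		return l;	/* default: all VFs claimed by the hardware */

	/* The modparam can only lower the limit, never raise it. */
	if ((unsigned int)param < hw_totalvfs)
		l.totalvfs = (unsigned int)param;

	/* A limit of 0 forces native mode. */
	l.pf_mode = l.totalvfs > 0;
	return l;
}
```

With a hardware limit of 7, a modparam of 3 would yield 3 supported VFs and
a modparam of 0 would disable PF mode entirely.
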
> > +
> > +
> > +VFs Enabling
> > +============
> > +
> > +To enable or disable VFs we plan to rely on the existing sysfs attribute exposed
> > +by the PCI subsystem named "sriov_numvfs". We will provide all necessary tweaks
> > +to provision VFs in our custom implementation of the "sriov_configure" hook from
> > +the struct pci_driver.
> > +
> > +If for some reason, including an explicit request to disable SR-IOV PF mode
> > +using the modparam, we are not able to correctly support any VFs, the driver
> > +will change the number of supported VFs, exposed to userspace by the
> > +"sriov_totalvfs" attribute, to 0, thus preventing configuration of the VFs.
> > +
> > +
> > +VF Mode
> > +=======
> > +
> > +When the driver is running on a VF device, hardware enforcement makes access
> > +to the privileged registers impossible. To avoid relying on these registers,
> > +we plan to detect early whether we are running on a VF device using the
> > +dedicated VF_CAP(0x1901f8) register and then use the global macro
> > +IS_SRIOV_VF(xe) to control the driver logic.
> > +
> > +To speed up merging of the required changes, we might first introduce a dummy
> > +macro that always evaluates to false, to prepare the driver to avoid some code
> > +paths before we finalize our VF mode detection and other VF enabling changes.
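
As a rough sketch of the detection described above (not the actual
implementation): the register offset and the IS_SRIOV_VF() name come from
the RFC text, while the bit position, the struct, the gating on "has_sriov"
and the read callback are all assumptions made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define VF_CAP_REG	0x1901f8u	/* offset quoted in the RFC */
#define VF_CAP_VF	(1u << 0)	/* ASSUMPTION: bit position is illustrative only */

/* Hypothetical stand-in for the real device structure. */
struct xe_sketch {
	bool has_sriov;	/* platform capability flag from the device descriptor */
	bool is_vf;	/* cached result of the early detection */
};

#define IS_SRIOV_VF(xe)	((xe)->is_vf)

/* Hypothetical MMIO accessor, injected so the sketch can run without hardware. */
typedef uint32_t (*read32_fn)(uint32_t offset);

/* Early detection: read VF_CAP once and cache the answer. */
static void sketch_detect_vf(struct xe_sketch *xe, read32_fn read32)
{
	xe->is_vf = xe->has_sriov && (read32(VF_CAP_REG) & VF_CAP_VF) != 0;
}

/* Fake register readers used only to exercise the sketch. */
static uint32_t fake_vf_read32(uint32_t offset)
{
	return offset == VF_CAP_REG ? VF_CAP_VF : 0;
}

static uint32_t fake_pf_read32(uint32_t offset)
{
	(void)offset;
	return 0;
}
```

The interim "dummy macro" mentioned in the text would simply replace the
IS_SRIOV_VF() definition with one that always evaluates to false.
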
> > +
> > +
> > +Resources
> > +=========
> > +
> > +Most of the hardware (or firmware) resources available on the Xe
> > architecture,
> > +like GGTT, LMEM, GuC context IDs, GuC doorbells, will be shared
> > between PF and
> > +VFs and will require some provisioning steps to assign those
> > resources for use
> > +by the VF.
> > +
> > +Until VFs are provisioned with resources, the PF driver will be able to use all
> > +resources, in the same way as if it were running in non-virtualized mode.
> > +
> > +If some resource (or a part or region of it) is assigned to a specific VF, then
> > +the PF is not allowed to use that part or region of the resource, but can
> > +continue to use whatever is left available.
> > +
> > +Those resources are usually fully virtualized, so they will not
> > require any
> > +special handling when used by the VF driver, except that VF driver
> > must know
> > +the assigned quota.
> > +
> > +The most notable exception is the GGTT address space, as on some
> > platforms,
> > +the VF driver must additionally know the real range that it can
> > access.
> > +
> > +Once resources have been assigned to a VF and the VF driver has started,
> > +the provisioning must not be changed, as that would break the VF driver.
> > +To make changes, the VF driver that was using these resources must be
> > +unloaded (or the VM terminated) and the VF device must be reset using FLR.
> > +
> > +
> > +Scheduling
> > +==========
> > +
> > +The workloads from the PF driver and VF drivers must always be submitted to the
> > +hardware using the GuC submission mechanism. Unless a VF has exclusive access
> > +to the GT, submissions from different VFs are time-sliced and controlled
> > +with additional "execution_quantum" and "preemption_timeout" parameters.
> > +
> > +In contrast to the resource provisioning, those scheduling
> > parameters can be
> > +changed even if VF drivers are already running and are active.
> > +
> > +
> > +Automatic VFs Provisioning
> > +==========================
> > +
> > +To provide an out-of-the-box experience when the user enables VFs using the
> > +generic "sriov_numvfs" attribute, without requiring complex provisioning steps,
> > +the SR-IOV PF driver will implement automatic VFs resource provisioning.
> > +
> > +By default, all VFs will be allocated a fair share of the mandatory
> > +resources (like GGTT, GuC IDs) and unrestricted scheduling parameters.
> > +Such provisioning should be sufficient for most normal usages, when
> > +no strict SLA is required.
> > +
> > +The PF driver will also expose some additional sysfs files to
> > allow adjusting
> > +this automatic VFs provisioning, like default values for most of
> > the
> > +provisioning parameters that PF will then apply for each enabled
> > VF.
> > +
> > +    Details about those extensions can be found in
> > +    :download:`Preliminary Xe driver ABI <sysfs-driver-xe-sriov>`.
> > +
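
To make "fair share" concrete, one possible reading is an even split of
whatever is left after a PF reservation. This is only a sketch: the RFC
does not say how the split is computed or whether the PF keeps a reserved
part, so the function name, the "pf_reserved" parameter and the rounding
are assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: evenly divide a resource (e.g. GGTT bytes or
 * GuC context IDs) between num_vfs VFs after setting aside pf_reserved
 * for the PF. Any remainder from the division stays with the PF.
 */
static uint64_t sketch_fair_share(uint64_t total, uint64_t pf_reserved,
				  unsigned int num_vfs)
{
	if (num_vfs == 0 || pf_reserved >= total)
		return 0;	/* nothing to hand out */

	return (total - pf_reserved) / num_vfs;
}
```

For example, 4096 units with 1024 reserved for the PF and 3 VFs enabled
would give each VF 1024 units.
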
> > +
> > +Manual VFs Provisioning
> > +=======================
> > +
> > +If automatic VFs provisioning, which applies the same configuration to every
> > +VF, is not sufficient or there is a need for advanced customization of some VF,
> > +the PF driver will also provide an extended sysfs interface which will allow
> > +control of every provisioning attribute at the lowest feasible level.
> > +
> > +It is expected that these low-level attributes will be mostly used by
> > +advanced users or by custom tools that will set up configurations that
> > +meet predefined and validated SLA as required by the customers.
> > +
> > +    Details about those extensions can be found in
> > +    :download:`Preliminary Xe driver ABI <sysfs-driver-xe-sriov>`.
> > +
> > +
> > +VFs Monitoring
> > +==============
> > +
> > +In addition to resource provisioning or changing scheduling parameters,
> > +the PF driver might also allow configuring some monitoring parameters, like
> > +thresholds of adverse events or the sample period, to track undesired behavior
> > +of the VFs that could impact the whole system.
> > +
> > +Once those thresholds are set up and the sampling period is defined, the GuC
> > +will notify the PF driver about which VF is exceeding the threshold, and the PF
> > +is then able to trigger a uevent to notify the administrator (or VMM), who
> > +could take some action against the VF.
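
The per-period check implied by the text can be sketched as below. The
"0 means disabled" rule is from the sysfs description earlier in the patch;
the function name and the choice of a strict "greater than" comparison are
assumptions, and in the proposed design the check itself would be done by
the GuC, not by driver code like this.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'sample' is the event count (or accumulated time)
 * observed for one VF during one monitoring period.
 */
static bool sketch_threshold_exceeded(uint32_t threshold, uint32_t sample)
{
	/* A threshold of 0 disables monitoring for this category. */
	return threshold != 0 && sample > threshold;
}
```
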


