[Intel-gfx] [RFC PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan
Matthew Brost
matthew.brost at intel.com
Wed May 26 23:33:57 UTC 2021
Add entry for i915 new parallel submission uAPI plan.
v2:
(Daniel Vetter):
- Expand logical order explaination
- Add dummy header
- Only allow N BBs in execbuf IOCTL
- Configure parallel submission per slot not per gem context
v3:
(Marcin Ślusarz):
- Lot's of typos / bad english fixed
(Tvrtko Ursulin):
- Consistent pseudo code, clean up wording in descriptions
Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Cc: Tony Ye <tony.ye at intel.com>
CC: Carl Zhang <carl.zhang at intel.com>
Cc: Daniel Vetter <daniel.vetter at intel.com>
Cc: Jason Ekstrand <jason at jlekstrand.net>
Signed-off-by: Matthew Brost <matthew.brost at intel.com>
---
Documentation/gpu/rfc/i915_parallel_execbuf.h | 145 ++++++++++++++++++
Documentation/gpu/rfc/i915_scheduler.rst | 55 ++++++-
2 files changed, 198 insertions(+), 2 deletions(-)
create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h b/Documentation/gpu/rfc/i915_parallel_execbuf.h
new file mode 100644
index 000000000000..20de206e3ab4
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
@@ -0,0 +1,145 @@
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
+
+/*
+ * i915_context_engines_parallel_submit:
+ *
+ * Setup a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915
+ * run these BBs. Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL know how
+ * many BBs there are based on the slots configuration. The N BBs are the last N
+ * buffer objects for first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * There are two currently defined ways to control the placement of the
+ * hardware contexts on physical engines: default behavior (no flags) and
+ * I915_PARALLEL_IMPLICIT_BONDS (a flag). More flags may be added the in the
+ * future as new hardware / use cases arise. Details of how to use this
+ * interface above the flags field in this structure.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * inteface.
+ */
+struct i915_context_engines_parallel_submit {
+ struct i915_user_extension base;
+
+ __u16 engine_index; /* slot for parallel engine */
+ __u16 width; /* number of contexts per parallel engine */
+ __u16 num_siblings; /* number of siblings per context */
+ __u16 mbz16;
+/*
+ * Default placement behavior (currently unsupported):
+ *
+ * Allow BBs to be placed on any available engine instance. In this case each
+ * context's engine mask indicates where that context can be placed. It is
+ * implied in this mode that all contexts have mutual exclusive placement.
+ * e.g. If one context is running CSX[0] no other contexts can run on CSX[0]).
+ *
+ * Example 1 pseudo code:
+ * CSX,Y[N] = generic engine class X or Y, logical instance N
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ * engines=CSX[0],CSX[1],CSY[0],CSY[1])
+ *
+ * Results in the following valid placements:
+ * CSX[0], CSY[0]
+ * CSX[0], CSY[1]
+ * CSX[1], CSY[0]
+ * CSX[1], CSY[1]
+ *
+ * This can also be thought of as 2 virtual engines described by 2-D array in
+ * the engines the field:
+ * VE[0] = CSX[0], CSX[1]
+ * VE[1] = CSY[0], CSY[1]
+ *
+ * Example 2 pseudo code:
+ * CSX[Y] = generic engine of same class X, logical instance N
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=3,
+ * engines=CSX[0],CSX[1],CSX[2],CSX[0],CSX[1],CSX[2])
+ *
+ * Results in the following valid placements:
+ * CSX[0], CSX[1]
+ * CSX[0], CSX[2]
+ * CSX[1], CSX[0]
+ * CSX[1], CSX[2]
+ * CSX[2], CSX[0]
+ * CSX[2], CSX[1]
+ *
+ * This can also be thought of as 2 virtual engines described by 2-D array in
+ * the engines the field:
+ * VE[0] = CSX[0], CSX[1], CSX[2]
+ * VE[1] = CSX[0], CSX[1], CSX[2]
+
+ * This enables a use case where all engines are created equally, we don't care
+ * where they are scheduled, we just want a certain number of resources, for
+ * those resources to be scheduled in parallel, and possibly across multiple
+ * engine classes.
+ */
+
+/*
+ * I915_PARALLEL_IMPLICIT_BONDS - Create implicit bonds between each context.
+ * Each context must have the same number of sibling and bonds are implicitly
+ * created between each set of siblings.
+ *
+ * Example 1 pseudo code:
+ * CSX[N] = generic engine of same class X, logical instance N
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=1,
+ * engines=CSX[0],CSX[1], flags=I915_PARALLEL_IMPLICIT_BONDS)
+ *
+ * Results in the following valid placements:
+ * CSX[0], CSX[1]
+ *
+ * Example 2 pseudo code:
+ * CSX[N] = generic engine of same class X, logical instance N
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ * engines=CSX[0],CSX[2],CSX[1],CSX[3],
+ * flags=I915_PARALLEL_IMPLICIT_BONDS)
+ *
+ * Results in the following valid placements:
+ * CSX[0], CSX[1]
+ * CSX[2], CSX[3]
+ *
+ * This can also be thought of as 2 virtual engines described by 2-D array in
+ * the engines the field with bonds placed between each index of the virtual
+ * engines. e.g. CSX[0] is bonded to CSX[1], CSX[2] is bonded to CSX[3].
+ * VE[0] = CSX[0], CSX[2]
+ * VE[1] = CSX[1], CSX[3]
+ *
+ * This enables a use case where all engines are not equal and certain placement
+ * rules are required (i.e. split-frame requires all contexts to be placed in a
+ * logically contiguous order on the VCS engines on gen11+ platforms). This use
+ * case (logically contiguous placement, within a single engine class) is
+ * supported when using GuC submission. Execlist mode could support all possible
+ * bonding configurations but currently doesn't support this extension.
+ */
+#define I915_PARALLEL_IMPLICIT_BONDS (1 << 0)
+/*
+ * Do not allow BBs to be preempted mid BB rather insert coordinated preemption
+ * points on all hardware contexts between each set of BBs. An example use case
+ * of this feature is split-frame on gen11+ hardware.
+ */
+#define I915_PARALLEL_NO_PREEMPT_MID_BATCH (1 << 1)
+#define __I915_PARALLEL_UNKNOWN_FLAGS (-(I915_PARALLEL_NO_PREEMPT_MID_BATCH << 1))
+ __u64 flags; /* all undefined flags must be zero */
+ __u64 mbz64[3]; /* reserved for future use; must be zero */
+
+ /*
+ * 2-D array of engines
+ *
+ * width (i) * num_siblings (j) in length
+ * index = j + i * num_siblings
+ */
+ struct i915_engine_class_instance engines[0];
+} __attribute__ ((packed));
+
diff --git a/Documentation/gpu/rfc/i915_scheduler.rst b/Documentation/gpu/rfc/i915_scheduler.rst
index 7faa46cde088..0254c04d34be 100644
--- a/Documentation/gpu/rfc/i915_scheduler.rst
+++ b/Documentation/gpu/rfc/i915_scheduler.rst
@@ -23,7 +23,7 @@ i915 with the DRM scheduler is:
severe design issues in general, which is why we want to retire it no
matter what
* New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
- which configures a slot with N contexts
+ which configures a slot with N contexts
* After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
a slot in a single execbuf IOCTL and the batches run on the GPU in
paralllel
@@ -82,4 +82,55 @@ https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priorit
New parallel submission uAPI
============================
-Details to come in a following patch.
+The existing bonding uAPI is completely broken with GuC submission because
+whether a submission is a single context submit or parallel submit isn't known
+until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple
+contexts in parallel with the GuC the context must be explicitly registered with
+N contexts and all N contexts must be submitted in a single command to the GuC.
+The GuC interfaces do not support dynamically changing between N contexts as the
+bonding uAPI does. Hence the need for a new parallel submission interface. Also
+the legacy bonding uAPI is quite confusing and not intuitive at all.
+
+The new parallel submission uAPI consists of 3 parts:
+
+* Export engines logical mapping
+* A 'set_parallel' extension to configure contexts for parallel
+ submission
+* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
+
+Export engines logical mapping
+------------------------------
+Certain use cases require BBs to be placed on engine instances in logical order
+(e.g. split-frame on gen11+). The logical mapping of engine instances can change
+based on fusing. Rather than making UMDs be aware of fusing, simply expose the
+logical mapping with the existing query engine info IOCTL. Also the GuC
+submission interface currently only supports submitting multiple contexts to
+engines in logical order which is a new requirement compared to execlists.
+Lastly, all current platforms have at most 2 engine instances and the logical
+order is the same as uAPI order. This will change on platforms with more than 2
+engine instances.
+
+A single bit will be added to drm_i915_engine_info.flags indicating that the
+logical instance has been returned and a new field,
+drm_i915_engine_info.logical_instance, returns the logical instance.
+
+A 'set_parallel' extension to configure contexts for parallel submission
+------------------------------------------------------------------------
+The 'set_parallel' extension configures a slot for parallel submission of N BBs.
+It is setup step that should be called before using any of the contexts. See
+I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
+similar existing examples. Once a slot is configured for parallel submission the
+execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially only
+support GuC submission. Execlist support can be added later if needed.
+
+Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
+i915_context_engines_parallel_submit to the uAPI to implement this extension.
+
+Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
+-------------------------------------------------------------------
+Contexts that have been configured with the 'set_parallel' extension are allowed
+to submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N
+objects in the drm_i915_gem_exec_object2 list or the first N if
+I915_EXEC_BATCH_FIRST is set. The number of BBs is implict based on the slot
+submitted and how it has been configured by 'set_parallel' or other extensions.
+No uAPI changes are required to execbuf2 IOCTL.
--
2.28.0
More information about the Intel-gfx
mailing list