[Intel-gfx] [PATCH v3 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration

Tue Mar 27 10:26:15 UTC 2018

From: Kevin Rogovin <kevin.rogovin at intel.com>

Signed-off-by: Kevin Rogovin <kevin.rogovin at intel.com>
---
 Documentation/gpu/i915.rst      | 129 +++++++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_vma.h |  10 +++-
 2 files changed, 113 insertions(+), 26 deletions(-)
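
As an illustration of the presumed-offset and relocation scheme described
in the "Intel GPU Basics" text added by this patch, here is a small toy
model in C. It is only a sketch under stated assumptions: the names
toy_reloc, toy_object and toy_relocate are hypothetical, they are not the
i915 uAPI structures or any kernel function, and GPU addresses are stored
as plain 64-bit values rather than in a real command encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy types; these are NOT the i915 uAPI structures. */
struct toy_reloc {
	size_t   batch_offset; /* byte offset in the batch that holds a GPU address */
	size_t   target;       /* index of the object whose address is stored there */
	uint64_t delta;        /* constant added to the target object's address */
};

struct toy_object {
	uint64_t presumed_offset; /* address user space assumed for the object */
	uint64_t actual_offset;   /* address the object really has */
};

/*
 * Rewrite every batch location whose target object is not at the address
 * user space presumed. Objects with a correct presumed offset are skipped.
 * Afterwards, presumed_offset of every object is set to its actual offset,
 * modelling the address being reported back to user space.
 * Returns the number of batch locations that were rewritten.
 */
static size_t toy_relocate(uint8_t *batch, size_t batch_size,
			   struct toy_object *objs, size_t nobjs,
			   const struct toy_reloc *relocs, size_t nrelocs)
{
	size_t patched = 0;

	for (size_t i = 0; i < nrelocs; i++) {
		const struct toy_reloc *r = &relocs[i];
		struct toy_object *o;
		uint64_t addr;

		/* Ignore entries that point outside the objects or the batch. */
		if (r->target >= nobjs)
			continue;
		if (r->batch_offset > batch_size ||
		    batch_size - r->batch_offset < sizeof(addr))
			continue;

		o = &objs[r->target];
		if (o->presumed_offset == o->actual_offset)
			continue; /* user space guessed correctly: nothing to do */

		addr = o->actual_offset + r->delta;
		memcpy(batch + r->batch_offset, &addr, sizeof(addr));
		patched++;
	}

	for (size_t i = 0; i < nobjs; i++)
		objs[i].presumed_offset = objs[i].actual_offset;

	return patched;
}
```

In this model, a caller whose presumed offsets are all correct pays no
rewriting cost at all, which is the motivation the text gives for user
space passing presumed offsets in the first place.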

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881..ed8e08d 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -249,6 +249,112 @@ Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+----------------
+
+An Intel GPU has multiple engines. There are several engine types.
+The user-space value `I915_EXEC_DEFAULT` is an alias to the user
+space value `I915_EXEC_RENDER`.
+
+- RCS is the engine for 3D rendering and compute; it is named `I915_EXEC_RENDER` in user space.
+- BCS is a blitting (copy) engine; it is named `I915_EXEC_BLT` in user space.
+- VCS is a video encode and decode engine; it is named `I915_EXEC_BSD` in user space.
+- VECS is a video enhancement engine; it is named `I915_EXEC_VEBOX` in user space.
+
+The Intel GPU family is a family of integrated GPUs using Unified
+Memory Access. To have the GPU "do work", user space feeds the
+GPU batchbuffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
+or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR` (the ioctl `DRM_IOCTL_I915_GEM_EXECBUFFER`
+is deprecated). Most such batchbuffers will instruct the GPU to perform
+work (for example rendering) and that work needs memory from which to
+read and memory to which to write. All memory is encapsulated within GEM
+buffer objects (usually created with the ioctl `DRM_IOCTL_I915_GEM_CREATE`).
+An ioctl providing a batchbuffer for the GPU to execute will also list
+all GEM buffer objects that the batchbuffer reads and/or writes. For
+implementation details of memory management see
+`GEM BO Management Implementation Details`_.
+
+A GPU pipeline (most strongly so for the RCS engine) has a great deal
+of state which is to be programmed by user space via the contents of a
+batchbuffer. Starting in Gen6 (SandyBridge), hardware contexts are
+supported. A hardware context encapsulates GPU pipeline state and other
+portions of GPU state and it is much more efficient for the GPU to load
+a hardware context instead of re-submitting commands in a batchbuffer to
+the GPU to restore state. In addition, using hardware contexts provides
+much better isolation between user space clients. The ioctl
+`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` is used by user space to create a
+hardware context which is identified by a 32-bit integer. The
+non-deprecated ioctls to submit batchbuffer work can pass that ID (in
+the lower bits of drm_i915_gem_execbuffer2::rsvd1) to identify what HW
+context to use with the command. When the kernel submits the batchbuffer
+to be executed by the GPU it will also instruct the GPU to load the HW
+context prior to executing the contents of a batchbuffer.
+
+The GPU has its own memory management and address space. The kernel
+driver maintains the memory translation table for the GPU. For older
+GPUs (i.e. those before Gen8), there is a single global such translation
+table, a global Graphics Translation Table (GTT). For newer generation
+GPUs each hardware context has its own translation table, called
+Per-Process Graphics Translation Table (PPGTT). Note that, although
+PPGTT is named per-process, it is actually per hardware
+context. When user space submits a batchbuffer, the kernel walks the
+list of GEM buffer objects used by the batchbuffer and guarantees that
+not only is the memory of each such GEM buffer object resident but it
+is also present in the (PP)GTT. If the GEM buffer object is not yet
+placed in the (PP)GTT, then it is given an address. Two consequences
+of this are: first, the kernel needs to edit the submitted batchbuffer
+to write the correct GPU address whenever a GEM BO is assigned one;
+second, the kernel might evict a different GEM BO from the (PP)GTT
+to make address room for a GEM BO.
+
+Consequently, the ioctls submitting a batchbuffer for execution also
+include a list of all locations within buffers that refer to
+GPU-addresses so that the kernel can edit the buffer correctly. This
+process is dubbed relocation. The ioctls allow user space to provide to
+the kernel a presumed offset for each GEM buffer object used in a
+batchbuffer. If the kernel sees that the address provided by user space
+is correct, then it skips performing relocation for that GEM buffer
+object. In addition, the kernel reports back to user space the address
+to which it relocated each GEM buffer object.
+
+There is also an interface for user space to directly specify the
+address location of GEM BOs: a feature called soft-pinning, enabled
+within an execbuffer2 ioctl by setting the `EXEC_OBJECT_PINNED` bit. If
+user space also specifies `I915_EXEC_NO_RELOC`, then the kernel does
+not perform any relocation and user space manages the address space for
+its PPGTT itself. The advantage of user space handling address space is
+that then the kernel does far less work and user space can safely assume
+that the locations of GEM buffer objects in GPU address space do not change.
+
+GEM BO Management Implementation Details
+----------------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
+   :doc: Virtual Memory Address
+
+Buffer Object Eviction
+----------------------
+
+This section documents the interface functions for evicting buffer
+objects to make space available in the virtual gpu address spaces. Note
+that this is mostly orthogonal to shrinking buffer object caches, which
+has the goal to make main memory (shared with the gpu through the
+unified memory architecture) available.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
+   :internal:
+
+Buffer Object Memory Shrinking
+------------------------------
+
+This section documents the interface function for shrinking memory usage
+of buffer object caches. Shrinking is used to make main memory
+available. Note that this is mostly orthogonal to evicting buffer
+objects, which has the goal to make space in gpu virtual address spaces.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
+   :internal:
+
 Batchbuffer Parsing
 -------------------
 
@@ -312,29 +418,6 @@ Object Tiling IOCTLs
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
    :doc: buffer object tiling
 
-Buffer Object Eviction
-----------------------
-
-This section documents the interface functions for evicting buffer
-objects to make space available in the virtual gpu address spaces. Note
-that this is mostly orthogonal to shrinking buffer objects caches, which
-has the goal to make main memory (shared with the gpu through the
-unified memory architecture) available.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
-   :internal:
-
-Buffer Object Memory Shrinking
-------------------------------
-
-This section documents the interface function for shrinking memory usage
-of buffer object caches. Shrinking is used to make main memory
-available. Note that this is mostly orthogonal to evicting buffer
-objects, which has the goal to make space in gpu virtual address spaces.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
-   :internal:
-
 GuC
 ===
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 8c50220..0000f23 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -38,9 +38,13 @@
 enum i915_cache_level;
 
 /**
- * A VMA represents a GEM BO that is bound into an address space. Therefore, a
- * VMA's presence cannot be guaranteed before binding, or after unbinding the
- * object into/from the address space.
+ * DOC: Virtual Memory Address
+ *
+ * An `i915_vma` struct represents a GEM BO that is bound into an address
+ * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
+ * after unbinding the object into/from the address space. The struct includes
+ * the bookkeeping details needed for tracking it in all the lists with which
+ * it interacts.
  *
  * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
  * will always be <= an objects lifetime. So object refcounting should cover us.
-- 
2.7.4