[Intel-gfx] [PATCH v2 1/1] i915: additional GEM documentation

Fri Mar 2 14:09:21 UTC 2018

From: Kevin Rogovin <kevin.rogovin at intel.com>

This patch provides additional overview documentation for the GEM part
of the i915 kernel driver. In addition, it pulls already written
documentation into i915.rst.

Signed-off-by: Kevin Rogovin <kevin.rogovin at intel.com>
---
 Documentation/gpu/i915.rst                 | 194 +++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   3 +-
 drivers/gpu/drm/i915/i915_vma.h            |  11 +-
 drivers/gpu/drm/i915/intel_lrc.c           |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  64 ++++++++++
 5 files changed, 235 insertions(+), 40 deletions(-)
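
As an illustration of the relocation scheme documented in the i915.rst
hunk below, here is a minimal user-space sketch. It is a toy model and not
the driver's code: `toy_reloc` and `toy_relocate` are hypothetical names,
the batch is modeled as an array of 64-bit slots, and all entries are
assumed to point at a single target object.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one relocation entry. */
struct toy_reloc {
	size_t   slot;          /* index in the batch holding a GPU address */
	uint64_t presumed_addr; /* user space's guess of the target address */
};

/*
 * Rewrite every slot whose presumed address differs from actual_addr.
 * Entries whose guess was correct are skipped. Each rewritten entry has
 * its presumed_addr updated, so the caller learns the real address.
 * Returns the number of slots rewritten.
 */
size_t toy_relocate(uint64_t *batch, size_t batch_len,
		    struct toy_reloc *relocs, size_t nr_relocs,
		    uint64_t actual_addr)
{
	size_t patched = 0;

	for (size_t i = 0; i < nr_relocs; i++) {
		struct toy_reloc *r = &relocs[i];

		if (r->slot >= batch_len)
			continue; /* this toy ignores out-of-range entries */
		if (r->presumed_addr == actual_addr)
			continue; /* guess was right, nothing to rewrite */

		batch[r->slot] = actual_addr;
		r->presumed_addr = actual_addr; /* report the address back */
		patched++;
	}
	return patched;
}
```

A second call with the updated entries rewrites nothing, which mirrors why
a correct guess from user space lets the kernel skip relocation.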

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..cd23da2793ec 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -13,6 +13,18 @@ Core Driver Infrastructure
 This section covers core driver infrastructure used by both the display
 and the GEM parts of the driver.
 
+Initialization
+--------------
+
+The real action of initialization for the i915 driver is handled by
+:c:func:`i915_driver_load`; from this function one can see the key
+data (in particular :c:struct:`drm_driver` for GEM) of the entry points
+to the driver from user space.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.c
+   :functions: i915_driver_load
+
+
 Runtime Power Management
 ------------------------
 
@@ -243,32 +255,148 @@ Display PLLs
 .. kernel-doc:: drivers/gpu/drm/i915/intel_dpll_mgr.h
    :internal:
 
-Memory Management and Command Submission
-========================================
+GEM: Memory Management and Command Submission
+=============================================
 
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
-Batchbuffer Parsing
--------------------
+Intel GPU Basics
+----------------
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
-   :doc: batch buffer command parser
+An Intel GPU has multiple engines. There are several engine types.
+The user-space value `I915_EXEC_DEFAULT` is an alias for the
+user-space value `I915_EXEC_RENDER`.
+
+- RCS is the engine for 3D rendering and compute; it is named `I915_EXEC_RENDER` in user space.
+- BCS is a blitting (copy) engine; it is named `I915_EXEC_BLT` in user space.
+- VCS is a video encode and decode engine; it is named `I915_EXEC_BSD` in user space.
+- VECS is a video enhancement engine; it is named `I915_EXEC_VEBOX` in user space.
+
+The Intel GPU family is a family of integrated GPUs using Unified Memory
+Access. To have the GPU "do work", user space feeds the GPU batchbuffers
+via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2` or
+`DRM_IOCTL_I915_GEM_EXECBUFFER2_WR` (the ioctl `DRM_IOCTL_I915_GEM_EXECBUFFER`
+is deprecated). Most such batchbuffers will instruct the GPU to perform work
+(for example rendering) and that work needs memory from which to read and memory
+to which to write. All memory is encapsulated within GEM buffer objects (usually
+created with the ioctl `DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer
+for the GPU to execute will also list all GEM buffer objects that the batchbuffer
+reads and/or writes. For implementation details of memory management see
+`GEM BO Management Implementation Details`_.
+
+A GPU pipeline (most strongly so for the RCS engine) has a great deal of state
+which is to be programmed by user space via the contents of a batchbuffer. Starting
+in Gen6 (Sandy Bridge), hardware contexts are supported. A hardware context
+encapsulates GPU pipeline state and other portions of GPU state and it is much more
+efficient for the GPU to load a hardware context instead of re-submitting commands
+in a batchbuffer to the GPU to restore state. In addition, using hardware contexts
+provides much better isolation between user space clients. The ioctl
+`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` is used by user space to create a hardware context
+which is identified by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
+work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1) to
+identify what HW context to use with the command. When the kernel submits the
+batchbuffer to be executed by the GPU it will also instruct the GPU to load the HW
+context prior to executing the contents of a batchbuffer.
+
+The GPU has its own memory management and address space. The kernel driver
+maintains the memory translation table for the GPU. For older GPUs (i.e. those
+before Gen8), there is a single global translation table, the global
+Graphics Translation Table (GTT). For newer generation GPUs each hardware
+context has its own translation table, called Per-Process Graphics Translation
+Table (PPGTT). Note that although PPGTT is named per-process, it
+is actually per hardware context. When user space submits a batchbuffer, the kernel
+walks the list of GEM buffer objects used by the batchbuffer and guarantees
+that not only is the memory of each such GEM buffer object resident but it is
+also present in the (PP)GTT. If the GEM buffer object is not yet placed in
+the (PP)GTT, then it is given an address. Two consequences of this are:
+the kernel needs to edit the batchbuffer submitted to write the correct
+value of the GPU address when a GEM BO is assigned a GPU address and
+the kernel might evict a different GEM BO from the (PP)GTT to make address
+room for a GEM BO.
+
+Consequently, the ioctls submitting a batchbuffer for execution also include
+a list of all locations within buffers that refer to GPU addresses so that the
+kernel can edit the buffer correctly. This process is dubbed relocation. The
+ioctls allow user space to provide a guess of what the GPU address will be. If the
+kernel sees that the address provided by user space is correct, then it skips performing
+relocation for that GEM buffer object. In addition, the kernel reports back the
+address to which it relocated each GEM buffer object.
+
+There is also an interface for user space to directly specify the address location
+of GEM BOs; this feature is called soft-pinning and is activated within an execbuffer2
+ioctl by setting the `EXEC_OBJECT_PINNED` bit. If user space also specifies `I915_EXEC_NO_RELOC`,
+then the kernel does not perform any relocation and user space manages the address
+space for its PPGTT itself. The advantage of user space handling address space is
+that the kernel then does far less work and user space can safely assume that
+the locations of GEM buffer objects in GPU address space do not change.
+
+GEM BO Management Implementation Details
+----------------------------------------
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
+.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
+   :doc: Virtual Memory Address
+
+Buffer Object Eviction
+~~~~~~~~~~~~~~~~~~~~~~
+
+This section documents the interface functions for evicting buffer
+objects to make space available in the virtual gpu address spaces. Note
+that this is mostly orthogonal to shrinking buffer objects caches, which
+has the goal to make main memory (shared with the gpu through the
+unified memory architecture) available.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
    :internal:
 
-Batchbuffer Pools
------------------
+Buffer Object Memory Shrinking
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
-   :doc: batch pool
+This section documents the interface function for shrinking memory usage
+of buffer object caches. Shrinking is used to make main memory
+available. Note that this is mostly orthogonal to evicting buffer
+objects, which has the goal to make space in gpu virtual address spaces.
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
    :internal:
 
+
+Batchbuffer Submission
+----------------------
+
+Depending on GPU generation, the i915 kernel driver will submit batchbuffers
+in one of several ways. However, the top-level code logic is shared for all
+methods, see `Common: At the bottom`_ and `Common: Processing requests`_
+for details. In addition, the kernel may filter the contents of user space
+provided batchbuffers. To that end the i915 driver has a
+`Command Buffer Parser`_ and a pool from which to allocate buffers to place
+filtered user space batchbuffers, see section `Batchbuffer Pools`_.
+
+Common: At the bottom
+~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/intel_ringbuffer.h
+   :doc: Ringbuffers to submit batchbuffers
+
+Common: Processing requests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :doc: User command execution
+
+Batchbuffer Submission Varieties
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/intel_ringbuffer.h
+   :doc: Batchbuffer Submission Backend
+
+The two varieties for submitting batchbuffers to the GPU are the following.
+
+1. Batchbuffers are submitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is for generations strictly before Gen8.
+2. Batchbuffers are submitted via execlists, a feature supported by Gen8 and newer devices; the macro :c:macro:`HAS_EXECLISTS` is used to determine if a GPU supports submitting via execlists, see `Logical Rings, Logical Ring Contexts and Execlists`_.
+
 Logical Rings, Logical Ring Contexts and Execlists
---------------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. kernel-doc:: drivers/gpu/drm/i915/intel_lrc.c
    :doc: Logical Rings, Logical Ring Contexts and Execlists
@@ -276,6 +404,24 @@ Logical Rings, Logical Ring Contexts and Execlists
 .. kernel-doc:: drivers/gpu/drm/i915/intel_lrc.c
    :internal:
 
+Command Buffer Parser
+---------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
+   :doc: batch buffer command parser
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
+   :internal:
+
+Batchbuffer Pools
+-----------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
+   :doc: batch pool
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
+   :internal:
+
 Global GTT views
 ----------------
 
@@ -312,28 +458,6 @@ Object Tiling IOCTLs
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
    :doc: buffer object tiling
 
-Buffer Object Eviction
-----------------------
-
-This section documents the interface functions for evicting buffer
-objects to make space available in the virtual gpu address spaces. Note
-that this is mostly orthogonal to shrinking buffer objects caches, which
-has the goal to make main memory (shared with the gpu through the
-unified memory architecture) available.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
-   :internal:
-
-Buffer Object Memory Shrinking
-------------------------------
-
-This section documents the interface function for shrinking memory usage
-of buffer object caches. Shrinking is used to make main memory
-available. Note that this is mostly orthogonal to evicting buffer
-objects, which has the goal to make space in gpu virtual address spaces.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
-   :internal:
 
 GuC
 ===
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8c170db8495d..6c8b8e2041f1 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -81,7 +81,8 @@ enum {
  * but this remains just a hint as the kernel may choose a new location for
  * any object in the future.
  *
- * Processing an execbuf ioctl is conceptually split up into a few phases.
+ * Processing an execbuf ioctl is handled by i915_gem_do_execbuffer() which
+ * conceptually splits the processing into a few phases.
  *
  * 1. Validation - Ensure all the pointers, handles and flags are valid.
  * 2. Reservation - Assign GPU address space for every object
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 8c5022095418..d0feb4f9e326 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -38,13 +38,18 @@
 enum i915_cache_level;
 
 /**
- * A VMA represents a GEM BO that is bound into an address space. Therefore, a
- * VMA's presence cannot be guaranteed before binding, or after unbinding the
- * object into/from the address space.
+ * DOC: Virtual Memory Address
+ *
+ * An `i915_vma` struct represents a GEM BO that is bound into an address
+ * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
+ * after unbinding the object into/from the address space. The struct includes
+ * the bookkeeping details needed for tracking it in all the lists with which
+ * it interacts.
  *
  * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
  * will always be <= an objects lifetime. So object refcounting should cover us.
  */
+
 struct i915_vma {
 	struct drm_mm_node node;
 	struct drm_i915_gem_object *obj;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 14288743909f..bc4943333090 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -34,7 +34,8 @@
  * Motivation:
  * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
  * These expanded contexts enable a number of new abilities, especially
- * "Execlists" (also implemented in this file).
+ * "Execlists" (also implemented in this file,
+ * drivers/gpu/drm/i915/intel_lrc.c).
  *
  * One of the main differences with the legacy HW contexts is that logical
  * ring contexts incorporate many more things to the context's state, like
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index bbacf4d0f4cb..390f63479565 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -300,6 +300,70 @@ struct intel_engine_execlists {
 
 #define INTEL_ENGINE_CS_MAX_NAME 8
 
+/**
+ * DOC: Ringbuffers to submit batchbuffers
+ *
+ * At the lowest level, submitting work to a GPU engine is to add commands to
+ * a ringbuffer. A ringbuffer in the kernel driver is essentially a location
+ * from which the GPU reads its next command. To avoid copying the contents
+ * of a batchbuffer in order to submit it, the GPU has native hardware support
+ * to perform commands specified in another buffer; the command to do so is
+ * a batchbuffer start and the i915 kernel driver uses this to avoid copying
+ * batchbuffers to the ringbuffer. At the very bottom of the stack, the i915
+ * driver adds the following to a ringbuffer to submit a batchbuffer to the GPU.
+ *
+ * 1. Add a batchbuffer start command to the ringbuffer.
+ *      The start command is essentially a token together with the GPU
+ *      address of the batchbuffer to be executed.
+ *
+ * 2. Add a pipeline flush to the ringbuffer.
+ *      This is accomplished by the function pointer
+ *
+ * 3. Add a register write command to the ring buffer.
+ *      This register write writes the request ID,
+ *      ``i915_request::global_seqno``; the i915 kernel driver uses
+ *      the value in the register to know what requests are completed.
+ *
+ * 4. Add a user interrupt command to the ringbuffer.
+ *      This command instructs the GPU to issue an interrupt
+ *      when the command (and pipeline flush) are completed.
+ */
+
+/**
+ * DOC: Batchbuffer Submission Backend
+ *
+ * The core logic of submitting a batchbuffer for the GPU to execute
+ * is shared across all engines for all GPU generations. Through the use
+ * of function pointers, we can customize submission to different GPU
+ * capabilities. The struct ``intel_engine_cs`` has the following member
+ * function pointers for the following purposes in the scope of batchbuffer
+ * submission.
+ *
+ * - context_pin
+ *     pins the context and also returns the ``intel_ringbuffer`` to
+ *     which to write in order to submit a batchbuffer.
+ *
+ * - request_alloc
+ *     is used to reserve space in an ``intel_ringbuffer``
+ *     for submitting a batchbuffer to the GPU.
+ *
+ * - emit_flush
+ *     writes a pipeline flush command to the ring buffer.
+ *
+ * - emit_bb_start
+ *     writes the batchbuffer start command to the ring buffer.
+ *
+ * - emit_breadcrumb
+ *     writes to the ring buffer both the register write of the
+ *     request ID (``i915_request::global_seqno``) and the command to
+ *     issue an interrupt.
+ *
+ * - submit_request
+ *     See the comment on this member in ``intel_engine_cs``, declared
+ *     in intel_ringbuffer.h.
+ *
+ */
+
 struct intel_engine_cs {
 	struct drm_i915_private *i915;
 	char name[INTEL_ENGINE_CS_MAX_NAME];
-- 
2.16.2
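
The four submission steps and the per-backend function pointers described
in the intel_ringbuffer.h comments above can be modeled with a small
user-space sketch. Everything here is hypothetical and for illustration
only: the `toy_*` names, the opcode values and the 64-bit slot layout are
assumptions, and real command encodings differ per GPU generation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes standing in for real GPU commands. */
enum toy_cmd {
	TOY_BB_START = 1,
	TOY_FLUSH,
	TOY_STORE_SEQNO,
	TOY_USER_INTERRUPT,
};

#define TOY_RING_SLOTS 64

struct toy_ring {
	uint64_t cmds[TOY_RING_SLOTS];
	size_t   tail; /* next free slot, wraps around */
};

struct toy_engine {
	struct toy_ring ring;
	/* Per-backend hooks, loosely mirroring the documented members. */
	void (*emit_bb_start)(struct toy_ring *ring, uint64_t batch_addr);
	void (*emit_flush)(struct toy_ring *ring);
	void (*emit_breadcrumb)(struct toy_ring *ring, uint32_t seqno);
};

void toy_ring_emit(struct toy_ring *ring, uint64_t value)
{
	ring->cmds[ring->tail] = value;
	ring->tail = (ring->tail + 1) % TOY_RING_SLOTS;
}

/* Step 1: a token plus the GPU address of the batchbuffer. */
void toy_emit_bb_start(struct toy_ring *ring, uint64_t batch_addr)
{
	toy_ring_emit(ring, TOY_BB_START);
	toy_ring_emit(ring, batch_addr);
}

/* Step 2: pipeline flush. */
void toy_emit_flush(struct toy_ring *ring)
{
	toy_ring_emit(ring, TOY_FLUSH);
}

/* Steps 3 and 4: write the request ID, then raise a user interrupt. */
void toy_emit_breadcrumb(struct toy_ring *ring, uint32_t seqno)
{
	toy_ring_emit(ring, TOY_STORE_SEQNO);
	toy_ring_emit(ring, seqno);
	toy_ring_emit(ring, TOY_USER_INTERRUPT);
}

void toy_engine_init(struct toy_engine *engine)
{
	engine->ring.tail = 0;
	engine->emit_bb_start = toy_emit_bb_start;
	engine->emit_flush = toy_emit_flush;
	engine->emit_breadcrumb = toy_emit_breadcrumb;
}

/* Shared submission path: only the hooks vary between backends. */
void toy_submit(struct toy_engine *engine, uint64_t batch_addr,
		uint32_t seqno)
{
	engine->emit_bb_start(&engine->ring, batch_addr);
	engine->emit_flush(&engine->ring);
	engine->emit_breadcrumb(&engine->ring, seqno);
}
```

A different backend would install other hooks in its init function while
`toy_submit()` stays unchanged, which is the design point of the function
pointers.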