Mesa (main): docs/isl: Add detailed documentation about CCS compression
GitLab Mirror
gitlab-mirror at kemper.freedesktop.org
Fri Jun 18 13:29:58 UTC 2021
Module: Mesa
Branch: main
Commit: b97dedd365fbd8c2e62e0fecc89d01cfc38eb0e6
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=b97dedd365fbd8c2e62e0fecc89d01cfc38eb0e6
Author: Jason Ekstrand <jason at jlekstrand.net>
Date: Tue Jun 15 16:57:25 2021 -0500
docs/isl: Add detailed documentation about CCS compression
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11366>
---
docs/isl/ccs.rst | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++
docs/isl/index.rst | 1 +
2 files changed, 172 insertions(+)
diff --git a/docs/isl/ccs.rst b/docs/isl/ccs.rst
new file mode 100644
index 00000000000..37797705cc9
--- /dev/null
+++ b/docs/isl/ccs.rst
@@ -0,0 +1,171 @@
+Single-sampled Color Compression
+================================
+
+Starting with Ivy Bridge, Intel graphics hardware provides a form of color
+compression for single-sampled surfaces. In its initial form, this provided an
+acceleration of render target clear operations that, in the common case, allows
+you to avoid almost all of the bandwidth of a full-surface clear operation. On
+Sky Lake, single-sampled color compression was extended to allow for the
+compression of color values from actual rendering and not just the initial clear.
+From here on, the older Ivy Bridge form of color compression will be called
+"fast-clears" and the term "color compression" will be reserved for the more
+powerful Sky Lake form.
+
+The documentation for Ivy Bridge through Broadwell overloads the term MCS to
+refer both to the *multisample control surface* used for multisample
+compression and to the control surface used for fast-clears. In ISL, the
+:cpp:enumerator:`isl_aux_usage::ISL_AUX_USAGE_MCS` enum always refers to
+multisample color compression while the
+:cpp:enumerator:`isl_aux_usage::ISL_AUX_USAGE_CCS_` enums always refer to
+single-sampled color compression. Throughout this chapter and the rest of the
+ISL documentation, we will use the term "color control surface", abbreviated
+CCS, to denote the control surface used for both fast-clears and color
+compression. While this is still an overloaded term, Ivy Bridge fast-clears
+are much closer to Sky Lake color compression than they are to multisample
+compression.
+
+CCS data
+--------
+
+Fast clears and CCS are possibly the single most poorly documented aspect of
+surface layout/setup for Intel graphics hardware (with HiZ coming in a neat
+second). All the documentation really says is that you can use an MCS buffer on
+single-sampled surfaces (we will call it the CCS in this case). It also
+provides some documentation on how to program the hardware to perform clear
+operations, but that's it. How big is this buffer? What does it contain?
+Those questions are left as exercises to the reader. Almost everything we know
+about the contents of the CCS is gleaned from reverse-engineering of the
+hardware. The best bit of documentation we have ever had comes from the
+display section of the Sky Lake PRM Vol 12 section on planes (p. 159):
+
+ The Color Control Surface (CCS) contains the compression status of the
+ cache-line pairs. The compression state of the cache-line pair is
+ specified by 2 bits in the CCS. Each CCS cache-line represents an area
+ on the main surface of 16x16 sets of 128 byte Y-tiled cache-line-pairs.
+ CCS is always Y tiled.
+
+While this is technically for color compression and not fast-clears, it
+provides a good bit of insight into how color compression and fast-clears
+operate. Each cache-line pair in the main surface corresponds to 1 or 2 bits
+in the CCS. The primary difference, as far as the current discussion is
+concerned, is that fast-clears use only 1 bit per cache-line pair whereas color
+compression uses 2 bits.
+
+What is a cache-line pair? Both the X and Y tiling formats are arranged as an
+8x8 grid of cache lines. (See the :doc:`chapter on tiling <tiling>` for more
+details.) In either case, a cache-line pair is a pair of cache lines whose
+starting addresses differ by 512 bytes or 8 cache lines. This results in the
+two cache lines being vertically adjacent when the main surface is X-tiled and
+horizontally adjacent when the main surface is Y-tiled. For an X-tiled surface
+this forms an area of 64B x 2rows and for a Y-tiled surface this forms an area
+of 32B x 4rows. In either case, it is guaranteed that, regardless of surface
+format, each 2x2 subspan coming out of a shader will land entirely within one
+cache-line pair.
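
The cache-line-pair footprints given above can be checked with a little arithmetic. The following is a minimal sketch, assuming 64-byte cache lines and the pair sizes stated in the text (64B x 2 rows for X-tiling, 32B x 4 rows for Y-tiling). The names ``main_tiling``, ``extent`` and ``pair_extent_px`` are hypothetical and are not part of ISL.

```c
#include <assert.h>

/* Hypothetical names for illustration only; these are not ISL types. */
enum main_tiling { MAIN_TILING_X, MAIN_TILING_Y };

struct extent { unsigned w, h; };

/* Footprint of one cache-line pair in main-surface pixels.
 * bpp is the size of the main surface format in bits per pixel. */
static struct extent
pair_extent_px(enum main_tiling tiling, unsigned bpp)
{
   assert(bpp >= 8 && bpp <= 128 && bpp % 8 == 0);
   unsigned cpp = bpp / 8; /* bytes per pixel */

   /* Pair sizes from the text: 64B x 2 rows (X), 32B x 4 rows (Y). */
   unsigned width_B = (tiling == MAIN_TILING_X) ? 64 : 32;
   unsigned rows = (tiling == MAIN_TILING_X) ? 2 : 4;

   return (struct extent) { width_B / cpp, rows };
}

/* A 2x2 subspan that starts on even coordinates stays inside one pair
 * whenever both pair dimensions are even. */
static int
subspan_fits(enum main_tiling tiling, unsigned bpp)
{
   struct extent e = pair_extent_px(tiling, bpp);
   return e.w % 2 == 0 && e.h % 2 == 0;
}
```

For a 32bpp format this gives 16x2 pixels per pair on an X-tiled surface and 8x4 pixels on a Y-tiled one. Even at 128bpp both dimensions stay even, which is consistent with the subspan guarantee described above.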
+
+What is the correspondence between bits and cache-line pairs? The best model I
+(Jason) know of is to consider the CCS as having a 1-bit color format for
+fast-clears, a 2-bit format for color compression, and a special tiling
+format. The CCS tiling formats operate on a 1- or 2-bit granularity rather than
+the byte granularity of most tiling formats.
+
+The following table represents the bit-layouts that yield the CCS tiling format
+on different hardware generations. Bits 0-11 correspond to the regular swizzle
+of bytes within a 4KB page whereas the negative bits represent the address of
+the particular 1 or 2-bit portion of a byte. (Note: The Haswell data was
+gathered on a dual-channel system so bit-6 swizzling was enabled. It's unclear
+how this affects the CCS layout.)
+
+============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
+ Generation Tiling 11 10 9 8 7 6 5 4 3 2 1 0 -1 -2 -3
+============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
+ Ivy Bridge X or Y :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0`
+ Haswell X :math:`u_6` :math:`u_5` :math:`v_3 \oplus u_1` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` :math:`u_4` :math:`u_3` :math:`u_2` :math:`u_0`
+ Haswell Y :math:`u_6` :math:`u_5` :math:`v_2 \oplus u_1` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` :math:`u_4` :math:`u_3` :math:`u_2` :math:`u_0`
+ Broadwell X :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`u_3` :math:`v_3` :math:`u_2` :math:`u_1` :math:`u_0` :math:`v_2` :math:`v_1` :math:`v_0`
+ Broadwell Y :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`u_3` :math:`u_2` :math:`u_1` :math:`v_1` :math:`v_0` :math:`u_0`
+ Sky Lake Y :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_3` :math:`v_2` :math:`v_1` :math:`u_3` :math:`u_2` :math:`u_1` :math:`v_0` :math:`u_0`
+============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
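
As an illustration of how a row of this table can be read, here is a minimal sketch of the Sky Lake row. It rests on several assumptions that the text does not state: that the row's fourteen entries fill bits 11 through -2 (a 2-bit element leaves bit -3 unused), that :math:`u` is the horizontal and :math:`v` the vertical element coordinate within one 4KB CCS tile, and that the lowest-numbered 2-bit field sits in the least significant bits of its byte. The function names are hypothetical and are not part of ISL.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bit n of x. */
static uint32_t
bit(uint32_t x, unsigned n)
{
   return (x >> n) & 1u;
}

/* Map (u, v) to a 14-bit element index within a 4KB CCS tile.  Index bits
 * 13..2 are byte-address bits 11..0 of the table and index bits 1..0 are
 * the table's columns -1 and -2.  Assumes u is horizontal, v is vertical. */
static uint32_t
skl_ccs_element_index(uint32_t u, uint32_t v)
{
   assert(u < 128 && v < 128);

   /* Same order as the Sky Lake row, most significant bit first. */
   return (bit(u, 6) << 13) | (bit(u, 5) << 12) | (bit(u, 4) << 11) |
          (bit(v, 6) << 10) | (bit(v, 5) <<  9) | (bit(v, 4) <<  8) |
          (bit(v, 3) <<  7) | (bit(v, 2) <<  6) | (bit(v, 1) <<  5) |
          (bit(u, 3) <<  4) | (bit(u, 2) <<  3) | (bit(u, 1) <<  2) |
          (bit(v, 0) <<  1) | (bit(u, 0) <<  0);
}

/* Byte offset of the element within the 4KB tile. */
static uint32_t
skl_ccs_byte_offset(uint32_t u, uint32_t v)
{
   return skl_ccs_element_index(u, v) >> 2;
}

/* Shift of the 2-bit field within that byte (assumed LSB-first). */
static unsigned
skl_ccs_bit_shift(uint32_t u, uint32_t v)
{
   return (skl_ccs_element_index(u, v) & 3u) * 2;
}
```

Under these assumptions a 2x2 block of CCS elements shares one byte, and the fourteen index bits cover exactly the 4096 x 4 two-bit elements of a tile.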
+
+CCS surface layout
+------------------
+
+Starting with Broadwell, fast-clears and color compression can be used on
+mipmapped and array surfaces. When considered from a higher level, the CCS is
+laid out like any other surface. The Broadwell and Sky Lake PRMs describe
+this as follows:
+
+Broadwell PRM Vol 7, "MCS Buffer for Render Target(s)" (p. 676):
+
+ Mip-mapped and arrayed surfaces are supported with MCS buffer layout with
+ these alignments in the RT space: Horizontal Alignment = 256 and Vertical
+ Alignment = 128.
+
+Broadwell PRM Vol 2d, "RENDER_SURFACE_STATE" (p. 279):
+
+ For non-multisampled render target's auxiliary surface, MCS, QPitch must be
+ computed with Horizontal Alignment = 256 and Surface Vertical Alignment =
+ 128. These alignments are only for MCS buffer and not for associated render
+ target.
+
+Sky Lake PRM Vol 7, "MCS Buffer for Render Target(s)" (p. 632):
+
+ Mip-mapped and arrayed surfaces are supported with MCS buffer layout with
+ these alignments in the RT space: Horizontal Alignment = 128 and Vertical
+ Alignment = 64.
+
+Sky Lake PRM Vol. 2d, "RENDER_SURFACE_STATE" (p. 435):
+
+ For non-multisampled render target's CCS auxiliary surface, QPitch must be
+ computed with Horizontal Alignment = 128 and Surface Vertical Alignment
+ = 256. These alignments are only for CCS buffer and not for associated
+ render target.
+
+Empirical evidence seems to confirm this. On Sky Lake, the vertical alignment
+is always one cache line. The horizontal alignment, however, varies by main
+surface format: 1 cache line for 32bpp, 2 for 64bpp and 4 cache lines for
+128bpp formats. This nicely corresponds to the alignment of 128x64 pixels in
+the primary color surface. The second PRM citation about Sky Lake CCS above
+gives a vertical alignment of 256 rather than 64. With a little
+experimentation, this additional alignment appears to only apply to QPitch and
+not to the miplevels within a slice.
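
The Sky Lake numbers above can be cross-checked against the display PRM quote from the "CCS data" section. The following is a minimal sketch, assuming a Y-tiled main surface, a 32B x 4 row cache-line pair, and that each CCS cache line covers 16x16 pairs. The names are hypothetical and are not part of ISL.

```c
#include <assert.h>

struct extent { unsigned w, h; };

/* Main-surface area, in pixels, covered by one 64B CCS cache line on
 * Sky Lake.  Assumes a Y-tiled main surface and 16x16 pairs per CCS
 * cache line, as in the display PRM quote. */
static struct extent
skl_ccs_cache_line_extent_px(unsigned bpp)
{
   assert(bpp == 32 || bpp == 64 || bpp == 128);
   unsigned cpp = bpp / 8; /* bytes per pixel */

   /* One Y-tiled cache-line pair is 32B wide and 4 rows tall. */
   unsigned pair_w = 32 / cpp;
   unsigned pair_h = 4;

   return (struct extent) { 16 * pair_w, 16 * pair_h };
}

/* Number of CCS cache lines needed to span the 128-pixel horizontal
 * alignment quoted from the Sky Lake PRM. */
static unsigned
skl_ccs_halign_cache_lines(unsigned bpp)
{
   return 128 / skl_ccs_cache_line_extent_px(bpp).w;
}
```

This reproduces the observed horizontal alignments of 1, 2 and 4 cache lines for 32bpp, 64bpp and 128bpp formats, with one cache line always covering 64 rows of the main surface.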
+
+On Broadwell, each miplevel in the CCS is aligned to a cache-line pair
+boundary: horizontal when the primary surface is X-tiled and vertical when
+Y-tiled. For a 32bpp format, this works out to an alignment of 256x128 main
+surface pixels regardless of X or Y tiling. On Sky Lake, the alignment is
+a single cache line which works out to an alignment of 128x64 main surface
+pixels.
+
+TODO: More than just 32bpp formats on Broadwell!
+
+Once armed with the above alignment information, we can lay out the CCS surface
+itself. The way ISL does CCS layout calculations is by a very careful and
+subtle application of its normal surface layout code.
+
+Above, we described the CCS data layout as a mapping of address bits. In
+ISL, this is represented by :cpp:enumerator:`isl_tiling::ISL_TILING_CCS`. The
+logical and physical tile dimensions of this tiling correspond to the above
+mapping.
+
+We also have special :cpp:enum:`isl_format` enums for CCS. These formats are 1
+bit-per-pixel on Ivy Bridge through Broadwell and 2 bits-per-pixel on Sky Lake
+and above to correspond to the 1 and 2-bit values represented in the CCS data.
+They have a block size (similar to a block compressed format such as BC or
+ASTC) which says what area (in surface elements) in the main surface is covered
+by a single CCS element (1 or 2-bit). Because this depends on the main surface
+tiling and format, we have several different CCS formats.
+
+Once the appropriate :cpp:enum:`isl_format` has been selected, computing the
+size and layout of a CCS surface is as simple as passing the same surface
+creation parameters to :cpp:func:`isl_surf_init_s` as were used to create the
+primary surface, only with :cpp:enumerator:`isl_tiling::ISL_TILING_CCS` and the
+correct CCS format. This not only results in a correctly sized surface but
+also means that most other ISL helpers, such as those for computing offsets
+into surfaces, work correctly as well.
+
+CCS on Tigerlake and above
+--------------------------
+
+Starting with Tigerlake, CCS is no longer done via a surface and, instead, the
+term CCS gets overloaded once again (gotta love it!) to now refer to a form of
+universal compression which can be applied to almost any surface. Nothing in
+this chapter applies to any hardware with a graphics IP version 12 or above.
diff --git a/docs/isl/index.rst b/docs/isl/index.rst
index 2d1714a5259..d91508d6689 100644
--- a/docs/isl/index.rst
+++ b/docs/isl/index.rst
@@ -12,6 +12,7 @@ Chery.
units
formats
tiling
+ ccs
The core representation of a surface in ISL is :cpp:struct:`isl_surf`.