[Mesa-dev] [PATCH 2/4] intel/isl: Add support for RGB formats in X and Y-tiled memory

Fri Sep 9 19:05:55 UTC 2016

On Fri 09 Sep 2016, Jason Ekstrand wrote:
> Normally, using a non-linear tiling format helps improve cache locality by
> ensuring that neighboring pixels are usually close-by in memory.  For RGB
> formats, this still sort-of holds, but it can also lead to rather terrible
> memory access patterns where a single RGB pixel value crosses a tile
> boundary and gets split into two pieces in different 4K pages.  It also
> makes for some rather awkward calculations because your tile size is no
> longer an even multiple of surface element size.  For these reasons, we
> chose to simply never create tiled RGB images in the Vulkan driver.
> 
> The GL driver, however, is not so kind so we need to support it somehow.  I
> briefly toyed with a couple of different schemes but this is the best one I
> could come up with.  The fundamental problem is that a tile no longer
> contains an integer number of surface elements.  I briefly considered a
> couple other options but found them wanting:
> 
>  1) Using floats for the logical tile size.  This leads to potential
>     rounding error problems.

Let's stay far away from floats With floats, there is just too much risk
for weird bugs.

Floats are for representing ℝ. Our calculations are in ℚ, so let's
restrict ourselves to ints.  (Does GMail display the blackboard bold
UTF-8 correctly?)

>  2) When presented with a RGB format, just make the tile 3-times as wide.
>     This isn't so nice because now our tiles are no longer power-of-two
>     size.  Also, it can force the row_pitch to be larger than needed which,
>     while not strictly a problem for ISL, causes incompatibility problems
>     with the way the GL driver chooses surface pitches.
> 
> The chosen method requires that you pay attention and not just assume that
> your tile_info is in the units you think it is.  However, it's nice because
> it provides a nice "these are the units" declaration in isl_tile_info
> itself.  Previously, the tile_info wasn't usable as a stand-alone structure
> because you had to also know the format.  It also forces figuring out how
> to deal with inconsistencies between tiling and format back to the caller
> which is good because the two different consumers of isl_tile_info really
> want to deal with it differently:  Computation of the surface size wants
> the fewest number of horizontal tiles possible while get_intratile_offset
> is far more concerned with things aligning nicely.
> 
> Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
> Cc: Chad Versace <chad at kiwitree.net>
> ---
>  src/intel/isl/isl.c | 49 ++++++++++++++++++++++++++++++++++++-------------
>  src/intel/isl/isl.h | 10 +++++++++-
>  2 files changed, 45 insertions(+), 14 deletions(-)
> 
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index 34245f4..8c72186 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev,
>     const uint32_t bs = format_bpb / 8;
>     struct isl_extent2d logical_el, phys_B;
>  
> -   assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb));
> +   if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) {
-----------------------------------------^^^
1. Weirdo space after '!'

> +      /* It is possible to have non-power-of-two formats in a tiled buffer.
> +       * The easiest way to handle this is to treat the tile as if it is three
> +       * times as wide.  This way no pixel will ever cross a tile boundary.
> +       * This really only works on legacy X and Y tiling formats.
> +       */
> +      assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0);
> +      assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3));
> +      return isl_tiling_get_info(dev, tiling, format_bpb / 3, tile_info);
> +   }
>  
>     switch (tiling) {
>     case ISL_TILING_LINEAR:
> @@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev,
>  
>     *tile_info = (struct isl_tile_info) {
>        .tiling = tiling,
> +      .format_bpb = format_bpb,
>        .logical_extent_el = logical_el,
>        .phys_extent_B = phys_B,
>     };
> @@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev,
>        }
>        base_alignment = isl_round_up_to_power_of_two(base_alignment);
>     } else {
> +      assert(fmtl->bpb % tile_info.format_bpb == 0);
> +      const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb;

> +      assert(tile_el_scale == 1 || tile_el_scale == 3);

2. Without the context of this patch, the assertion is very mysterious.
Future people will be afraid to touch it, as there's no clue to the
magic 3's justification. Please add a comment.

> +
>        assert(phys_slice0_sa.w % fmtl->bw == 0);
>        const uint32_t total_w_el = phys_slice0_sa.width / fmtl->bw;
>        const uint32_t total_w_tl =
> -         isl_align_div(total_w_el, tile_info.logical_extent_el.width);
> +         isl_align_div(total_w_el * tile_el_scale,
> +                       tile_info.logical_extent_el.width);
>  
>        row_pitch = total_w_tl * tile_info.phys_extent_B.width;
>        if (row_pitch < info->min_pitch) {
> -         row_pitch = isl_align(info->min_pitch, tile_info.phys_extent_B.width);
> +         row_pitch = isl_align_npot(info->min_pitch,
> +                                    tile_info.phys_extent_B.width);
>        }
>  
>        total_h_el += isl_align_div_npot(pad_bytes, row_pitch);
> @@ -1678,14 +1694,6 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev,
>                                     uint32_t *x_offset_el,
>                                     uint32_t *y_offset_el)
>  {
> -   /* This function only really works for power-of-two surfaces.  In
> -    * theory, we could make it work for non-power-of-two surfaces by going
> -    * to the left until we find a block that is bs-aligned.  The Vulkan
> -    * driver doesn't use non-power-of-two tiled surfaces so we'll leave
> -    * this unimplemented for now.
> -    */
> -   assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(bs));
> -
>     if (tiling == ISL_TILING_LINEAR) {
>        *base_address_offset = total_y_offset_el * row_pitch +
>                               total_x_offset_el * bs;
> @@ -1694,8 +1702,24 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev,
>        return;
>     }
>  
> +   const uint32_t bpb = bs * 8;
> +
>     struct isl_tile_info tile_info;
> -   isl_tiling_get_info(dev, tiling, bs * 8, &tile_info);
> +   isl_tiling_get_info(dev, tiling, bpb, &tile_info);
> +
> +   assert(row_pitch % tile_info.phys_extent_B.width == 0);
> +
> +   /* For non-power-of-two formats, we need the address to be both tile and
> +    * element-aligned.  The easiest way to achieve this is to work with a tile
> +    * that is three times as wide as the regular tile.
> +    *
> +    * The tile info returned by get_tile_info has a logical size that is an
> +    * integer number of tile_info.format_bpb size elements.  To scale the
> +    * tile, we scale up the physical width and then treat the logical tile
> +    * size as if it has bpb size elements.
> +    */
> +   const uint32_t tile_el_scale = bpb / tile_info.format_bpb;
> +   tile_info.phys_extent_B.width *= tile_el_scale;
>  
>     /* Compute the offset into the tile */
>     *x_offset_el = total_x_offset_el % tile_info.logical_extent_el.w;
> @@ -1705,7 +1729,6 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev,
>     uint32_t x_offset_tl = total_x_offset_el / tile_info.logical_extent_el.w;
>     uint32_t y_offset_tl = total_y_offset_el / tile_info.logical_extent_el.h;
>  
> -   assert(row_pitch % tile_info.phys_extent_B.width == 0);
>     *base_address_offset =
>        y_offset_tl * tile_info.phys_extent_B.h * row_pitch +
>        x_offset_tl * tile_info.phys_extent_B.h * tile_info.phys_extent_B.w;
> diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
> index 36ef81c..92a694a 100644
> --- a/src/intel/isl/isl.h
> +++ b/src/intel/isl/isl.h
> @@ -728,7 +728,15 @@ struct isl_format_layout {
>  struct isl_tile_info {
>     enum isl_tiling tiling;
>  
> -   /** The logical size of the tile in units of surface elements
> +   /* The size (in bits per block) of a single surface element
> +    *
> +    * For surfaces with power-of-two formats, this is the same as
> +    * isl_format_layout::bpb.  For non-power-of-two formats it may be smaller.
> +    * The logical_extent_el field is in terms of elements of this size.

3. It would be very nice to insert an example here for
ISL_FORMAT_R32G32B32_FLOAT, to help the future people that will
inevitably be confused by this surprising resizing. Here's a suggestion:

    For example, consider ISL_FORMAT_R32G32B32_FLOAT, for which
    isl_format_layout::bpb is 96 (a non-power-of-two). When calculating
    surface layout, isl wants to work only with power-of-two bpb. So isl
    splits the pixel into 3 surface elements, one for each channel, and
    isl_tile_info::format_bpb is 32.

> +    */
> +   uint32_t format_bpb;
> +
> +   /** The logical size of the tile in units of format_bpb size elements
>      *
>      * This field determines how a given surface is cut up into tiles.  It is
>      * used to compute the size of a surface in tiles and can be used to

Address feedback #1 and #2, and this gets
Acked-by: Chad Versace <chadversary at chromium.org>

Just an ack, because I didn't comb through all of isl to verify that
this was safe.

Address feedback #3 and you get bonus points.