[Mesa-dev] [PATCH 07/20] i965/fs: Import image memory offset calculation code.

Thu Jul 23 17:05:56 PDT 2015

Ok, I've looked through this again and convinced myself that it's
*mostly* correct.  I am a bit skeptical of the address swizzling for
Y-major tiling.

I've also included some comments that I'd like to see added (assuming
they're correct).  Sometimes it's easier to write helpful comments if
the person who is confused does the writing. :-)

Also, I'd just like to say that I'm terribly impressed after reading
through this.  You managed to handle an amazing number of cases
without so much as a single if statement or predicated instruction.
My hat's off to you sir.

Modulo Y-tiling and comments,

Reviewed-by: Jason Ekstrand <jason.ekstrand at intel.com>

On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Define a function to calculate the memory address of the image
> location given by a vector of coordinates.  This is required in cases
> where we need to fall back to untyped surface access, which take a raw
> memory offset and know nothing about surface coordinates, type
> conversion or memory tiling and swizzling.  They are still useful
> because typed surface reads don't support any 64 or 128-bit formats on
> IVB, and they don't support any 128-bit formats on HSW and BDW.
>
> The tiling algorithm is implemented based on a number of parameters
> which are passed in as uniforms and determine whether the surface
> layout is X-tiled, Y-tiled or untiled.  This allows binding surfaces
> of different tiling layouts to the pipeline without recompiling the
> program.
>
> v2: Drop VEC4 suport.
> v3: Rebase.
> ---
>  .../drivers/dri/i965/brw_fs_surface_builder.cpp    | 108 +++++++++++++++++++++
>  1 file changed, 108 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
> index 5ee04de..0c879db 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
> @@ -215,4 +215,112 @@ namespace {
>           return BRW_PREDICATE_NORMAL;
>        }
>     }
> +
> +   namespace image_coordinates {
> +      /**
> +       * Calculate the offset in memory of the texel given by \p coord.
> +       *
> +       * This is meant to be used with untyped surface messages to access a
> +       * tiled surface, what involves taking into account the tiling and
> +       * swizzling modes of the surface manually so it will hopefully not
> +       * happen very often.
> +       */
> +      fs_reg
> +      emit_address_calculation(const fs_builder &bld, const fs_reg &image,
> +                               const fs_reg &coord, unsigned dims)
> +      {
> +         const brw_device_info *devinfo = bld.shader->devinfo;
> +         const fs_reg off = offset(image, bld, BRW_IMAGE_PARAM_OFFSET_OFFSET);
> +         const fs_reg stride = offset(image, bld, BRW_IMAGE_PARAM_STRIDE_OFFSET);
> +         const fs_reg tile = offset(image, bld, BRW_IMAGE_PARAM_TILING_OFFSET);
> +         const fs_reg swz = offset(image, bld, BRW_IMAGE_PARAM_SWIZZLING_OFFSET);
> +         const fs_reg addr = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> +         const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> +         const fs_reg minor = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> +         const fs_reg major = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> +         const fs_reg dst = bld.vgrf(BRW_REGISTER_TYPE_UD);
> +
> +         /* Shift the coordinates by the fixed surface offset. */
> +         for (unsigned c = 0; c < 2; ++c)
> +            bld.ADD(offset(addr, bld, c), offset(off, bld, c),
> +                    (c < dims ?
> +                     offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, c) :
> +                     fs_reg(0)));
> +
> +         if (dims > 2) {
> +            /* Decompose z into a major (tmp.y) and a minor (tmp.x)
> +             * index.

Please add a comment:
/*
The layout of 3-D textures in memory is sort-of like a tiling format.
At each miplevel, the slices are arranged in rows of 2^level slices
per row.  The slice row is stored in tmp.y and the slice within the
row is stored in tmp.x.  The layout of 2-D array textures is much
simpler.  Each slice of the texture is storred as a 2-D texture with
all its miplevels and then one qpitch later, the next slice is stored
with its miplevels.  This code handles both by passing in the miplevel
as tile[2] for 3-D textures and 0 in tile[2] for 2-D array textures.
*/
> +             */
> +            bld.BFE(offset(tmp, bld, 0), offset(tile, bld, 2), fs_reg(0),
> +                    offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, 2));
> +            bld.SHR(offset(tmp, bld, 1),
> +                    offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, 2),
> +                    offset(tile, bld, 2));
> +
> +            /* Take into account the horizontal (tmp.x) and vertical (tmp.y)
> +             * slice offset.
> +             */
> +            for (unsigned c = 0; c < 2; ++c) {
> +               bld.MUL(offset(tmp, bld, c),
> +                       offset(stride, bld, 2 + c), offset(tmp, bld, c));
> +               bld.ADD(offset(addr, bld, c),
> +                       offset(addr, bld, c), offset(tmp, bld, c));
> +            }
> +         }
> +
> +         if (dims > 1) {

Please add as a comment:
/*
Calculate the major/minor x and y indices.  In order to accommodate
both X and Y tiling, the Y-major tiling format is treated as being a
bunch of narrow X-tiles placed next to each other.   This means that
the tile width for Y-tiling is actually the width of one sub-column of
the Y-major tile where each 4K tile has 8 512B sub-columns.

The major Y value is the row of tiles in which the pixel lives.  The
major X value is the tile sub-column in which the pixel lives; for X
tiling, this is the same as the tile column, for Y tiling, each tile
has 8 sub-columns.  The minor X and Y indices are the position within
the sub-column.
*/
> +            for (unsigned c = 0; c < 2; ++c) {
> +               /* Calculate the minor x and y indices. */
> +               bld.BFE(offset(minor, bld, c), offset(tile, bld, c),
> +                       fs_reg(0), offset(addr, bld, c));
> +
> +               /* Calculate the major x and y indices. */
> +               bld.SHR(offset(major, bld, c),
> +                       offset(addr, bld, c), offset(tile, bld, c));
> +            }
> +
> +            /* Calculate the texel index from the start of the tile row and
> +             * the vertical coordinate of the row.
> +             * Equivalent to:
> +             *   tmp.x = (major.x << tile.y << tile.x) +
> +             *           (minor.y << tile.x) + minor.x
> +             *   tmp.y = major.y << tile.y
> +             */
> +            bld.SHL(tmp, major, offset(tile, bld, 1));
> +            bld.ADD(tmp, tmp, offset(minor, bld, 1));
> +            bld.SHL(tmp, tmp, offset(tile, bld, 0));
> +            bld.ADD(tmp, tmp, minor);
> +            bld.SHL(offset(tmp, bld, 1),
> +                    offset(major, bld, 1), offset(tile, bld, 1));
> +
> +            /* Add it to the start of the tile row. */
> +            bld.MUL(offset(tmp, bld, 1),
> +                    offset(tmp, bld, 1), offset(stride, bld, 1));
> +            bld.ADD(tmp, tmp, offset(tmp, bld, 1));
> +
> +            /* Multiply by the Bpp value. */
> +            bld.MUL(dst, tmp, stride);
> +
> +            if (devinfo->gen < 8 && !devinfo->is_baytrail) {

Talking to Ken a bit about this. It seems as if this is actually the
check we want.  It's going to be a nasty bug if someone has a shader
cache on a HSW machine, runs some compute shaders, pulls out a stick
of ram (going from dual to single-channel), and starts it up again.
Probably best to compile the shader the same regardless of how much
ram you have.

> +               /* Take into account the two dynamically specified shifts. */
> +               for (unsigned c = 0; c < 2; ++c)
> +                  bld.SHR(offset(tmp, bld, c), dst, offset(swz, bld, c));
> +
> +               /* XOR tmp.x and tmp.y with bit 6 of the memory address. */
> +               bld.XOR(tmp, tmp, offset(tmp, bld, 1));
> +               bld.AND(tmp, tmp, fs_reg(1 << 6));
> +               bld.XOR(dst, dst, tmp);

I don't think this is correct for Y-tiled surfaces.  In Y-tiling, bit
6 of the address is given by bit6 XOR bit9.  You have one extra XOR
which means you just get just bit9.

> +            }
> +
> +         } else {
> +            /* Multiply by the Bpp/stride value. */

I'm still confused as to how addr can have 2 things while dims == 1.
Does dims specify the per-slice dimension or total dimension?  That
should probably be documented at the top of the function.

> +            bld.MUL(offset(addr, bld, 1),
> +                    offset(addr, bld, 1), offset(stride, bld, 1));
> +            bld.ADD(addr, addr, offset(addr, bld, 1));
> +            bld.MUL(dst, addr, stride);
> +         }
> +
> +         return dst;
> +      }
> +   }
>  }
> --
> 2.4.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev