[Mesa-dev] [PATCH 07/20] i965/fs: Import image memory offset calculation code.

Fri Jul 24 03:59:30 PDT 2015

Jason Ekstrand <jason at jlekstrand.net> writes:

> Ok, I've looked through this again and convinced myself that it's
> *mostly* correct.  I am a bit skeptical of the address swizzling for
> Y-major tiling.
>
> I've also included some comments that I'd like to see added (assuming
> they're correct).  Sometimes it's easier to write helpful comments if
> the person who is confused does the writing. :-)
>

Thanks!  This will save me quite some guesswork, I'll add them to the
patch.

> Also, I'd just like to say that I'm terribly impressed after reading
> through this.  You managed to handle an amazing number of cases
> without so much as a single if statement or predicated instruction.
> My hat's off to you sir.
>
Hah, I feel flattered. :)

> Modulo Y-tiling and comments,
>
> Reviewed-by: Jason Ekstrand <jason.ekstrand at intel.com>
>
> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>> Define a function to calculate the memory address of the image
>> location given by a vector of coordinates.  This is required in cases
>> where we need to fall back to untyped surface access, which take a raw
>> memory offset and know nothing about surface coordinates, type
>> conversion or memory tiling and swizzling.  They are still useful
>> because typed surface reads don't support any 64 or 128-bit formats on
>> IVB, and they don't support any 128-bit formats on HSW and BDW.
>>
>> The tiling algorithm is implemented based on a number of parameters
>> which are passed in as uniforms and determine whether the surface
>> layout is X-tiled, Y-tiled or untiled.  This allows binding surfaces
>> of different tiling layouts to the pipeline without recompiling the
>> program.
>>
>> v2: Drop VEC4 suport.
>> v3: Rebase.
>> ---
>>  .../drivers/dri/i965/brw_fs_surface_builder.cpp    | 108 +++++++++++++++++++++
>>  1 file changed, 108 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> index 5ee04de..0c879db 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> @@ -215,4 +215,112 @@ namespace {
>>           return BRW_PREDICATE_NORMAL;
>>        }
>>     }
>> +
>> +   namespace image_coordinates {
>> +      /**
>> +       * Calculate the offset in memory of the texel given by \p coord.
>> +       *
>> +       * This is meant to be used with untyped surface messages to access a
>> +       * tiled surface, what involves taking into account the tiling and
>> +       * swizzling modes of the surface manually so it will hopefully not
>> +       * happen very often.
>> +       */
>> +      fs_reg
>> +      emit_address_calculation(const fs_builder &bld, const fs_reg &image,
>> +                               const fs_reg &coord, unsigned dims)
>> +      {
>> +         const brw_device_info *devinfo = bld.shader->devinfo;
>> +         const fs_reg off = offset(image, bld, BRW_IMAGE_PARAM_OFFSET_OFFSET);
>> +         const fs_reg stride = offset(image, bld, BRW_IMAGE_PARAM_STRIDE_OFFSET);
>> +         const fs_reg tile = offset(image, bld, BRW_IMAGE_PARAM_TILING_OFFSET);
>> +         const fs_reg swz = offset(image, bld, BRW_IMAGE_PARAM_SWIZZLING_OFFSET);
>> +         const fs_reg addr = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
>> +         const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
>> +         const fs_reg minor = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
>> +         const fs_reg major = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
>> +         const fs_reg dst = bld.vgrf(BRW_REGISTER_TYPE_UD);
>> +
>> +         /* Shift the coordinates by the fixed surface offset. */
>> +         for (unsigned c = 0; c < 2; ++c)
>> +            bld.ADD(offset(addr, bld, c), offset(off, bld, c),
>> +                    (c < dims ?
>> +                     offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, c) :
>> +                     fs_reg(0)));
>> +
>> +         if (dims > 2) {
>> +            /* Decompose z into a major (tmp.y) and a minor (tmp.x)
>> +             * index.
>
> Please add a comment:
> /*
> The layout of 3-D textures in memory is sort-of like a tiling format.
> At each miplevel, the slices are arranged in rows of 2^level slices
> per row.  The slice row is stored in tmp.y and the slice within the
> row is stored in tmp.x.  The layout of 2-D array textures is much
> simpler.  Each slice of the texture is storred as a 2-D texture with
> all its miplevels and then one qpitch later, the next slice is stored
> with its miplevels.  This code handles both by passing in the miplevel
> as tile[2] for 3-D textures and 0 in tile[2] for 2-D array textures.
> */

Note that depending on the ARYSPC_LOD0 surface state flag array textures
may also be interleaved in the opposite way -- All slices for LOD0, then
all slices for LOD1, and so on.  We probably don't hit this case
currently because MS images are not exposed but it's handled in the same
way by setting stride[3] to the spacing between slices.  I'll add that
to your comment.

>> +             */
>> +            bld.BFE(offset(tmp, bld, 0), offset(tile, bld, 2), fs_reg(0),
>> +                    offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, 2));
>> +            bld.SHR(offset(tmp, bld, 1),
>> +                    offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, 2),
>> +                    offset(tile, bld, 2));
>> +
>> +            /* Take into account the horizontal (tmp.x) and vertical (tmp.y)
>> +             * slice offset.
>> +             */
>> +            for (unsigned c = 0; c < 2; ++c) {
>> +               bld.MUL(offset(tmp, bld, c),
>> +                       offset(stride, bld, 2 + c), offset(tmp, bld, c));
>> +               bld.ADD(offset(addr, bld, c),
>> +                       offset(addr, bld, c), offset(tmp, bld, c));
>> +            }
>> +         }
>> +
>> +         if (dims > 1) {
>
> Please add as a comment:
> /*
> Calculate the major/minor x and y indices.  In order to accommodate
> both X and Y tiling, the Y-major tiling format is treated as being a
> bunch of narrow X-tiles placed next to each other.   This means that
> the tile width for Y-tiling is actually the width of one sub-column of
> the Y-major tile where each 4K tile has 8 512B sub-columns.
>
> The major Y value is the row of tiles in which the pixel lives.  The
> major X value is the tile sub-column in which the pixel lives; for X
> tiling, this is the same as the tile column, for Y tiling, each tile
> has 8 sub-columns.  The minor X and Y indices are the position within
> the sub-column.
> */
>> +            for (unsigned c = 0; c < 2; ++c) {
>> +               /* Calculate the minor x and y indices. */
>> +               bld.BFE(offset(minor, bld, c), offset(tile, bld, c),
>> +                       fs_reg(0), offset(addr, bld, c));
>> +
>> +               /* Calculate the major x and y indices. */
>> +               bld.SHR(offset(major, bld, c),
>> +                       offset(addr, bld, c), offset(tile, bld, c));
>> +            }
>> +
>> +            /* Calculate the texel index from the start of the tile row and
>> +             * the vertical coordinate of the row.
>> +             * Equivalent to:
>> +             *   tmp.x = (major.x << tile.y << tile.x) +
>> +             *           (minor.y << tile.x) + minor.x
>> +             *   tmp.y = major.y << tile.y
>> +             */
>> +            bld.SHL(tmp, major, offset(tile, bld, 1));
>> +            bld.ADD(tmp, tmp, offset(minor, bld, 1));
>> +            bld.SHL(tmp, tmp, offset(tile, bld, 0));
>> +            bld.ADD(tmp, tmp, minor);
>> +            bld.SHL(offset(tmp, bld, 1),
>> +                    offset(major, bld, 1), offset(tile, bld, 1));
>> +
>> +            /* Add it to the start of the tile row. */
>> +            bld.MUL(offset(tmp, bld, 1),
>> +                    offset(tmp, bld, 1), offset(stride, bld, 1));
>> +            bld.ADD(tmp, tmp, offset(tmp, bld, 1));
>> +
>> +            /* Multiply by the Bpp value. */
>> +            bld.MUL(dst, tmp, stride);
>> +
>> +            if (devinfo->gen < 8 && !devinfo->is_baytrail) {
>
> Talking to Ken a bit about this. It seems as if this is actually the
> check we want.  It's going to be a nasty bug if someone has a shader
> cache on a HSW machine, runs some compute shaders, pulls out a stick
> of ram (going from dual to single-channel), and starts it up again.
> Probably best to compile the shader the same regardless of how much
> ram you have.
>
Hmm, I guess we could also invalidate the cache if the flag changes?
(or in case anything else changes from the brw_device_info structure)

>> +               /* Take into account the two dynamically specified shifts. */
>> +               for (unsigned c = 0; c < 2; ++c)
>> +                  bld.SHR(offset(tmp, bld, c), dst, offset(swz, bld, c));
>> +
>> +               /* XOR tmp.x and tmp.y with bit 6 of the memory address. */
>> +               bld.XOR(tmp, tmp, offset(tmp, bld, 1));
>> +               bld.AND(tmp, tmp, fs_reg(1 << 6));
>> +               bld.XOR(dst, dst, tmp);
>
> I don't think this is correct for Y-tiled surfaces.  In Y-tiling, bit
> 6 of the address is given by bit6 XOR bit9.  You have one extra XOR
> which means you just get just bit9.
>
The extra XOR is made a no-op for the Y-tiled case in the same way that
they are both made a no-op in the no-swizzling case.  I'll add another
comment clarifying that.

>> +            }
>> +
>> +         } else {
>> +            /* Multiply by the Bpp/stride value. */
>
> I'm still confused as to how addr can have 2 things while dims == 1.
> Does dims specify the per-slice dimension or total dimension?  That
> should probably be documented at the top of the function.
>
dims specifies the dimension of the coord argument.  addr may still have
a non-zero y coordinate because it's the sum of coord and the
two-dimensional offset value specified by surface set-up.  1D texture
slices and levels are laid out in a 2D layout just like the rest, the
offset value specifies where in the 2D miptree the requested 1D image
starts.  You may wonder why we pass a 2D offset to the shader instead of
just fixing up the surface base memory address to point at the right
slice at offset (0, 0), the answer is that in the general case (i.e. not
necessarily 1D) it wouldn't work, because a level may start mid-tile and
the result of shifting a tiled surface by less than the tile size is in
general not a well-formed tiled surface -- In the 1D case it could
probably be done though because I deliberately disabled tiling for those
(36a17f0f9913), but it would require special-casing them in the surface
state set-up code and another manual calculation of the base memory
address of the right slice-level of the miptree.

>> +            bld.MUL(offset(addr, bld, 1),
>> +                    offset(addr, bld, 1), offset(stride, bld, 1));
>> +            bld.ADD(addr, addr, offset(addr, bld, 1));
>> +            bld.MUL(dst, addr, stride);
>> +         }
>> +
>> +         return dst;
>> +      }
>> +   }
>>  }
>> --
>> 2.4.3
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150724/7a841cfa/attachment-0001.sig>