[Mesa-dev] [PATCH 07/20] i965/fs: Import image memory offset calculation code.

Francisco Jerez currojerez at riseup.net
Thu Jul 23 06:57:53 PDT 2015


Jason Ekstrand <jason at jlekstrand.net> writes:

> This needs a *lot* more commentary.  These calculations are extremely
> tricky and there are almost no comments.  For instance, you are
> turning a 2D offset on a tiled surface into a new 2D address into the
> raw view of the surface.  Nowhere do you explain what the "raw"
> surface looks like and how its width/height map to "real" tiled
> version.

Hmm, I'm not sure what you mean by raw surface, the raw address is just
the real memory offset of the texel in memory as specified in the
hardware documentation (see "BSpec » Memory Views » Address Tiling
Function Introduction » Tile Formats" for an extensive description of
the X/Y tiling layouts).

> Somewhere, you need to write all this down in detail and explain why
> your calculations are correct.  If you don't write it down here, then
> write it down somewhere else and put a comment here explaining where
> to go look.
>
Sure, I'll go add some more references and comments, but I doubt it
makes sense to duplicate the specification of the hardware tiling format
in full detail here.

> I tried to dig through it, but I'm having a lot of trouble getting any
> further splitting the x/y into major/minor.
>
> Cc'ing chad.  He knows miptree layouts far better than I do.  Maybe
> this is all obvious to someone familiar with the calculations.
>
> --Jason
>
> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>> Define a function to calculate the memory address of the image
>> location given by a vector of coordinates.  This is required in cases
>> where we need to fall back to untyped surface access, which take a raw
>> memory offset and know nothing about surface coordinates, type
>> conversion or memory tiling and swizzling.  They are still useful
>> because typed surface reads don't support any 64 or 128-bit formats on
>> IVB, and they don't support any 128-bit formats on HSW and BDW.
>>
>> The tiling algorithm is implemented based on a number of parameters
>> which are passed in as uniforms and determine whether the surface
>> layout is X-tiled, Y-tiled or untiled.  This allows binding surfaces
>> of different tiling layouts to the pipeline without recompiling the
>> program.
>>
>> v2: Drop VEC4 suport.
>> v3: Rebase.
>> ---
>>  .../drivers/dri/i965/brw_fs_surface_builder.cpp    | 108 +++++++++++++++++++++
>>  1 file changed, 108 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> index 5ee04de..0c879db 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> @@ -215,4 +215,112 @@ namespace {
>>           return BRW_PREDICATE_NORMAL;
>>        }
>>     }
>> +
>> +   namespace image_coordinates {
>> +      /**
>> +       * Calculate the offset in memory of the texel given by \p coord.
>> +       *
>> +       * This is meant to be used with untyped surface messages to access a
>> +       * tiled surface, what involves taking into account the tiling and
>> +       * swizzling modes of the surface manually so it will hopefully not
>> +       * happen very often.
>> +       */
>> +      fs_reg
>> +      emit_address_calculation(const fs_builder &bld, const fs_reg &image,
>> +                               const fs_reg &coord, unsigned dims)
>> +      {
>> +         const brw_device_info *devinfo = bld.shader->devinfo;
>> +         const fs_reg off = offset(image, bld, BRW_IMAGE_PARAM_OFFSET_OFFSET);
>> +         const fs_reg stride = offset(image, bld, BRW_IMAGE_PARAM_STRIDE_OFFSET);
>> +         const fs_reg tile = offset(image, bld, BRW_IMAGE_PARAM_TILING_OFFSET);
>> +         const fs_reg swz = offset(image, bld, BRW_IMAGE_PARAM_SWIZZLING_OFFSET);
>> +         const fs_reg addr = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
>> +         const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
>> +         const fs_reg minor = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
>> +         const fs_reg major = bld.vgrf(BRW_REGISTER_TYPE_UD, 2);
>> +         const fs_reg dst = bld.vgrf(BRW_REGISTER_TYPE_UD);
>> +
>> +         /* Shift the coordinates by the fixed surface offset. */
>> +         for (unsigned c = 0; c < 2; ++c)
>> +            bld.ADD(offset(addr, bld, c), offset(off, bld, c),
>> +                    (c < dims ?
>> +                     offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, c) :
>> +                     fs_reg(0)));
>> +
>> +         if (dims > 2) {
>> +            /* Decompose z into a major (tmp.y) and a minor (tmp.x)
>> +             * index.
>> +             */
>> +            bld.BFE(offset(tmp, bld, 0), offset(tile, bld, 2), fs_reg(0),
>> +                    offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, 2));
>> +            bld.SHR(offset(tmp, bld, 1),
>> +                    offset(retype(coord, BRW_REGISTER_TYPE_UD), bld, 2),
>> +                    offset(tile, bld, 2));
>> +
>> +            /* Take into account the horizontal (tmp.x) and vertical (tmp.y)
>> +             * slice offset.
>> +             */
>> +            for (unsigned c = 0; c < 2; ++c) {
>> +               bld.MUL(offset(tmp, bld, c),
>> +                       offset(stride, bld, 2 + c), offset(tmp, bld, c));
>> +               bld.ADD(offset(addr, bld, c),
>> +                       offset(addr, bld, c), offset(tmp, bld, c));
>> +            }
>
> Tiled surfaces are inherently 2-D, but it seems like you're treating
> it as if it has 3-D tiles.  What's going on here?

The underlying tiled memory is 2D, but the image coordinates we get as
input may be 3D -- This code is lowering 3D coordinates into the
matching 2D coordinates of the underlying tiled surface.  The layout of
3D textures in memory (which indeed looks a lot like a sort of 3D-tiling
with miptree level-dependent modulus) is explained in detail in "BSpec »
Memory Views » Common Surface Formats » Surface Layout [Pre-SKL] » 3D
Surfaces [Pre-SKL]" (note that this algorithm is only used on pre-SKL
hardware).  I'll add a reference here to that section of the BSpec -- Or
do you want me to add an excerpt of the relevant text from the BSpec?
I'll see if I can find a similar explanation on the public PRMs to avoid
IP issues.

>
>> +         }
>> +
>> +         if (dims > 1) {
>> +            for (unsigned c = 0; c < 2; ++c) {
>> +               /* Calculate the minor x and y indices. */
>> +               bld.BFE(offset(minor, bld, c), offset(tile, bld, c),
>> +                       fs_reg(0), offset(addr, bld, c));
>> +
>> +               /* Calculate the major x and y indices. */
>> +               bld.SHR(offset(major, bld, c),
>> +                       offset(addr, bld, c), offset(tile, bld, c));
>> +            }
>> +
>> +            /* Calculate the texel index from the start of the tile row and
>> +             * the vertical coordinate of the row.
>> +             * Equivalent to:
>> +             *   tmp.x = (major.x << tile.y << tile.x) +
>> +             *           (minor.y << tile.x) + minor.x
>> +             *   tmp.y = major.y << tile.y
>> +             */
>> +            bld.SHL(tmp, major, offset(tile, bld, 1));
>> +            bld.ADD(tmp, tmp, offset(minor, bld, 1));
>> +            bld.SHL(tmp, tmp, offset(tile, bld, 0));
>> +            bld.ADD(tmp, tmp, minor);
>> +            bld.SHL(offset(tmp, bld, 1),
>> +                    offset(major, bld, 1), offset(tile, bld, 1));
>> +
>> +            /* Add it to the start of the tile row. */
>> +            bld.MUL(offset(tmp, bld, 1),
>> +                    offset(tmp, bld, 1), offset(stride, bld, 1));
>> +            bld.ADD(tmp, tmp, offset(tmp, bld, 1));
>> +
>> +            /* Multiply by the Bpp value. */
>> +            bld.MUL(dst, tmp, stride);
>
> Where in these calculations do you take into account the different
> ordering for X-tiled vs. Y-tiled (x-major vs. y-major)?
>
I don't, because (at least when it comes to their layout in memory, the
hardware logic may of course be implemented differently) they're
basically the same thing with different coefficients.  For the
calculation of tiling moduli based on the texture being X or Y tiled see
update_texture_image_param() in "i965: Define and initialize image
parameter structure.".

>> +
>> +            if (devinfo->gen < 8 && !devinfo->is_baytrail) {
>
> Should this be has_swizzling?  Or is that a composite thing in
> brw_context that varies based on runtime checks?
>
Yes, this could be using has_swizzling to avoid emitting dead code on
gen7 hardware without swizzling enabled -- in fact it was, IIRC until my
v3 rebase.  I dropped the check because the has_swizzling flag is part
of the context instead of being in brw_device_info (we should probably
fix that sometime), and I noticed you had been cleaning up other uses of
brw_context from the compiler, so I didn't want to make the matter worse
(right now with no access to brw_context at all it would be impossible
to access the flag here AFAICT).

Note that this code will still work in the no-swizzling case because the
bit shifts selected by state setup and passed in as uniforms will cause
it to be the identity -- but yeah the generated assembly will definitely
be more instructions than it needs to.  I'll add a comment here to
clarify how the no-swizzling case is handled.

>> +               /* Take into account the two dynamically specified shifts. */
>> +               for (unsigned c = 0; c < 2; ++c)
>> +                  bld.SHR(offset(tmp, bld, c), dst, offset(swz, bld, c));
>> +
>> +               /* XOR tmp.x and tmp.y with bit 6 of the memory address. */
>> +               bld.XOR(tmp, tmp, offset(tmp, bld, 1));
>> +               bld.AND(tmp, tmp, fs_reg(1 << 6));
>> +               bld.XOR(dst, dst, tmp);
>> +            }
>> +
>> +         } else {
>
> Why do we need to do anything for 1-D surfaces?
>

Of course the Bpp needs to be taken into account, and addr.y may still
be non-zero if the 1D surface was taken from a slice of a
higher-dimensional surface.  I'll clarify that in a comment for future
reference.

>> +            /* Multiply by the Bpp/stride value. */
>> +            bld.MUL(offset(addr, bld, 1),
>> +                    offset(addr, bld, 1), offset(stride, bld, 1));
>> +            bld.ADD(addr, addr, offset(addr, bld, 1));
>> +            bld.MUL(dst, addr, stride);
>> +         }
>> +
>> +         return dst;
>> +      }
>> +   }
>>  }
>> --
>> 2.4.3
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150723/ec7aee44/attachment-0001.sig>


More information about the mesa-dev mailing list