[Mesa-dev] [PATCH 14/22] intel/compiler: Do image load/store lowering to NIR

Wed Aug 29 16:19:26 UTC 2018

On Wednesday, August 29, 2018 5:31:38 AM PDT Jason Ekstrand wrote:
> On Wed, Aug 29, 2018 at 12:49 AM Kenneth Graunke <kenneth at whitecape.org>
> wrote:
> 
> > On Friday, August 17, 2018 1:06:20 PM PDT Jason Ekstrand wrote:
> > [snip]
> > > +# Intel-specific query for loading from the brw_image_param struct
> > passed
> > > +# into the shader as a uniform.  The variable is a deref to the image
> > > +# variable. The const index specifies which of the six parameters to
> > load.
> >
> > This might be our first driver-specific intrinsics.  Some people make
> > big extensibility systems for that, where drivers can extend with their
> > own concepts.  But given that we all live in the same project, I think
> > this makes a lot of sense - just have everybody add their own here,
> > suffixed with a name they own (i.e. intel, amd, radv, ir3, whatever).
> >
> 
> We almost added an ir3 intrinsic some time ago.  I don't remember what it
> was or why it didn't happen.
> 
> 
> > It's certainly nice and simple.
> >
> > > +intrinsic("image_deref_load_param_intel", src_comp=[1], dest_comp=0,
> > > +          indices=[BASE], flags=[CAN_ELIMINATE, CAN_REORDER])
> > > +intrinsic("image_deref_load_raw_intel", src_comp=[1, 1], dest_comp=0,
> > > +          flags=[CAN_ELIMINATE, CAN_REORDER])
> > > +intrinsic("image_deref_store_raw_intel", src_comp=[1, 1, 0])
> >
> > I don't think you want CAN_REORDER for the new load intrinsic...at
> > least, image_deref_load only has CAN_ELIMINATE.  It probably makes
> > sense for the two to match...
> >
> 
> Right... Thanks!
> 
> 
> > [snip]
> > > +static nir_ssa_def *
> > > +image_address(nir_builder *b, const struct gen_device_info *devinfo,
> > > +              nir_deref_instr *deref, nir_ssa_def *coord)
> > > +{
> > > +   coord = sanitize_image_coord(b, deref, coord);
> > > +
> > > +   nir_ssa_def *offset = load_image_param(b, deref, OFFSET);
> > > +   nir_ssa_def *tiling = load_image_param(b, deref, TILING);
> > > +   nir_ssa_def *stride = load_image_param(b, deref, STRIDE);
> > > +
> > > +   /* Shift the coordinates by the fixed surface offset.  It may be
> > non-zero
> > > +    * if the image is a single slice of a higher-dimensional surface,
> > or if a
> > > +    * non-zero mipmap level of the surface is bound to the pipeline.
> > The
> > > +    * offset needs to be applied here rather than at surface state
> > set-up time
> > > +    * because the desired slice-level may start mid-tile, so simply
> > shifting
> > > +    * the surface base address wouldn't give a well-formed tiled
> > surface in
> > > +    * the general case.
> > > +    */
> > > +   nir_ssa_def *xypos = (coord->num_components == 1) ?
> > > +                        nir_vec2(b, coord, nir_imm_int(b, 0)) :
> > > +                        nir_channels(b, coord, 0x3);
> > > +   xypos = nir_iadd(b, xypos, offset);
> > > +
> > > +   /* The layout of 3-D textures in memory is sort-of like a tiling
> > > +    * format.  At each miplevel, the slices are arranged in rows of
> > > +    * 2^level slices per row.  The slice row is stored in tmp.y and
> > > +    * the slice within the row is stored in tmp.x.
> > > +    *
> > > +    * The layout of 2-D array textures and cubemaps is much simpler:
> > > +    * Depending on whether the ARYSPC_LOD0 layout is in use it will be
> > > +    * stored in memory as an array of slices, each one being a 2-D
> > > +    * arrangement of miplevels, or as a 2D arrangement of miplevels,
> > > +    * each one being an array of slices.  In either case the separation
> > > +    * between slices of the same LOD is equal to the qpitch value
> > > +    * provided as stride.w.
> > > +    *
> > > +    * This code can be made to handle either 2D arrays and 3D textures
> > > +    * by passing in the miplevel as tile.z for 3-D textures and 0 in
> > > +    * tile.z for 2-D array textures.
> > > +    *
> > > +    * See Volume 1 Part 1 of the Gen7 PRM, sections 6.18.4.7 "Surface
> > > +    * Arrays" and 6.18.6 "3D Surfaces" for a more extensive discussion
> > > +    * of the hardware 3D texture and 2D array layouts.
> > > +    */
> > > +   if (coord->num_components > 2) {
> > > +      /* Decompose z into a major (tmp.y) and a minor (tmp.x)
> > > +       * index.
> > > +       */
> > > +      nir_ssa_def *z = nir_channel(b, coord, 2);
> > > +      nir_ssa_def *z_x = nir_ubfe(b, z, nir_imm_int(b, 0),
> > > +                                  nir_channel(b, tiling, 2));
> > > +      nir_ssa_def *z_y = nir_ushr(b, z, nir_channel(b, tiling, 2));
> > > +
> > > +      /* Take into account the horizontal (tmp.x) and vertical (tmp.y)
> > > +       * slice offset.
> > > +       */
> > > +      xypos = nir_iadd(b, xypos, nir_imul(b, nir_vec2(b, z_x, z_y),
> > > +                                             nir_channels(b, stride,
> > 0xc)));
> > > +   }
> > > +
> > > +   nir_ssa_def *addr;
> > > +   if (coord->num_components > 1) {
> > > +      /* Calculate the major/minor x and y indices.  In order to
> > > +       * accommodate both X and Y tiling, the Y-major tiling format is
> > > +       * treated as being a bunch of narrow X-tiles placed next to each
> > > +       * other.  This means that the tile width for Y-tiling is actually
> > > +       * the width of one sub-column of the Y-major tile where each 4K
> > > +       * tile has 8 512B sub-columns.
> > > +       *
> > > +       * The major Y value is the row of tiles in which the pixel lives.
> > > +       * The major X value is the tile sub-column in which the pixel
> > > +       * lives; for X tiling, this is the same as the tile column, for Y
> > > +       * tiling, each tile has 8 sub-columns.  The minor X and Y indices
> > > +       * are the position within the sub-column.
> > > +       */
> > > +
> > > +      /* Calculate the minor x and y indices. */
> > > +      nir_ssa_def *minor = nir_ubfe(b, xypos, nir_imm_int(b, 0),
> > > +                                       nir_channels(b, tiling, 0x3));
> > > +      nir_ssa_def *major = nir_ushr(b, xypos, nir_channels(b, tiling,
> > 0x3));
> > > +
> > > +      /* Calculate the texel index from the start of the tile row and
> > the
> > > +       * vertical coordinate of the row.
> > > +       * Equivalent to:
> > > +       *   tmp.x = (major.x << tile.y << tile.x) +
> > > +       *           (minor.y << tile.x) + minor.x
> > > +       *   tmp.y = major.y << tile.y
> > > +       */
> > > +      nir_ssa_def *idx_x, *idx_y;
> > > +      idx_x = nir_ishl(b, nir_channel(b, major, 0), nir_channel(b,
> > tiling, 1));
> > > +      idx_x = nir_iadd(b, idx_x, nir_channel(b, minor, 1));
> > > +      idx_x = nir_ishl(b, idx_x, nir_channel(b, tiling, 0));
> > > +      idx_x = nir_iadd(b, idx_x, nir_channel(b, minor, 0));
> > > +      idx_y = nir_ishl(b, nir_channel(b, major, 1), nir_channel(b,
> > tiling, 1));
> > > +
> > > +      /* Add it to the start of the tile row. */
> > > +      nir_ssa_def *idx;
> > > +      idx = nir_imul(b, idx_y, nir_channel(b, stride, 1));
> > > +      idx = nir_iadd(b, idx, idx_x);
> > > +
> >
> > Maybe preserve the /* Multiply by Bpp value */ comment here?
> >
> 
> Done
> 
> 
> > > +      addr = nir_imul(b, idx, nir_channel(b, stride, 0));
> > > +
> > > +      if (devinfo->gen < 8 && !devinfo->is_baytrail) {
> > > +         /* Take into account the two dynamically specified shifts.
> > Both need
> > > +          * are used to implement swizzling of X-tiled surfaces.  For
> > Y-tiled
> >
> > May as well fix the "Both need are used" wording here -> "Both are used"
> >
> 
> Done
> 
> 
> > [snip]
> > > +static bool
> > > +lower_image_load_instr(nir_builder *b,
> > > +                       const struct gen_device_info *devinfo,
> > > +                       nir_intrinsic_instr *intrin)
> > > +{
> > > +   nir_deref_instr *deref = nir_src_as_deref(intrin->src[0]);
> > > +   nir_variable *var = nir_deref_instr_get_variable(deref);
> > > +   const enum isl_format image_fmt =
> > > +      isl_format_for_gl_format(var->data.image.format);
> > > +
> > > +   if (isl_has_matching_typed_storage_image_format(devinfo, image_fmt))
> > {
> > > +      const enum isl_format lower_fmt =
> > > +         isl_lower_storage_image_format(devinfo, image_fmt);
> > > +      const unsigned dest_components = intrin->num_components;
> > > +
> > > +      /* Use an undef to hold the uses of the load while we do the color
> > > +       * conversion.
> > > +       */
> > > +      nir_ssa_def *placeholder = nir_ssa_undef(b, 4, 32);
> > > +      nir_ssa_def_rewrite_uses(&intrin->dest.ssa,
> > nir_src_for_ssa(placeholder));
> > > +
> > > +      intrin->num_components = isl_format_get_num_channels(lower_fmt);
> > > +      intrin->dest.ssa.num_components = intrin->num_components;
> > > +
> > > +      nir_ssa_def *value;
> > > +      if (devinfo->gen == 7 && !devinfo->is_haswell) {
> > > +         /* Check the first component of the size field to find out if
> > the
> > > +          * image is bound.  Necessary on IVB because it don't seem to
> > respect
> > > +          * null surfaces and will hang when no image is bound.
> > > +          */
> > > +         b->cursor = nir_instr_remove(&intrin->instr);
> > > +         nir_ssa_def *size = load_image_param(b, deref, SIZE);
> > > +         nir_push_if(b, nir_ine(b, nir_channel(b, size, 0),
> > nir_imm_int(b, 0)));
> > > +         nir_builder_instr_insert(b, &intrin->instr);
> > > +         nir_push_else(b, NULL);
> > > +         nir_ssa_def *zero = nir_zero_vec(b, intrin->num_components);
> > > +         nir_pop_if(b, NULL);
> > > +         value = nir_if_phi(b, &intrin->dest.ssa, zero);
> > > +      } else {
> > > +         b->cursor = nir_after_instr(&intrin->instr);
> > > +         value = &intrin->dest.ssa;
> > > +      }
> >
> > I don't think this Gen7 code is necessary...at least, I don't see the
> > old code checking this for typed loads.  (Atomics, yes...loads, no...)
> >
> 
> You're right.  It was an extra check on untyped loads and stores which
> checked that bpp > 4.  Typed was left alone.  I've reworked things and have
> kicked it off to Jenkins.  I'll send a v2 when it comes back.
> 
> --Jason

Right...I don't see that untyped check happening in the new code,
either.  Are you adding it back in v2?

--Ken
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180829/9dfb24e3/attachment.sig>