[Mesa-dev] [PATCH 01/14] intel/compiler: general 8/16/32/64-bit shuffle_src_to_dst function

Thu Jun 14 00:07:55 UTC 2018

On 13/06/18 22:46, Jason Ekstrand wrote:
> On Sat, Jun 9, 2018 at 4:13 AM, Jose Maria Casanova Crespo
> <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>> wrote:
> 
>     This new function takes care of shuffle/unshuffle components of a
>     particular bit-size in components with a different bit-size.
> 
>     If source type size is smaller than destination type size the operation
>     needed is a component shuffle. The opposite case would be an unshuffle.
> 
>     The operation allows to skip first_component number of components from
>     the source.
> 
>     Shuffle MOVs are retyped using integer types avoiding problems with
>     denorms
>     and float types. This allows to simplify uses of shuffle functions
>     that are
>     dealing with these retypes individually.
> 
>     Now there is a new restriction so source and destination can not overlap
>     anymore when calling this suffle function. Following patches that
>     migrate
>     to use this new function will take care individually of avoiding source
>     and destination overlaps.
>     ---
>      src/intel/compiler/brw_fs_nir.cpp | 92 +++++++++++++++++++++++++++++++
>      1 file changed, 92 insertions(+)
> 
>     diff --git a/src/intel/compiler/brw_fs_nir.cpp
>     b/src/intel/compiler/brw_fs_nir.cpp
>     index 166da0aa6d7..1a9d3c41d1d 100644
>     --- a/src/intel/compiler/brw_fs_nir.cpp
>     +++ b/src/intel/compiler/brw_fs_nir.cpp
>     @@ -5362,6 +5362,98 @@ shuffle_16bit_data_for_32bit_write(const
>     fs_builder &bld,
>         }
>      }
> 
>     +/*
>     + * This helper takes a source register and un/shuffles it into the
>     destination
>     + * register.
>     + *
>     + * If source type size is smaller than destination type size the
>     operation
>     + * needed is a component shuffle. The opposite case would be an
>     unshuffle. If
>     + * source/destination type size is equal a shuffle is done that
>     would be
>     + * equivalent to a simple MOV.
> 
> 
> There's a sticky bit here if we want this to work with 64-bit types on
> gen7 and earlier because we only have DF there and not Q so the
> brw_reg_type_from_bit_size below doesn't work.  If we care about that
> case (and I'm not convinced we do), it should be easy enough to add a
> type_sz(src.type) == type_sz(dst.type) case which just does MOVs from
> source to dest.

At this moment, current uses of this function are to read from 32-bits
or to write to 32-bit. But I think that for completeness if would be
nice to have all cases covered. The option of doing the MOVs in the case
of equality (that would be quite normal) saves us to do the shuffle
calculus for the simple case. So I'm going for it.

>     + *
>     + * For example, if source is a 16-bit type and destination is
>     32-bit. A 3
>     + * components .xyz 16-bit vector on SIMD8 would be.
>     + *
>     + *    |x1|x2|x3|x4|x5|x6|x7|x8|y1|y2|y3|y4|y5|y6|y7|y8|
>     + *    |z1|z2|z3|z4|z5|z6|z7|z8|  |  |  |  |  |  |  |  |
>     + *
>     + * This helper will return the following 2 32-bit components with
>     the 16-bit
>     + * values shuffled:
>     + *
>     + *    |x1 y1|x2 y2|x3 y3|x4 y4|x5 y5|x6 y6|x7 y7|x8 y8|
>     + *    |z1   |z2   |z3   |z4   |z5   |z6   |z7   |z8   |
>     + *
>     + * For unshuffle, the example would be the opposite, a 64-bit type
>     source
>     + * and a 32-bit destination. A 2 component .xy 64-bit vector on SIMD8
>     + * would be:
>     + *
>     + *    | x1l   x1h | x2l   x2h | x3l   x3h | x4l   x4h |
>     + *    | x5l   x5h | x6l   x6h | x7l   x7h | x8l   x8h |
>     + *    | y1l   y1h | y2l   y2h | y3l   y3h | y4l   y4h |
>     + *    | y5l   y5h | y6l   y6h | y7l   y7h | y8l   y8h |
>     + *
>     + * The returned result would be the following 4 32-bit components
>     unshuffled:
>     + *
>     + *    | x1l | x2l | x3l | x4l | x5l | x6l | x7l | x8l |
>     + *    | x1h | x2h | x3h | x4h | x5h | x6h | x7h | x8h |
>     + *    | y1l | y2l | y3l | y4l | y5l | y6l | y7l | y8l |
>     + *    | y1h | y2h | y3h | y4h | y5h | y6h | y7h | y8h |
>     + *
>     + * - Source and destination register must not be overlapped.
>     + * - first_component parameter allows skipping source components.
>     + */
>     +void
>     +shuffle_src_to_dst(const fs_builder &bld,
>     +                   const fs_reg &dst,
>     +                   const fs_reg &src,
>     +                   uint32_t first_component,
>     +                   uint32_t components)
>     +{
>     +   if (type_sz(src.type) <= type_sz(dst.type)) {
>     +      /* Source is shuffled into destination */
>     +      unsigned size_ratio = type_sz(dst.type) / type_sz(src.type);
>     +#ifndef NDEBUG
>     +      boolean src_dst_overlap = regions_overlap(dst,
>     +         type_sz(dst.type) * bld.dispatch_width() * components,
>     +         offset(src, bld, first_component * size_ratio),
> 
> 
> Why do you need to multiply first_component by size_ratio?  It's already
> in units of source components.

Yes, that's wrong. I forgot to update this assertion with my lasts
refactorings of this series, and as nowadays there are no overlaps I
didn't realize of the error. Both calls to regions_overlap have all
parameters wrong.

I should also include in the comment than the "component" parameter is
measured in terms of the smaller type between source and destination. As
we are un/shuffling the smaller components from/into a bigger one.

>     +         type_sz(src.type) * bld.dispatch_width() * components *
>     size_ratio);
>     +#endif
>     +      assert(!src_dst_overlap);
> 
> 
> If the only thing you're doing with src_dst_overlap is to assert on it,
> you may as well put the regions_overlap call inside the assert and drop
> the #ifndef.

Ok, I just thought it was too long function. I'll change it.

>     +
>     +      brw_reg_type shuffle_type =
>     +         brw_reg_type_from_bit_size(8 * type_sz(src.type),
>     +                                    BRW_REGISTER_TYPE_D);
>     +      for (unsigned i = 0; i < components; i++) {
>     +         fs_reg shuffle_component_i =
>     +            subscript(offset(dst, bld, i / size_ratio),
>     +                      shuffle_type, i % size_ratio);
>     +         bld.MOV(shuffle_component_i,
>     +                 retype(offset(src, bld, i + first_component),
>     shuffle_type));
>     +      }
>     +   } else {
>     +      /* Source is unshuffled into destination */
>     +      unsigned size_ratio = type_sz(src.type) / type_sz(dst.type);
>     +#ifndef NDEBUG
>     +      boolean src_dst_overlap = regions_overlap(dst,
>     +         type_sz(dst.type) * bld.dispatch_width() * components,
>     +         offset(src, bld, first_component / size_ratio),
> 
> 
> Again, I don't think we want to divide here.

Sure.

>  
> 
>     +         type_sz(src.type) * bld.dispatch_width() *
>     +         DIV_ROUND_UP(components + first_component % size_ratio,
>     size_ratio));
>     +#endif
>     +      assert(!src_dst_overlap);
> 
> 
> As I said above, everything can go in the assert.

Ok.

> Other than that, this looks great.

Thanks for the review. I'm sending tomorrow a v2 with the fixes.

> --Jason
> 
>  
> 
>     +      brw_reg_type shuffle_type =
>     +         brw_reg_type_from_bit_size(8 * type_sz(dst.type),
>     +                                    BRW_REGISTER_TYPE_D);
>     +      for (unsigned i = 0; i < components; i++) {
>     +         fs_reg shuffle_component_i =
>     +            subscript(offset(src, bld, (first_component + i) /
>     size_ratio),
>     +                      shuffle_type, (first_component + i) %
>     size_ratio);
>     +         bld.MOV(retype(offset(dst, bld, i), shuffle_type),
>     +                 shuffle_component_i);
>     +      }
>     +   }
>     +}
>     +
>      fs_reg
>      setup_imm_df(const fs_builder &bld, double v)
>      {
>     -- 
>     2.17.1
> 
>     _______________________________________________
>     mesa-dev mailing list
>     mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
>     https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
> 
> 
> 
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>