[Mesa-dev] [PATCH 01/14] intel/compiler: general 8/16/32/64-bit shuffle_src_to_dst function
Jason Ekstrand
jason at jlekstrand.net
Thu Jun 14 00:46:36 UTC 2018
On Wed, Jun 13, 2018 at 5:07 PM, Chema Casanova <jmcasanova at igalia.com>
wrote:
> On 13/06/18 22:46, Jason Ekstrand wrote:
> > On Sat, Jun 9, 2018 at 4:13 AM, Jose Maria Casanova Crespo
> > <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>> wrote:
> >
> > This new function takes care of shuffle/unshuffle components of a
> > particular bit-size in components with a different bit-size.
> >
> > If source type size is smaller than destination type size the
> operation
> > needed is a component shuffle. The opposite case would be an
> unshuffle.
> >
> > The operation allows to skip first_component number of components
> from
> > the source.
> >
> > Shuffle MOVs are retyped using integer types avoiding problems with
> > denorms
> > and float types. This allows to simplify uses of shuffle functions
> > that are
> > dealing with these retypes individually.
> >
> > Now there is a new restriction so source and destination can not
> overlap
> > anymore when calling this suffle function. Following patches that
> > migrate
> > to use this new function will take care individually of avoiding
> source
> > and destination overlaps.
> > ---
> > src/intel/compiler/brw_fs_nir.cpp | 92
> +++++++++++++++++++++++++++++++
> > 1 file changed, 92 insertions(+)
> >
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > index 166da0aa6d7..1a9d3c41d1d 100644
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > @@ -5362,6 +5362,98 @@ shuffle_16bit_data_for_32bit_write(const
> > fs_builder &bld,
> > }
> > }
> >
> > +/*
> > + * This helper takes a source register and un/shuffles it into the
> > destination
> > + * register.
> > + *
> > + * If source type size is smaller than destination type size the
> > operation
> > + * needed is a component shuffle. The opposite case would be an
> > unshuffle. If
> > + * source/destination type size is equal a shuffle is done that
> > would be
> > + * equivalent to a simple MOV.
> >
> >
> > There's a sticky bit here if we want this to work with 64-bit types on
> > gen7 and earlier because we only have DF there and not Q so the
> > brw_reg_type_from_bit_size below doesn't work. If we care about that
> > case (and I'm not convinced we do), it should be easy enough to add a
> > type_sz(src.type) == type_sz(dst.type) case which just does MOVs from
> > source to dest.
>
> At this moment, current uses of this function are to read from 32-bits
> or to write to 32-bit. But I think that for completeness if would be
> nice to have all cases covered. The option of doing the MOVs in the case
> of equality (that would be quite normal) saves us to do the shuffle
> calculus for the simple case. So I'm going for it.
>
> > + *
> > + * For example, if source is a 16-bit type and destination is
> > 32-bit. A 3
> > + * components .xyz 16-bit vector on SIMD8 would be.
> > + *
> > + * |x1|x2|x3|x4|x5|x6|x7|x8|y1|y2|y3|y4|y5|y6|y7|y8|
> > + * |z1|z2|z3|z4|z5|z6|z7|z8| | | | | | | | |
> > + *
> > + * This helper will return the following 2 32-bit components with
> > the 16-bit
> > + * values shuffled:
> > + *
> > + * |x1 y1|x2 y2|x3 y3|x4 y4|x5 y5|x6 y6|x7 y7|x8 y8|
> > + * |z1 |z2 |z3 |z4 |z5 |z6 |z7 |z8 |
> > + *
> > + * For unshuffle, the example would be the opposite, a 64-bit type
> > source
> > + * and a 32-bit destination. A 2 component .xy 64-bit vector on
> SIMD8
> > + * would be:
> > + *
> > + * | x1l x1h | x2l x2h | x3l x3h | x4l x4h |
> > + * | x5l x5h | x6l x6h | x7l x7h | x8l x8h |
> > + * | y1l y1h | y2l y2h | y3l y3h | y4l y4h |
> > + * | y5l y5h | y6l y6h | y7l y7h | y8l y8h |
> > + *
> > + * The returned result would be the following 4 32-bit components
> > unshuffled:
> > + *
> > + * | x1l | x2l | x3l | x4l | x5l | x6l | x7l | x8l |
> > + * | x1h | x2h | x3h | x4h | x5h | x6h | x7h | x8h |
> > + * | y1l | y2l | y3l | y4l | y5l | y6l | y7l | y8l |
> > + * | y1h | y2h | y3h | y4h | y5h | y6h | y7h | y8h |
> > + *
> > + * - Source and destination register must not be overlapped.
> > + * - first_component parameter allows skipping source components.
> > + */
> > +void
> > +shuffle_src_to_dst(const fs_builder &bld,
> > + const fs_reg &dst,
> > + const fs_reg &src,
> > + uint32_t first_component,
> > + uint32_t components)
> > +{
> > + if (type_sz(src.type) <= type_sz(dst.type)) {
> > + /* Source is shuffled into destination */
> > + unsigned size_ratio = type_sz(dst.type) / type_sz(src.type);
> > +#ifndef NDEBUG
> > + boolean src_dst_overlap = regions_overlap(dst,
> > + type_sz(dst.type) * bld.dispatch_width() * components,
> > + offset(src, bld, first_component * size_ratio),
> >
> >
> > Why do you need to multiply first_component by size_ratio? It's already
> > in units of source components.
>
> Yes, that's wrong. I forgot to update this assertion with my lasts
> refactorings of this series, and as nowadays there are no overlaps I
> didn't realize of the error. Both calls to regions_overlap have all
> parameters wrong.
>
> I should also include in the comment than the "component" parameter is
> measured in terms of the smaller type between source and destination. As
> we are un/shuffling the smaller components from/into a bigger one.
>
Yes, please. Is that really what we want though? Do we have cases where
we are going from a big type to a small type and want a small component
offset? I suppose I could see it happening. It's ok if it is what we
want; I just want to make sure. :-)
> > + type_sz(src.type) * bld.dispatch_width() * components *
> > size_ratio);
> > +#endif
> > + assert(!src_dst_overlap);
> >
> >
> > If the only thing you're doing with src_dst_overlap is to assert on it,
> > you may as well put the regions_overlap call inside the assert and drop
> > the #ifndef.
>
> Ok, I just thought it was too long function. I'll change it.
>
> > +
> > + brw_reg_type shuffle_type =
> > + brw_reg_type_from_bit_size(8 * type_sz(src.type),
> > + BRW_REGISTER_TYPE_D);
> > + for (unsigned i = 0; i < components; i++) {
> > + fs_reg shuffle_component_i =
> > + subscript(offset(dst, bld, i / size_ratio),
> > + shuffle_type, i % size_ratio);
> > + bld.MOV(shuffle_component_i,
> > + retype(offset(src, bld, i + first_component),
> > shuffle_type));
> > + }
> > + } else {
> > + /* Source is unshuffled into destination */
> > + unsigned size_ratio = type_sz(src.type) / type_sz(dst.type);
> > +#ifndef NDEBUG
> > + boolean src_dst_overlap = regions_overlap(dst,
> > + type_sz(dst.type) * bld.dispatch_width() * components,
> > + offset(src, bld, first_component / size_ratio),
> >
> >
> > Again, I don't think we want to divide here.
>
> Sure.
>
> >
> >
> > + type_sz(src.type) * bld.dispatch_width() *
> > + DIV_ROUND_UP(components + first_component % size_ratio,
> > size_ratio));
> > +#endif
> > + assert(!src_dst_overlap);
> >
> >
> > As I said above, everything can go in the assert.
>
> Ok.
>
> > Other than that, this looks great.
>
> Thanks for the review. I'm sending tomorrow a v2 with the fixes.
>
> > --Jason
> >
> >
> >
> > + brw_reg_type shuffle_type =
> > + brw_reg_type_from_bit_size(8 * type_sz(dst.type),
> > + BRW_REGISTER_TYPE_D);
> > + for (unsigned i = 0; i < components; i++) {
> > + fs_reg shuffle_component_i =
> > + subscript(offset(src, bld, (first_component + i) /
> > size_ratio),
> > + shuffle_type, (first_component + i) %
> > size_ratio);
> > + bld.MOV(retype(offset(dst, bld, i), shuffle_type),
> > + shuffle_component_i);
> > + }
> > + }
> > +}
> > +
> > fs_reg
> > setup_imm_df(const fs_builder &bld, double v)
> > {
> > --
> > 2.17.1
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.
> freedesktop.org>
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
> >
> >
> >
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180613/5bb894ed/attachment-0001.html>
More information about the mesa-dev
mailing list