[Mesa-dev] [PATCH v4 22/44] i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components
Jason Ekstrand
jason at jlekstrand.net
Fri Dec 1 10:27:00 UTC 2017
On Thu, Nov 30, 2017 at 3:41 PM, Chema Casanova <jmcasanova at igalia.com>
wrote:
>
>
> On 30/11/17 23:21, Jason Ekstrand wrote:
> > On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo
> > <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>> wrote:
> >
> > This helpers are used to load/store 16-bit types from/to 32-bit
> > components.
> >
> > The functions shuffle_32bit_load_result_to_16bit_data and
> > shuffle_16bit_data_for_32bit_write are implemented in a similar
> > way than the analogous functions for handling 64-bit types.
> > ---
> > src/intel/compiler/brw_fs.h | 11 +++++++++
> > src/intel/compiler/brw_fs_nir.cpp | 51
> > +++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 62 insertions(+)
> >
> > diff --git a/src/intel/compiler/brw_fs.h
> b/src/intel/compiler/brw_fs.h
> > index 19b897e7a9..30557324d5 100644
> > --- a/src/intel/compiler/brw_fs.h
> > +++ b/src/intel/compiler/brw_fs.h
> > @@ -497,6 +497,17 @@ void
> > shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder &bld,
> > fs_reg shuffle_64bit_data_for_32bit_write(const brw::fs_builder
> &bld,
> > const fs_reg &src,
> > uint32_t components);
> > +
> > +void shuffle_32bit_load_result_to_16bit_data(const brw::fs_builder
> > &bld,
> > + const fs_reg &dst,
> > + const fs_reg &src,
> > + uint32_t components);
> > +
> > +void shuffle_16bit_data_for_32bit_write(const brw::fs_builder &bld,
> > + const fs_reg &dst,
> > + const fs_reg &src,
> > + uint32_t components);
> > +
> > fs_reg setup_imm_df(const brw::fs_builder &bld,
> > double v);
> >
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > index 726b2fcee7..c091241132 100644
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > @@ -4828,6 +4828,33 @@ shuffle_32bit_load_result_to_64bit_data(const
> > fs_builder &bld,
> > }
> > }
> >
> > +void
> > +shuffle_32bit_load_result_to_16bit_data(const fs_builder &bld,
> > + const fs_reg &dst,
> > + const fs_reg &src,
> > + uint32_t components)
> > +{
> > + assert(type_sz(src.type) == 4);
> > + assert(type_sz(dst.type) == 2);
> > +
> > + fs_reg tmp = retype(bld.vgrf(src.type), dst.type);
> > +
> > + for (unsigned i = 0; i < components; i++) {
> > + const fs_reg component_i = subscript(offset(src, bld, i / 2),
> > dst.type, i % 2);
> > +
> > + bld.MOV(offset(tmp, bld, i % 2), component_i);
> > +
> > + if (i % 2) {
> > + bld.MOV(offset(dst, bld, i -1), offset(tmp, bld, 0));
> > + bld.MOV(offset(dst, bld, i), offset(tmp, bld, 1));
> > + }
> >
> >
> > I'm very confused by this extra moving. Why can't we just do
> >
> > bld.MOV(offset(dst, bld, i), component_i);
> >
> > above? Maybe I'm missing something but I don't see why the extra moves
> > are needed.
>
>
> There is a comment in the previous function
> shuffle_32bit_load_result_to_64bit_data that explains the similar
> situation that still applies in shuffle_32bit_load_result_to_16bit_data,
> what about including the following comment.
>
> /* A temporary that we will use to shuffle the 16-bit data of each
> * component in the vector into valid 32-bit data. We can't write
> * directly to dst because dst can be the same as src and in that
> * case the first MOV in the loop below would overwrite the data
> * read in the second MOV.
> */
>
> But in any case I've just checked, and at first sight at the 6 final
> uses of this function this situation never happens.
>
> >
> >
> > + }
> > + if (components % 2) {
> > + bld.MOV(offset(dst, bld, components - 1), tmp);
> > + }
> > +}
> > +
> > +
> > /**
> > * This helper does the inverse operation of
> > * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
> > @@ -4860,6 +4887,30 @@ shuffle_64bit_data_for_32bit_write(const
> > fs_builder &bld,
> > return dst;
> > }
> >
> > +void
> > +shuffle_16bit_data_for_32bit_write(const fs_builder &bld,
> > + const fs_reg &dst,
> > + const fs_reg &src,
> > + uint32_t components)
> > +{
> > + assert(type_sz(src.type) == 2);
> > + assert(type_sz(dst.type) == 4);
> > +
> > + fs_reg tmp = bld.vgrf(dst.type);
> > +
> > + for (unsigned i = 0; i < components; i++) {
> > + const fs_reg component_i = offset(src, bld, i);
> > + bld.MOV(subscript(tmp, src.type, i % 2), component_i);
> > + if (i % 2) {
> > + bld.MOV(offset(dst, bld, i / 2), tmp);
> > + }
> >
> >
> > Again, why the extra MOVs? Why not
> >
> > bld.MOV(subscript(offset(tmp, bld, i / 2), src.type, i % 2),
> component_i);
> >
> > instead of the extra MOVs?
>
> The same previous issue applies here, but I've seen that at commit
> 030d2b5016360caf44ebfa3f6951a6d676316a89
> i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write you
> changed the original behavior of this function.
>
> "All callers of this function allocate a fs_reg expressly to pass into
> it. It's much easier if we just let the helper allocate the register.
> While we're here, we switch it to doing the MOVs with an integer type so
> that we don't accidentally canonicalize floats on half of a double."
>
> In this case we only have one call where src and dst are the same at
> nir_intrinsic_store_output at nir_emit_tcs_intrinsic. Maybe I can
> re-factor the code following your approach.
>
I think it's fine to just let them handle in-place shuffles. Please drop a
comment to that effect on both of them so that it's more clear why the
temporary is needed. With that,
Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> Chema
>
> >
> > --Jason
> >
> >
> > + }
> > + if (components % 2) {
> > + bld.MOV(offset(dst, bld, components / 2), tmp);
> > + }
> > +}
> > +
> > +
> > fs_reg
> > setup_imm_df(const fs_builder &bld, double v)
> > {
> > --
> > 2.14.3
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.
> freedesktop.org>
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171201/ab4d3033/attachment.html>
More information about the mesa-dev
mailing list