[Mesa-dev] [PATCH v4 22/44] i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components
Chema Casanova
jmcasanova at igalia.com
Thu Nov 30 23:41:50 UTC 2017
On 30/11/17 23:21, Jason Ekstrand wrote:
> On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo
> <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>> wrote:
>
> This helpers are used to load/store 16-bit types from/to 32-bit
> components.
>
> The functions shuffle_32bit_load_result_to_16bit_data and
> shuffle_16bit_data_for_32bit_write are implemented in a similar
> way than the analogous functions for handling 64-bit types.
> ---
> src/intel/compiler/brw_fs.h | 11 +++++++++
> src/intel/compiler/brw_fs_nir.cpp | 51
> +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
> index 19b897e7a9..30557324d5 100644
> --- a/src/intel/compiler/brw_fs.h
> +++ b/src/intel/compiler/brw_fs.h
> @@ -497,6 +497,17 @@ void
> shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder &bld,
> fs_reg shuffle_64bit_data_for_32bit_write(const brw::fs_builder &bld,
> const fs_reg &src,
> uint32_t components);
> +
> +void shuffle_32bit_load_result_to_16bit_data(const brw::fs_builder
> &bld,
> + const fs_reg &dst,
> + const fs_reg &src,
> + uint32_t components);
> +
> +void shuffle_16bit_data_for_32bit_write(const brw::fs_builder &bld,
> + const fs_reg &dst,
> + const fs_reg &src,
> + uint32_t components);
> +
> fs_reg setup_imm_df(const brw::fs_builder &bld,
> double v);
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index 726b2fcee7..c091241132 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4828,6 +4828,33 @@ shuffle_32bit_load_result_to_64bit_data(const
> fs_builder &bld,
> }
> }
>
> +void
> +shuffle_32bit_load_result_to_16bit_data(const fs_builder &bld,
> + const fs_reg &dst,
> + const fs_reg &src,
> + uint32_t components)
> +{
> + assert(type_sz(src.type) == 4);
> + assert(type_sz(dst.type) == 2);
> +
> + fs_reg tmp = retype(bld.vgrf(src.type), dst.type);
> +
> + for (unsigned i = 0; i < components; i++) {
> + const fs_reg component_i = subscript(offset(src, bld, i / 2),
> dst.type, i % 2);
> +
> + bld.MOV(offset(tmp, bld, i % 2), component_i);
> +
> + if (i % 2) {
> + bld.MOV(offset(dst, bld, i -1), offset(tmp, bld, 0));
> + bld.MOV(offset(dst, bld, i), offset(tmp, bld, 1));
> + }
>
>
> I'm very confused by this extra moving. Why can't we just do
>
> bld.MOV(offset(dst, bld, i), component_i);
>
> above? Maybe I'm missing something but I don't see why the extra moves
> are needed.
There is a comment in the previous function
shuffle_32bit_load_result_to_64bit_data that explains the similar
situation that still applies in shuffle_32bit_load_result_to_16bit_data,
what about including the following comment.
/* A temporary that we will use to shuffle the 16-bit data of each
* component in the vector into valid 32-bit data. We can't write
* directly to dst because dst can be the same as src and in that
* case the first MOV in the loop below would overwrite the data
* read in the second MOV.
*/
But in any case I've just checked, and at first sight at the 6 final
uses of this function this situation never happens.
>
>
> + }
> + if (components % 2) {
> + bld.MOV(offset(dst, bld, components - 1), tmp);
> + }
> +}
> +
> +
> /**
> * This helper does the inverse operation of
> * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
> @@ -4860,6 +4887,30 @@ shuffle_64bit_data_for_32bit_write(const
> fs_builder &bld,
> return dst;
> }
>
> +void
> +shuffle_16bit_data_for_32bit_write(const fs_builder &bld,
> + const fs_reg &dst,
> + const fs_reg &src,
> + uint32_t components)
> +{
> + assert(type_sz(src.type) == 2);
> + assert(type_sz(dst.type) == 4);
> +
> + fs_reg tmp = bld.vgrf(dst.type);
> +
> + for (unsigned i = 0; i < components; i++) {
> + const fs_reg component_i = offset(src, bld, i);
> + bld.MOV(subscript(tmp, src.type, i % 2), component_i);
> + if (i % 2) {
> + bld.MOV(offset(dst, bld, i / 2), tmp);
> + }
>
>
> Again, why the extra MOVs? Why not
>
> bld.MOV(subscript(offset(tmp, bld, i / 2), src.type, i % 2), component_i);
>
> instead of the extra MOVs?
The same previous issue applies here, but I've seen that at commit
030d2b5016360caf44ebfa3f6951a6d676316a89
i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write you
changed the original behavior of this function.
"All callers of this function allocate a fs_reg expressly to pass into
it. It's much easier if we just let the helper allocate the register.
While we're here, we switch it to doing the MOVs with an integer type so
that we don't accidentally canonicalize floats on half of a double."
In this case we only have one call where src and dst are the same at
nir_intrinsic_store_output at nir_emit_tcs_intrinsic. Maybe I can
re-factor the code following your approach.
Chema
>
> --Jason
>
>
> + }
> + if (components % 2) {
> + bld.MOV(offset(dst, bld, components / 2), tmp);
> + }
> +}
> +
> +
> fs_reg
> setup_imm_df(const fs_builder &bld, double v)
> {
> --
> 2.14.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>
>
More information about the mesa-dev
mailing list