<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Nov 30, 2017 at 3:41 PM, Chema Casanova <span dir="ltr"><<a href="mailto:jmcasanova@igalia.com" target="_blank">jmcasanova@igalia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
<br>
On 30/11/17 23:21, Jason Ekstrand wrote:<br>
> On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo<br>
</span><div><div class="h5">> <<a href="mailto:jmcasanova@igalia.com">jmcasanova@igalia.com</a> <mailto:<a href="mailto:jmcasanova@igalia.com">jmcasanova@igalia.com</a>><wbr>> wrote:<br>
><br>
>     This helpers are used to load/store 16-bit types from/to 32-bit<br>
>     components.<br>
><br>
>     The functions shuffle_32bit_load_result_to_<wbr>16bit_data and<br>
>     shuffle_16bit_data_for_32bit_<wbr>write are implemented in a similar<br>
>     way than the analogous functions for handling 64-bit types.<br>
>     ---<br>
>      src/intel/compiler/brw_fs.h       | 11 +++++++++<br>
>      src/intel/compiler/brw_fs_<wbr>nir.cpp | 51<br>
>     ++++++++++++++++++++++++++++++<wbr>+++++++++<br>
>      2 files changed, 62 insertions(+)<br>
><br>
>     diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h<br>
>     index 19b897e7a9..30557324d5 100644<br>
>     --- a/src/intel/compiler/brw_fs.h<br>
>     +++ b/src/intel/compiler/brw_fs.h<br>
>     @@ -497,6 +497,17 @@ void<br>
>     shuffle_32bit_load_result_to_<wbr>64bit_data(const brw::fs_builder &bld,<br>
>      fs_reg shuffle_64bit_data_for_32bit_<wbr>write(const brw::fs_builder &bld,<br>
>                                                const fs_reg &src,<br>
>                                                uint32_t components);<br>
>     +<br>
>     +void shuffle_32bit_load_result_to_<wbr>16bit_data(const brw::fs_builder<br>
>     &bld,<br>
>     +                                             const fs_reg &dst,<br>
>     +                                             const fs_reg &src,<br>
>     +                                             uint32_t components);<br>
>     +<br>
>     +void shuffle_16bit_data_for_32bit_<wbr>write(const brw::fs_builder &bld,<br>
>     +                                        const fs_reg &dst,<br>
>     +                                        const fs_reg &src,<br>
>     +                                        uint32_t components);<br>
>     +<br>
>      fs_reg setup_imm_df(const brw::fs_builder &bld,<br>
>                          double v);<br>
><br>
>     diff --git a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
>     b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
>     index 726b2fcee7..c091241132 100644<br>
>     --- a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
>     +++ b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
>     @@ -4828,6 +4828,33 @@ shuffle_32bit_load_result_to_<wbr>64bit_data(const<br>
>     fs_builder &bld,<br>
>         }<br>
>      }<br>
><br>
>     +void<br>
>     +shuffle_32bit_load_result_to_<wbr>16bit_data(const fs_builder &bld,<br>
>     +                                        const fs_reg &dst,<br>
>     +                                        const fs_reg &src,<br>
>     +                                        uint32_t components)<br>
>     +{<br>
>     +   assert(type_sz(src.type) == 4);<br>
>     +   assert(type_sz(dst.type) == 2);<br>
>     +<br>
>     +   fs_reg tmp = retype(bld.vgrf(src.type), dst.type);<br>
>     +<br>
>     +   for (unsigned i = 0; i < components; i++) {<br>
>     +      const fs_reg component_i = subscript(offset(src, bld, i / 2),<br>
>     dst.type, i % 2);<br>
>     +<br>
>     +      bld.MOV(offset(tmp, bld, i % 2), component_i);<br>
>     +<br>
>     +      if (i % 2) {<br>
>     +         bld.MOV(offset(dst, bld, i -1), offset(tmp, bld, 0));<br>
>     +         bld.MOV(offset(dst, bld, i), offset(tmp, bld, 1));<br>
>     +      }<br>
><br>
><br>
> I'm very confused by this extra moving.  Why can't we just do<br>
><br>
> bld.MOV(offset(dst, bld, i), component_i);<br>
><br>
> above?  Maybe I'm missing something but I don't see why the extra moves<br>
> are needed.<br>
<br>
<br>
</div></div>There is a comment in the previous function<br>
shuffle_32bit_load_result_to_<wbr>64bit_data that explains the similar<br>
situation that still applies in shuffle_32bit_load_result_to_<wbr>16bit_data,<br>
what about including the following comment.<br>
<br>
/* A temporary that we will use to shuffle the 16-bit data of each<br>
 * component in the vector into valid 32-bit data. We can't write<br>
 * directly to dst because dst can be the same as src and in that<br>
 * case the first MOV in the loop below would overwrite the data<br>
 * read in the second MOV.<br>
 */<br>
<br>
But in any case I've just checked, and at first sight at the 6 final<br>
uses of this function this situation never happens.<br>
<div><div class="h5"><br>
>  <br>
><br>
>     +   }<br>
>     +   if (components % 2) {<br>
>     +      bld.MOV(offset(dst, bld, components - 1), tmp);<br>
>     +   }<br>
>     +}<br>
>     +<br>
>     +<br>
>      /**<br>
>       * This helper does the inverse operation of<br>
>       * SHUFFLE_32BIT_LOAD_RESULT_TO_<wbr>64BIT_DATA.<br>
>     @@ -4860,6 +4887,30 @@ shuffle_64bit_data_for_32bit_<wbr>write(const<br>
>     fs_builder &bld,<br>
>         return dst;<br>
>      }<br>
><br>
>     +void<br>
>     +shuffle_16bit_data_for_32bit_<wbr>write(const fs_builder &bld,<br>
>     +                                   const fs_reg &dst,<br>
>     +                                   const fs_reg &src,<br>
>     +                                   uint32_t components)<br>
>     +{<br>
>     +   assert(type_sz(src.type) == 2);<br>
>     +   assert(type_sz(dst.type) == 4);<br>
>     +<br>
>     +   fs_reg tmp = bld.vgrf(dst.type);<br>
>     +<br>
>     +   for (unsigned i = 0; i < components; i++) {<br>
>     +      const fs_reg component_i = offset(src, bld, i);<br>
>     +      bld.MOV(subscript(tmp, src.type, i % 2), component_i);<br>
>     +      if (i % 2) {<br>
>     +         bld.MOV(offset(dst, bld, i / 2), tmp);<br>
>     +      }<br>
><br>
><br>
> Again, why the extra MOVs?  Why not<br>
><br>
> bld.MOV(subscript(offset(tmp, bld, i / 2), src.type, i % 2), component_i);<br>
><br>
> instead of the extra MOVs?<br>
<br>
</div></div>The same previous issue applies here, but I've seen that at commit<br>
030d2b5016360caf44ebfa3f6951a6<wbr>d676316a89<br>
i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_<wbr>write you<br>
changed the original behavior of this function.<br>
<br>
"All callers of this function allocate a fs_reg expressly to pass into<br>
it.  It's much easier if we just let the helper allocate the register.<br>
While we're here, we switch it to doing the MOVs with an integer type so<br>
that we don't accidentally canonicalize floats on half of a double."<br>
<br>
In this case we only have one call where src and dst are the same at<br>
nir_intrinsic_store_output at nir_emit_tcs_intrinsic. Maybe I can<br>
re-factor the code following your approach.<br></blockquote><div><br></div><div>I think it's fine to just let them handle in-place shuffles.  Please drop a comment to that effect on both of them so that it's more clear why the temporary is needed.  With that,</div><div><br></div><div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Chema<br>
<span class=""><br>
><br>
> --Jason<br>
>  <br>
><br>
>     +   }<br>
>     +   if (components % 2) {<br>
>     +      bld.MOV(offset(dst, bld, components / 2), tmp);<br>
>     +   }<br>
>     +}<br>
>     +<br>
>     +<br>
>      fs_reg<br>
>      setup_imm_df(const fs_builder &bld, double v)<br>
>      {<br>
>     --<br>
>     2.14.3<br>
><br>
>     ______________________________<wbr>_________________<br>
>     mesa-dev mailing list<br>
</span>>     <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a> <mailto:<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.<wbr>freedesktop.org</a>><br>
>     <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
>     <<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.<wbr>org/mailman/listinfo/mesa-dev</a>><br>
><br>
><br>
</blockquote></div><br></div></div>