<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Nov 30, 2017 at 3:41 PM, Chema Casanova <span dir="ltr"><<a href="mailto:jmcasanova@igalia.com" target="_blank">jmcasanova@igalia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
<br>
On 30/11/17 23:21, Jason Ekstrand wrote:<br>
> On Wed, Nov 29, 2017 at 6:50 PM, Jose Maria Casanova Crespo<br>
</span><div><div class="h5">> <<a href="mailto:jmcasanova@igalia.com">jmcasanova@igalia.com</a> <mailto:<a href="mailto:jmcasanova@igalia.com">jmcasanova@igalia.com</a>><wbr>> wrote:<br>
><br>
> This helpers are used to load/store 16-bit types from/to 32-bit<br>
> components.<br>
><br>
> The functions shuffle_32bit_load_result_to_<wbr>16bit_data and<br>
> shuffle_16bit_data_for_32bit_<wbr>write are implemented in a similar<br>
> way than the analogous functions for handling 64-bit types.<br>
> ---<br>
> src/intel/compiler/brw_fs.h | 11 +++++++++<br>
> src/intel/compiler/brw_fs_<wbr>nir.cpp | 51<br>
> ++++++++++++++++++++++++++++++<wbr>+++++++++<br>
> 2 files changed, 62 insertions(+)<br>
><br>
> diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h<br>
> index 19b897e7a9..30557324d5 100644<br>
> --- a/src/intel/compiler/brw_fs.h<br>
> +++ b/src/intel/compiler/brw_fs.h<br>
> @@ -497,6 +497,17 @@ void<br>
> shuffle_32bit_load_result_to_<wbr>64bit_data(const brw::fs_builder &bld,<br>
> fs_reg shuffle_64bit_data_for_32bit_<wbr>write(const brw::fs_builder &bld,<br>
> const fs_reg &src,<br>
> uint32_t components);<br>
> +<br>
> +void shuffle_32bit_load_result_to_<wbr>16bit_data(const brw::fs_builder<br>
> &bld,<br>
> + const fs_reg &dst,<br>
> + const fs_reg &src,<br>
> + uint32_t components);<br>
> +<br>
> +void shuffle_16bit_data_for_32bit_<wbr>write(const brw::fs_builder &bld,<br>
> + const fs_reg &dst,<br>
> + const fs_reg &src,<br>
> + uint32_t components);<br>
> +<br>
> fs_reg setup_imm_df(const brw::fs_builder &bld,<br>
> double v);<br>
><br>
> diff --git a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> index 726b2fcee7..c091241132 100644<br>
> --- a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> +++ b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> @@ -4828,6 +4828,33 @@ shuffle_32bit_load_result_to_<wbr>64bit_data(const<br>
> fs_builder &bld,<br>
> }<br>
> }<br>
><br>
> +void<br>
> +shuffle_32bit_load_result_to_<wbr>16bit_data(const fs_builder &bld,<br>
> + const fs_reg &dst,<br>
> + const fs_reg &src,<br>
> + uint32_t components)<br>
> +{<br>
> + assert(type_sz(src.type) == 4);<br>
> + assert(type_sz(dst.type) == 2);<br>
> +<br>
> + fs_reg tmp = retype(bld.vgrf(src.type), dst.type);<br>
> +<br>
> + for (unsigned i = 0; i < components; i++) {<br>
> + const fs_reg component_i = subscript(offset(src, bld, i / 2),<br>
> dst.type, i % 2);<br>
> +<br>
> + bld.MOV(offset(tmp, bld, i % 2), component_i);<br>
> +<br>
> + if (i % 2) {<br>
> + bld.MOV(offset(dst, bld, i -1), offset(tmp, bld, 0));<br>
> + bld.MOV(offset(dst, bld, i), offset(tmp, bld, 1));<br>
> + }<br>
><br>
><br>
> I'm very confused by this extra moving. Why can't we just do<br>
><br>
> bld.MOV(offset(dst, bld, i), component_i);<br>
><br>
> above? Maybe I'm missing something but I don't see why the extra moves<br>
> are needed.<br>
<br>
<br>
</div></div>There is a comment in the previous function<br>
shuffle_32bit_load_result_to_<wbr>64bit_data that explains the similar<br>
situation that still applies in shuffle_32bit_load_result_to_<wbr>16bit_data,<br>
what about including the following comment.<br>
<br>
/* A temporary that we will use to shuffle the 16-bit data of each<br>
* component in the vector into valid 32-bit data. We can't write<br>
* directly to dst because dst can be the same as src and in that<br>
* case the first MOV in the loop below would overwrite the data<br>
* read in the second MOV.<br>
*/<br>
<br>
But in any case I've just checked, and at first sight at the 6 final<br>
uses of this function this situation never happens.<br>
<div><div class="h5"><br>
> <br>
><br>
> + }<br>
> + if (components % 2) {<br>
> + bld.MOV(offset(dst, bld, components - 1), tmp);<br>
> + }<br>
> +}<br>
> +<br>
> +<br>
> /**<br>
> * This helper does the inverse operation of<br>
> * SHUFFLE_32BIT_LOAD_RESULT_TO_<wbr>64BIT_DATA.<br>
> @@ -4860,6 +4887,30 @@ shuffle_64bit_data_for_32bit_<wbr>write(const<br>
> fs_builder &bld,<br>
> return dst;<br>
> }<br>
><br>
> +void<br>
> +shuffle_16bit_data_for_32bit_<wbr>write(const fs_builder &bld,<br>
> + const fs_reg &dst,<br>
> + const fs_reg &src,<br>
> + uint32_t components)<br>
> +{<br>
> + assert(type_sz(src.type) == 2);<br>
> + assert(type_sz(dst.type) == 4);<br>
> +<br>
> + fs_reg tmp = bld.vgrf(dst.type);<br>
> +<br>
> + for (unsigned i = 0; i < components; i++) {<br>
> + const fs_reg component_i = offset(src, bld, i);<br>
> + bld.MOV(subscript(tmp, src.type, i % 2), component_i);<br>
> + if (i % 2) {<br>
> + bld.MOV(offset(dst, bld, i / 2), tmp);<br>
> + }<br>
><br>
><br>
> Again, why the extra MOVs? Why not<br>
><br>
> bld.MOV(subscript(offset(tmp, bld, i / 2), src.type, i % 2), component_i);<br>
><br>
> instead of the extra MOVs?<br>
<br>
</div></div>The same previous issue applies here, but I've seen that at commit<br>
030d2b5016360caf44ebfa3f6951a6<wbr>d676316a89<br>
i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_<wbr>write you<br>
changed the original behavior of this function.<br>
<br>
"All callers of this function allocate a fs_reg expressly to pass into<br>
it. It's much easier if we just let the helper allocate the register.<br>
While we're here, we switch it to doing the MOVs with an integer type so<br>
that we don't accidentally canonicalize floats on half of a double."<br>
<br>
In this case we only have one call where src and dst are the same at<br>
nir_intrinsic_store_output at nir_emit_tcs_intrinsic. Maybe I can<br>
re-factor the code following your approach.<br></blockquote><div><br></div><div>I think it's fine to just let them handle in-place shuffles. Please drop a comment to that effect on both of them so that it's more clear why the temporary is needed. With that,</div><div><br></div><div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Chema<br>
<span class=""><br>
><br>
> --Jason<br>
> <br>
><br>
> + }<br>
> + if (components % 2) {<br>
> + bld.MOV(offset(dst, bld, components / 2), tmp);<br>
> + }<br>
> +}<br>
> +<br>
> +<br>
> fs_reg<br>
> setup_imm_df(const fs_builder &bld, double v)<br>
> {<br>
> --<br>
> 2.14.3<br>
><br>
> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
</span>> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a> <mailto:<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.<wbr>freedesktop.org</a>><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
> <<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.<wbr>org/mailman/listinfo/mesa-dev</a>><br>
><br>
><br>
</blockquote></div><br></div></div>