<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jun 13, 2018 at 5:07 PM, Chema Casanova <span dir="ltr"><<a href="mailto:jmcasanova@igalia.com" target="_blank">jmcasanova@igalia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 13/06/18 22:46, Jason Ekstrand wrote:<br>
> On Sat, Jun 9, 2018 at 4:13 AM, Jose Maria Casanova Crespo<br>
</span><div><div class="h5">> <<a href="mailto:jmcasanova@igalia.com">jmcasanova@igalia.com</a> <mailto:<a href="mailto:jmcasanova@igalia.com">jmcasanova@igalia.com</a>><wbr>> wrote:<br>
> <br>
> This new function takes care of shuffle/unshuffle components of a<br>
> particular bit-size in components with a different bit-size.<br>
> <br>
> If source type size is smaller than destination type size the operation<br>
> needed is a component shuffle. The opposite case would be an unshuffle.<br>
> <br>
> The operation allows to skip first_component number of components from<br>
> the source.<br>
> <br>
> Shuffle MOVs are retyped using integer types avoiding problems with<br>
> denorms<br>
> and float types. This allows to simplify uses of shuffle functions<br>
> that are<br>
> dealing with these retypes individually.<br>
> <br>
> Now there is a new restriction so source and destination can not overlap<br>
> anymore when calling this suffle function. Following patches that<br>
> migrate<br>
> to use this new function will take care individually of avoiding source<br>
> and destination overlaps.<br>
> ---<br>
> src/intel/compiler/brw_fs_<wbr>nir.cpp | 92 ++++++++++++++++++++++++++++++<wbr>+<br>
> 1 file changed, 92 insertions(+)<br>
> <br>
> diff --git a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> index 166da0aa6d7..1a9d3c41d1d 100644<br>
> --- a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> +++ b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> @@ -5362,6 +5362,98 @@ shuffle_16bit_data_for_32bit_<wbr>write(const<br>
> fs_builder &bld,<br>
> }<br>
> }<br>
> <br>
> +/*<br>
> + * This helper takes a source register and un/shuffles it into the<br>
> destination<br>
> + * register.<br>
> + *<br>
> + * If source type size is smaller than destination type size the<br>
> operation<br>
> + * needed is a component shuffle. The opposite case would be an<br>
> unshuffle. If<br>
> + * source/destination type size is equal a shuffle is done that<br>
> would be<br>
> + * equivalent to a simple MOV.<br>
> <br>
> <br>
> There's a sticky bit here if we want this to work with 64-bit types on<br>
> gen7 and earlier because we only have DF there and not Q so the<br>
> brw_reg_type_from_bit_size below doesn't work. If we care about that<br>
> case (and I'm not convinced we do), it should be easy enough to add a<br>
> type_sz(src.type) == type_sz(dst.type) case which just does MOVs from<br>
> source to dest.<br>
<br>
</div></div>At this moment, current uses of this function are to read from 32-bits<br>
or to write to 32-bit. But I think that for completeness if would be<br>
nice to have all cases covered. The option of doing the MOVs in the case<br>
of equality (that would be quite normal) saves us to do the shuffle<br>
calculus for the simple case. So I'm going for it.<br>
<div><div class="h5"><br>
> + *<br>
> + * For example, if source is a 16-bit type and destination is<br>
> 32-bit. A 3<br>
> + * components .xyz 16-bit vector on SIMD8 would be.<br>
> + *<br>
> + * |x1|x2|x3|x4|x5|x6|x7|x8|y1|<wbr>y2|y3|y4|y5|y6|y7|y8|<br>
> + * |z1|z2|z3|z4|z5|z6|z7|z8| | | | | | | | |<br>
> + *<br>
> + * This helper will return the following 2 32-bit components with<br>
> the 16-bit<br>
> + * values shuffled:<br>
> + *<br>
> + * |x1 y1|x2 y2|x3 y3|x4 y4|x5 y5|x6 y6|x7 y7|x8 y8|<br>
> + * |z1 |z2 |z3 |z4 |z5 |z6 |z7 |z8 |<br>
> + *<br>
> + * For unshuffle, the example would be the opposite, a 64-bit type<br>
> source<br>
> + * and a 32-bit destination. A 2 component .xy 64-bit vector on SIMD8<br>
> + * would be:<br>
> + *<br>
> + * | x1l x1h | x2l x2h | x3l x3h | x4l x4h |<br>
> + * | x5l x5h | x6l x6h | x7l x7h | x8l x8h |<br>
> + * | y1l y1h | y2l y2h | y3l y3h | y4l y4h |<br>
> + * | y5l y5h | y6l y6h | y7l y7h | y8l y8h |<br>
> + *<br>
> + * The returned result would be the following 4 32-bit components<br>
> unshuffled:<br>
> + *<br>
> + * | x1l | x2l | x3l | x4l | x5l | x6l | x7l | x8l |<br>
> + * | x1h | x2h | x3h | x4h | x5h | x6h | x7h | x8h |<br>
> + * | y1l | y2l | y3l | y4l | y5l | y6l | y7l | y8l |<br>
> + * | y1h | y2h | y3h | y4h | y5h | y6h | y7h | y8h |<br>
> + *<br>
> + * - Source and destination register must not be overlapped.<br>
> + * - first_component parameter allows skipping source components.<br>
> + */<br>
> +void<br>
> +shuffle_src_to_dst(const fs_builder &bld,<br>
> + const fs_reg &dst,<br>
> + const fs_reg &src,<br>
> + uint32_t first_component,<br>
> + uint32_t components)<br>
> +{<br>
> + if (type_sz(src.type) <= type_sz(dst.type)) {<br>
> + /* Source is shuffled into destination */<br>
> + unsigned size_ratio = type_sz(dst.type) / type_sz(src.type);<br>
> +#ifndef NDEBUG<br>
> + boolean src_dst_overlap = regions_overlap(dst,<br>
> + type_sz(dst.type) * bld.dispatch_width() * components,<br>
> + offset(src, bld, first_component * size_ratio),<br>
> <br>
> <br>
> Why do you need to multiply first_component by size_ratio? It's already<br>
> in units of source components.<br>
<br>
</div></div>Yes, that's wrong. I forgot to update this assertion with my lasts<br>
refactorings of this series, and as nowadays there are no overlaps I<br>
didn't realize of the error. Both calls to regions_overlap have all<br>
parameters wrong.<br>
<br>
I should also include in the comment than the "component" parameter is<br>
measured in terms of the smaller type between source and destination. As<br>
we are un/shuffling the smaller components from/into a bigger one.<span class=""><br></span></blockquote><div><br></div><div>Yes, please. Is that really what we want though? Do we have cases where we are going from a big type to a small type and want a small component offset? I suppose I could see it happening. It's ok if it is what we want; I just want to make sure. :-)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
> + type_sz(src.type) * bld.dispatch_width() * components *<br>
> size_ratio);<br>
> +#endif<br>
> + assert(!src_dst_overlap);<br>
> <br>
> <br>
> If the only thing you're doing with src_dst_overlap is to assert on it,<br>
> you may as well put the regions_overlap call inside the assert and drop<br>
> the #ifndef.<br>
<br>
</span>Ok, I just thought it was too long function. I'll change it.<br>
<span class=""><br>
> +<br>
> + brw_reg_type shuffle_type =<br>
> + brw_reg_type_from_bit_size(8 * type_sz(src.type),<br>
> + BRW_REGISTER_TYPE_D);<br>
> + for (unsigned i = 0; i < components; i++) {<br>
> + fs_reg shuffle_component_i =<br>
> + subscript(offset(dst, bld, i / size_ratio),<br>
> + shuffle_type, i % size_ratio);<br>
> + bld.MOV(shuffle_component_i,<br>
> + retype(offset(src, bld, i + first_component),<br>
> shuffle_type));<br>
> + }<br>
> + } else {<br>
> + /* Source is unshuffled into destination */<br>
> + unsigned size_ratio = type_sz(src.type) / type_sz(dst.type);<br>
> +#ifndef NDEBUG<br>
> + boolean src_dst_overlap = regions_overlap(dst,<br>
> + type_sz(dst.type) * bld.dispatch_width() * components,<br>
> + offset(src, bld, first_component / size_ratio),<br>
> <br>
> <br>
> Again, I don't think we want to divide here.<br>
<br>
</span>Sure.<br>
<span class=""><br>
> <br>
> <br>
> + type_sz(src.type) * bld.dispatch_width() *<br>
> + DIV_ROUND_UP(components + first_component % size_ratio,<br>
> size_ratio));<br>
> +#endif<br>
> + assert(!src_dst_overlap);<br>
> <br>
> <br>
> As I said above, everything can go in the assert.<br>
<br>
</span>Ok.<br>
<span class=""><br>
> Other than that, this looks great.<br>
<br>
</span>Thanks for the review. I'm sending tomorrow a v2 with the fixes.<br>
<span class=""><br>
> --Jason<br>
> <br>
> <br>
> <br>
> + brw_reg_type shuffle_type =<br>
> + brw_reg_type_from_bit_size(8 * type_sz(dst.type),<br>
> + BRW_REGISTER_TYPE_D);<br>
> + for (unsigned i = 0; i < components; i++) {<br>
> + fs_reg shuffle_component_i =<br>
> + subscript(offset(src, bld, (first_component + i) /<br>
> size_ratio),<br>
> + shuffle_type, (first_component + i) %<br>
> size_ratio);<br>
> + bld.MOV(retype(offset(dst, bld, i), shuffle_type),<br>
> + shuffle_component_i);<br>
> + }<br>
> + }<br>
> +}<br>
> +<br>
> fs_reg<br>
> setup_imm_df(const fs_builder &bld, double v)<br>
> {<br>
> -- <br>
> 2.17.1<br>
> <br>
> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
</span>> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a> <mailto:<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.<wbr>freedesktop.org</a>><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
> <<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.<wbr>org/mailman/listinfo/mesa-dev</a>><br>
<div class="HOEnZb"><div class="h5">> <br>
> <br>
> <br>
> <br>
> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
> <br>
</div></div></blockquote></div><br></div></div>