[Mesa-dev] [PATCH 2/4] gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto

Wed Jan 4 16:17:30 UTC 2017

On 21/12/16 04:01, sroland at vmware.com wrote:
> From: Roland Scheidegger <sroland at vmware.com>
>
> If we only feed one source vector at a time, we cannot use pack intrinsics
> (as we only have a 64bit destination vector). lp_build_conv_auto is
> specifically designed to alter the length and number of destination vectors,
> so this works just fine (if we use single source vectors at a time, we
> immediately reassemble the vectors afterwards).
> For AVX, though, this isn't really possible, since we already expect 128bit
> output for a single 256bit input. (One day we should handle AVX2, which again
> would need multiple inputs; however, the output there is ordered differently
> and we don't want to reorder, so we would need to be able to tell build_conv
> to handle the upper and lower halves independently.)
> A similar strategy would probably work for 32->8bit too (if it doesn't hit
> the special case) but I'm going to try something different for that...
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_conv.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_conv.c b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
> index 69d24a5..c8f9c28 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_conv.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
> @@ -497,8 +497,25 @@ int lp_build_conv_auto(struct gallivm_state *gallivm,
>     if (src_type.width == dst_type->width) {
>        lp_build_conv(gallivm, src_type, *dst_type, src, num_srcs, dst, num_dsts);
>     } else {
> -      for (i = 0; i < num_srcs; ++i) {
> -         lp_build_conv(gallivm, src_type, *dst_type, &src[i], 1, &dst[i], 1);
> +      /*
> +       * If dst_width is 16 bits and src_width 32 and the dst vector size
> +       * 64bit, try feeding 2 vectors at once so pack intrinsics can be used.
> +       * (For AVX, this isn't needed, since we usually get 256bit src and
> +       * 128bit dst vectors which works ok. If we do AVX2 pack this should
> +       * be extended but need to be able to tell conversion code about pack
> +       * ordering first.)
> +       */
> +      unsigned ratio = 1;
> +      if (src_type.width == 2 * dst_type->width &&
> +          src_type.length == dst_type->length &&
> +          dst_type->floating == 0 && (num_srcs % 2 == 0) &&
> +          dst_type->width * dst_type->length == 64) {
> +         ratio = 2;
> +         num_dsts /= 2;
> +         dst_type->length *= 2;

Should this be inside lp_build_conv?

> +      }
> +      for (i = 0; i < num_dsts; i++) {
> +         lp_build_conv(gallivm, src_type, *dst_type, &src[i*ratio], ratio, &dst[i], 1);
>        }
>     }
>
>

Reviewed-by: Jose Fonseca <jfonseca at vmware.com>