[Mesa-dev] [PATCH 2/4] gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto
Jose Fonseca
jfonseca at vmware.com
Wed Jan 4 16:17:30 UTC 2017
On 21/12/16 04:01, sroland at vmware.com wrote:
> From: Roland Scheidegger <sroland at vmware.com>
>
> If we only feed one source vector at a time, we cannot use pack intrinsics
> (as we only have a 64bit destination vector). lp_build_conv_auto is
> specifically designed to alter the length and number of destination vectors,
> so this works just fine (if we use single source vectors at a time, we
> immediately reassemble the vectors afterwards).
> For AVX though this isn't really possible, since we already expect 128bit
> output for a single 256bit input. (One day we should handle AVX2, which again
> would need multiple inputs; however, there's the problem that we get
> differently ordered output there and we don't want to reorder, so we would
> need to be able to tell build_conv to handle upper and lower halves
> independently.)
> A similar strategy would probably work for 32->8bit too (if it doesn't hit
> the special case) but I'm going to try something different for that...
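
For anyone following along, here is a rough scalar model of the two pack
shapes discussed above. It is only a sketch: the function names are made up
for illustration and are not gallivm or LLVM API. It assumes the usual signed
saturating pack semantics, and that a 256bit pack works on each 128bit lane
separately, which is where the reordering concern comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range, as a signed pack does. */
static int16_t saturate_s16(int32_t v)
{
   if (v > INT16_MAX) return INT16_MAX;
   if (v < INT16_MIN) return INT16_MIN;
   return (int16_t)v;
}

/*
 * Scalar model of a 128bit signed pack (hypothetical helper): two
 * 4 x i32 sources become one 8 x i16 result, with lo in elements 0-3
 * and hi in elements 4-7. A single 4 x i32 source cannot fill it.
 */
static void pack_2x4x32_to_8x16(const int32_t lo[4], const int32_t hi[4],
                                int16_t dst[8])
{
   assert(lo && hi && dst);
   for (int i = 0; i < 4; i++) {
      dst[i] = saturate_s16(lo[i]);
      dst[i + 4] = saturate_s16(hi[i]);
   }
}

/*
 * Scalar model of a 256bit pack that works per 128bit lane
 * (hypothetical helper): each half of the result holds four elements
 * of lo followed by four of hi, so the output is not in source order.
 */
static void pack_2x8x32_to_16x16_per_lane(const int32_t lo[8],
                                          const int32_t hi[8],
                                          int16_t dst[16])
{
   assert(lo && hi && dst);
   for (int lane = 0; lane < 2; lane++) {
      for (int i = 0; i < 4; i++) {
         dst[lane * 8 + i] = saturate_s16(lo[lane * 4 + i]);
         dst[lane * 8 + 4 + i] = saturate_s16(hi[lane * 4 + i]);
      }
   }
}
```

In the 128bit case the result is simply lo followed by hi. In the per-lane
case the result is lo[0-3], hi[0-3], lo[4-7], hi[4-7], so the conversion code
would need to know about that ordering.
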
> ---
> src/gallium/auxiliary/gallivm/lp_bld_conv.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_conv.c b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
> index 69d24a5..c8f9c28 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_conv.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
> @@ -497,8 +497,25 @@ int lp_build_conv_auto(struct gallivm_state *gallivm,
>     if (src_type.width == dst_type->width) {
>        lp_build_conv(gallivm, src_type, *dst_type, src, num_srcs, dst, num_dsts);
>     } else {
> -      for (i = 0; i < num_srcs; ++i) {
> -         lp_build_conv(gallivm, src_type, *dst_type, &src[i], 1, &dst[i], 1);
> +      /*
> +       * If dst_width is 16 bits and src_width 32 and the dst vector size
> +       * 64bit, try feeding 2 vectors at once so pack intrinsics can be used.
> +       * (For AVX, this isn't needed, since we usually get 256bit src and
> +       * 128bit dst vectors which works ok. If we do AVX2 pack this should
> +       * be extended but need to be able to tell conversion code about pack
> +       * ordering first.)
> +       */
> +      unsigned ratio = 1;
> +      if (src_type.width == 2 * dst_type->width &&
> +          src_type.length == dst_type->length &&
> +          dst_type->floating == 0 && (num_srcs % 2 == 0) &&
> +          dst_type->width * dst_type->length == 64) {
> +         ratio = 2;
> +         num_dsts /= 2;
> +         dst_type->length *= 2;

Should this be inside lp_build_conv?

> +      }
> +      for (i = 0; i < num_dsts; i++) {
> +         lp_build_conv(gallivm, src_type, *dst_type, &src[i*ratio], ratio, &dst[i], 1);
>        }
>     }
>
>
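
To make the grouping condition in the hunk easier to reason about in
isolation, here is a minimal standalone sketch of it. The struct and function
names are hypothetical stand-ins, not the real lp_type or any gallivm entry
point; only the three fields the patch reads are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lp_type fields the patch reads. */
struct vec_type {
   unsigned width;    /* bits per element */
   unsigned length;   /* elements per vector */
   bool floating;     /* true for float element types */
};

/*
 * Return how many source vectors to hand to each conversion call.
 * When two sources can be packed into one destination, the destination
 * type and count are adjusted in place, mirroring what the patch does.
 */
static unsigned choose_src_ratio(struct vec_type src, struct vec_type *dst,
                                 unsigned num_srcs, unsigned *num_dsts)
{
   assert(dst != NULL && num_dsts != NULL);

   if (src.width == 2 * dst->width &&
       src.length == dst->length &&
       !dst->floating && num_srcs % 2 == 0 &&
       dst->width * dst->length == 64) {
      *num_dsts /= 2;     /* half as many destinations ... */
      dst->length *= 2;   /* ... each twice as long */
      return 2;
   }
   return 1;
}
```

With four 4 x 32bit sources and a 4 x 16bit destination type, this returns 2
and leaves two 8 x 16bit destinations. With 8 x 32bit sources and an
8 x 16bit (128bit) destination type, the 64bit check fails and the ratio
stays at 1, which matches the AVX remark in the comment.
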
Reviewed-by: Jose Fonseca <jfonseca at vmware.com>