[Mesa-dev] [PATCH] gallivm: avx-512 changes for texel fetcher

Kyriazis, George george.kyriazis at intel.com
Thu Jan 18 19:46:40 UTC 2018


On Jan 18, 2018, at 1:10 PM, Roland Scheidegger <sroland at vmware.com> wrote:

On 17.01.2018 at 23:33, George Kyriazis wrote:
The texture swizzle produced incorrect results for avx512-style
16-wide loads.

Special-case the post-load swizzle operations for avx512 so that the
xyzw components are moved to the correct outputs.

cc: Jose Fonseca <jfonseca at vmware.com>
---
src/gallium/auxiliary/gallivm/lp_bld_pack.c | 40 +++++++++++++++++++++++++++--
1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index e8d4fcd..7879826 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -129,6 +129,31 @@ lp_build_const_unpack_shuffle_half(struct gallivm_state *gallivm,
}

/**
+ * Similar to lp_build_const_unpack_shuffle_half, but for AVX512 (16 x 32-bit).
+ * See the comment above lp_build_interleave2_half for more details.
+ */
+static LLVMValueRef
+lp_build_const_unpack_shuffle_16wide(struct gallivm_state *gallivm,
+                                     unsigned lo_hi)
+{
+   LLVMValueRef elems[LP_MAX_VECTOR_LENGTH];
+   unsigned i, j;
+
+   assert(lo_hi < 2);
+
+   // for each lo_hi setting, output elements 0..f select:
+   // 0: 0 16 4 20  8 24 12 28 1 17 5 21  9 25 13 29
+   // 1: 2 18 6 22 10 26 14 30 3 19 7 23 11 27 15 31
+   for (i = 0; i < 16; i++) {
+      j = ((i&0x06)<<1) + ((i&1)<<4) + (i>>3) + (lo_hi<<1);
+
+      elems[i] = lp_build_const_int32(gallivm, j);
+   }
+
+   return LLVMConstVector(elems, 16);
+}
+
+/**
 * Build shuffle vectors that match PACKxx (SSE) instructions or
 * VPERM (Altivec).
 */
@@ -325,8 +350,8 @@ lp_build_interleave2(struct gallivm_state *gallivm,
}

/**
- * Interleave vector elements but with 256 bit,
- * treats it as interleave with 2 concatenated 128 bit vectors.
+ * Interleave vector elements but with 256 (or 512) bit,
+ * treats it as interleave with 2 concatenated 128 (or 256) bit vectors.
 *
 * This differs to lp_build_interleave2 as that function would do the following (for lo):
 * a0 b0 a1 b1 a2 b2 a3 b3, and this does not compile into an AVX unpack instruction.
@@ -343,6 +368,14 @@ lp_build_interleave2(struct gallivm_state *gallivm,
 *
 * And interleave-hi would result in:
 *   a2 b2 a3 b3 a6 b6 a7 b7
+ *
+ * For 512 bits, the following are true:
+ *
+ * Interleave-lo would result in (capital letters denote hex indices):
+ *   a0 b0 a1 b1 a4 b4 a5 b5 a8 b8 a9 b9 aC bC aD bD
+ *
+ * Interleave-hi would result in:
+ *   a2 b2 a3 b3 a6 b6 a7 b7 aA bA aB bB aE bE aF bF
 */
LLVMValueRef
lp_build_interleave2_half(struct gallivm_state *gallivm,
@@ -354,6 +387,9 @@ lp_build_interleave2_half(struct gallivm_state *gallivm,
   if (type.length * type.width == 256) {
      LLVMValueRef shuffle = lp_build_const_unpack_shuffle_half(gallivm, type.length, lo_hi);
      return LLVMBuildShuffleVector(gallivm->builder, a, b, shuffle, "");
+   } else if ((type.length == 16) && (type.width == 32)) {
+      LLVMValueRef shuffle = lp_build_const_unpack_shuffle_16wide(gallivm, lo_hi);
+      return LLVMBuildShuffleVector(gallivm->builder, a, b, shuffle, "");
This is not really "interleave_half", more like "interleave_quarter"...
That said, avx512 certainly follows the same rules as 256-bit avx, so
128-bit pieces are treated independently. So maybe this should be renamed
to something like "interleave_native".
Also, I believe it is a mistake to restrict this to dword interleaves
here. You should handle all type widths, just like the 256-bit case
can handle all widths.
And I'm not sure through which paths you reach this, but I don't see
why you don't also need the corresponding unpack2_native and
pack2_native adjustments. This should not really be a special case:
avx512 should generally handle things like this (if you want to extend
the gallivm code to use avx512...). For that matter, the commit log and
shortlog are confusing, because this isn't directly related to texture
fetching.

Roland
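Roland's suggestion can be illustrated with a width-agnostic rule. If every 128-bit lane is unpacked independently, as the x86 unpack instructions do, then the shuffle mask depends only on the vector length and the element width. The sketch below is a hypothetical helper, not existing gallivm code: `native_unpack_indices` and its parameters are assumptions made for illustration. For 8 x 32-bit it reproduces the 256-bit pattern documented for lp_build_interleave2_half (a0 b0 a1 b1 a4 b4 a5 b5). For 16 x 32-bit it yields the 512-bit pattern described in the patch's comment (a0 b0 a1 b1 a4 b4 a5 b5 a8 b8 ...).

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of gallivm): compute the shuffle indices of
 * an x86-style unpack-lo/hi that works independently on each 128-bit lane.
 * n is the number of elements, elem_bits their width (8..64 bits), and the
 * total width n * elem_bits must be a multiple of 128 bits.
 * Index i < n selects a[i]; index n + i selects b[i], as in LLVM's
 * shufflevector with two n-element operands.
 */
static void native_unpack_indices(unsigned n, unsigned elem_bits,
                                  unsigned lo_hi, unsigned *out)
{
   unsigned per_lane, half;

   assert(lo_hi < 2);
   assert(elem_bits >= 8 && elem_bits <= 64 && 128 % elem_bits == 0);
   assert((n * elem_bits) % 128 == 0);

   per_lane = 128 / elem_bits;   /* elements per 128-bit lane */
   half = per_lane / 2;          /* elements taken from each operand */

   for (unsigned i = 0; i < n; i++) {
      unsigned lane = i / per_lane;   /* 128-bit lane of output element i */
      unsigned k = i % per_lane;      /* position inside that lane */
      /* lo reads the lower half of the lane, hi the upper half;
       * even slots come from a, odd slots from b. */
      unsigned src = lane * per_lane + lo_hi * half + k / 2;

      out[i] = (k & 1) ? n + src : src;
   }
}
```

For 512-bit vectors of 64-bit elements (n = 8, elem_bits = 64), the lo mask comes out as 0 8 2 10 4 12 6 14. That is one quadword from each operand per 128-bit lane, so the same rule would also cover the non-dword widths the review mentions.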

Roland,

Here is the stack trace showing how this code is reached:

(gdb) bt
#0  lp_build_const_unpack_shuffle_16wide (gallivm=0x168b690, lo_hi=0)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_pack.c:138
#1  0x00007ffff62786de in lp_build_interleave2_half (gallivm=0x168b690,
    type=..., a=0x16a7378, b=0x16a7d38, lo_hi=0)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_pack.c:391
#2  0x00007ffff629585f in lp_build_transpose_aos (gallivm=0x168b690,
    single_type_lp=..., src=0x7fffffff32e0, dst=0x7fffffff3300)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_swizzle.c:664
#3  0x00007ffff626a887 in lp_build_fetch_rgba_soa (gallivm=0x168b690,
    format_desc=0x7ffff67fe9a0 <util_format_r32g32b32a32_sint_description>,
    type=..., aligned=1 '\001', base_ptr=0x16a3218, offset=0x16a6890,
    i=0xf87a90, j=0xf87a90, cache=0x0, rgba_out=0x7fffffff4280)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_format_soa.c:635
#4  0x00007ffff628f899 in lp_build_fetch_texel (bld=0x7fffffff3680,
    texture_unit=0, coords=0x7fffffff4060, explicit_lod=0x16a2bf0,
    offsets=0x7fffffff4260, colors_out=0x7fffffff4280)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:2682
#5  0x00007ffff6290a6b in lp_build_sample_soa_code (gallivm=0x168b690,
    static_texture_state=0x7fffffffc61c, static_sampler_state=0x7fffffffc618,
    dynamic_state=0x1696d18, type=..., sample_key=100, texture_index=0,
    sampler_index=0, context_ptr=0x16a2b70, thread_data_ptr=0x0,
    coords=0x7fffffff42a0, offsets=0x7fffffff4260, derivs=0x0, lod=0x16a2bf0,
    texel_out=0x7fffffff4280)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:3092
#6  0x00007ffff629202d in lp_build_sample_gen_func (gallivm=0x168b690,
    static_texture_state=0x7fffffffc61c, static_sampler_state=0x7fffffffc618,
    dynamic_state=0x1696d18, type=..., texture_index=0, sampler_index=0,
    function=0x16a2aa8, num_args=3, sample_key=100)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:3483
#7  0x00007ffff629286d in lp_build_sample_soa_func (gallivm=0x168b690,
    static_texture_state=0x7fffffffc61c, static_sampler_state=0x7fffffffc618,
    dynamic_state=0x1696d18, params=0x7fffffff46b0)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:3629
#8  0x00007ffff6292cdb in lp_build_sample_soa (
    static_texture_state=0x7fffffffc61c, static_sampler_state=0x7fffffffc618,
    dynamic_state=0x1696d18, gallivm=0x168b690, params=0x7fffffff46b0)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:3734
#9  0x00007ffff630fd71 in swr_sampler_soa_emit_fetch_texel (base=0x1696d00,
    gallivm=0x168b690, params=0x7fffffff46b0)
    at ../../../../../src/gallium/drivers/swr/swr_tex_sample.cpp:302
#10 0x00007ffff62a3fc0 in emit_fetch_texels (bld=0x7fffffff4a40,
    inst=0x1698b20, texel=0x7fffffff4868, is_samplei=0 '\000')
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c:2523
#11 0x00007ffff62a584d in txf_emit (action=0x7fffffff54c0,
    bld_base=0x7fffffff4a40, emit_data=0x7fffffff47f0)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c:3178
#12 0x00007ffff629bcaa in lp_build_tgsi_inst_llvm (bld_base=0x7fffffff4a40,
    inst=0x1698b20)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_tgsi.c:309
#13 0x00007ffff629c650 in lp_build_tgsi_llvm (bld_base=0x7fffffff4a40,
    tokens=0x168a5e0)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_tgsi.c:546
#14 0x00007ffff62a7255 in lp_build_tgsi_soa (gallivm=0x168b690,
    tokens=0x168a5e0, type=..., mask=0x0, consts_ptr=0x16913e8,
    const_sizes_ptr=0x16914a8, system_values=0x7fffffffac30,
    inputs=0x7fffffffacd0, outputs=0x7fffffffb6d0, context_ptr=0x1691300,
    thread_data_ptr=0x0, sampler=0x1696d00, info=0x1688ab8, gs_iface=0x0)
    at ../../../../src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c:3945
#15 0x00007ffff6315cfe in BuilderSWR::CompileVS (this=0x7fffffffc130,
    ctx=0x645300, key=...)
    at ../../../../../src/gallium/drivers/swr/swr_shader.cpp:836
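The backtrace shows the 16-wide shuffle being reached from lp_build_transpose_aos while fetching an r32g32b32a32_sint texel. Assume that function follows the usual two-stage pattern: 32-bit interleaves first pair up the inputs, then 64-bit interleaves combine the pairs. Under that assumption, the plain-C model below shows why per-128-bit-lane interleaves are enough. The same eight unpacks perform a 4x4 transpose inside each 128-bit lane whether the vectors are 128, 256, or 512 bits wide, which is the behaviour the review argues avx512 should share with 256-bit avx. All names are hypothetical, and vectors are modelled as arrays of 32-bit slots.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SLOTS 16 /* up to 512 bits, modelled as 32-bit slots */

/*
 * Hypothetical model of a per-128-bit-lane unpack (lo_hi = 0: low half of
 * each lane, 1: high half). Vectors hold `slots` 32-bit slots and are viewed
 * as elements of `units` slots each (1 = 32-bit, 2 = 64-bit elements).
 * Even output elements come from a, odd ones from b.
 */
static void lane_unpack(const uint32_t *a, const uint32_t *b, uint32_t *out,
                        unsigned slots, unsigned units, unsigned lo_hi)
{
   unsigned n = slots / units;        /* elements per vector */
   unsigned per_lane = 4 / units;     /* elements per 128-bit lane */

   for (unsigned i = 0; i < n; i++) {
      unsigned lane = i / per_lane;
      unsigned k = i % per_lane;
      unsigned src = lane * per_lane + lo_hi * (per_lane / 2) + k / 2;
      const uint32_t *from = (k & 1) ? b : a;

      memcpy(&out[i * units], &from[src * units], units * sizeof(uint32_t));
   }
}

/*
 * Two-stage transpose, assumed (not confirmed) to mirror the structure of
 * lp_build_transpose_aos. Within every 128-bit lane it transposes the 4x4
 * block formed by src[0..3]: dst[c][4*L + r] == src[r][4*L + c].
 */
static void transpose4_per_lane(uint32_t src[4][MAX_SLOTS],
                                uint32_t dst[4][MAX_SLOTS], unsigned slots)
{
   uint32_t t[4][MAX_SLOTS];

   assert(slots % 4 == 0 && slots <= MAX_SLOTS);

   /* stage 1: 32-bit interleaves of src[0]/src[1] and src[2]/src[3] */
   lane_unpack(src[0], src[1], t[0], slots, 1, 0);
   lane_unpack(src[2], src[3], t[1], slots, 1, 0);
   lane_unpack(src[0], src[1], t[2], slots, 1, 1);
   lane_unpack(src[2], src[3], t[3], slots, 1, 1);

   /* stage 2: 64-bit interleaves combine the pairs */
   lane_unpack(t[0], t[1], dst[0], slots, 2, 0);
   lane_unpack(t[0], t[1], dst[1], slots, 2, 1);
   lane_unpack(t[2], t[3], dst[2], slots, 2, 0);
   lane_unpack(t[2], t[3], dst[3], slots, 2, 1);
}
```

With slots = 16 (512 bits), each of the four 128-bit lanes is transposed on its own. No dword-specific mask is needed at either stage.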

   } else {
      return lp_build_interleave2(gallivm, type, a, b, lo_hi);
   }
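The two index rows in the comment of lp_build_const_unpack_shuffle_16wide can be checked by lifting the formula out into plain C. The indices are then decoded as a/b element labels: with two 16-element operands, LLVM's shufflevector numbers the first operand's elements 0-15 and the second's 16-31. Only the arithmetic below is copied from the patch; the helper names are made up for this sketch. Decoded this way, the lo mask yields a0 b0 a4 b4 a8 b8 aC bC a1 b1 a5 b5 a9 b9 aD bD. That order differs from the a0 b0 a1 b1 a4 b4 ... sequence listed in the 512-bit part of the lp_build_interleave2_half comment.

```c
#include <assert.h>
#include <string.h> /* strcmp(), handy when checking the labels */

/*
 * Plain-C copy of the index formula in lp_build_const_unpack_shuffle_16wide,
 * using unsigned ints instead of LLVM constants.
 */
static void shuffle_16wide_indices(unsigned lo_hi, unsigned idx[16])
{
   assert(lo_hi < 2);
   for (unsigned i = 0; i < 16; i++)
      idx[i] = ((i & 0x06) << 1) + ((i & 1) << 4) + (i >> 3) + (lo_hi << 1);
}

/*
 * Hypothetical helper: render indices as "a0 b0 a4 ..." labels, using hex
 * digits as in the lp_build_interleave2_half comment. buf must hold at
 * least 48 bytes (16 labels, 15 spaces, terminating NUL).
 */
static void indices_to_labels(const unsigned idx[16], char buf[48])
{
   static const char hex[] = "0123456789ABCDEF";
   char *p = buf;

   for (unsigned i = 0; i < 16; i++) {
      *p++ = idx[i] < 16 ? 'a' : 'b';   /* 0..15 -> a, 16..31 -> b */
      *p++ = hex[idx[i] % 16];
      *p++ = (i == 15) ? '\0' : ' ';
   }
}
```

Calling `shuffle_16wide_indices(0, idx)` and then `indices_to_labels(idx, buf)` leaves "a0 b0 a4 b4 a8 b8 aC bC a1 b1 a5 b5 a9 b9 aD bD" in `buf`. With `lo_hi = 1` the result is "a2 b2 a6 b6 aA bA aE bE a3 b3 a7 b7 aB bB aF bF".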


_______________________________________________
mesa-dev mailing list
mesa-dev at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
