[Mesa-dev] [PATCH] gallivm: avx-512 changes for texel fetcher

Thu Jan 18 22:46:53 UTC 2018

Am 18.01.2018 um 20:26 schrieb Kyriazis, George:
>> On Jan 18, 2018, at 1:10 PM, Roland Scheidegger <sroland at vmware.com
>> <mailto:sroland at vmware.com>> wrote:
>>
>> Am 17.01.2018 um 23:33 schrieb George Kyriazis:
>>> The texture swizzle was not doing the right thing for avx512-style
>>> 16-wide loads.
>>>
>>> Special-case the post-load swizzle operations for avx512 so that we move
>>> the xyzw components correctly to the outputs.
>>>
>>> cc: Jose Fonseca <jfonseca at vmware.com <mailto:jfonseca at vmware.com>>
>>> ---
>>> src/gallium/auxiliary/gallivm/lp_bld_pack.c | 40
>>> +++++++++++++++++++++++++++--
>>> 1 file changed, 38 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
>>> b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
>>> index e8d4fcd..7879826 100644
>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
>>> @@ -129,6 +129,31 @@ lp_build_const_unpack_shuffle_half(struct
>>> gallivm_state *gallivm,
>>> }
>>>
>>> /**
>>> + * Similar to lp_build_const_unpack_shuffle_half, but for AVX512
>>> + * See comment above lp_build_interleave2_half for more details.
>>> + */
>>> +static LLVMValueRef
>>> +lp_build_const_unpack_shuffle_16wide(struct gallivm_state *gallivm,
>>> +                                     unsigned lo_hi)
>>> +{
>>> +   LLVMValueRef elems[LP_MAX_VECTOR_LENGTH];
>>> +   unsigned i, j;
>>> +
>>> +   assert(lo_hi < 2);
>>> +
>>> +   // for the following lo_hi setting, convert 0 -> f to:
>>> +   // 0: 0 16 4 20  8 24 12 28 1 17 5 21  9 25 13 29
>>> +   // 1: 2 18 6 22 10 26 14 30 3 19 7 23 11 27 15 31
>>> +   for (i = 0; i < 16; i++) {
>>> +      j = ((i&0x06)<<1) + ((i&1)<<4) + (i>>3) + (lo_hi<<1);
>>> +
>>> +      elems[i] = lp_build_const_int32(gallivm, j);
>>> +   }
>>> +
>>> +   return LLVMConstVector(elems, 16);
>>> +}
>>> +
>>> +/**
>>>  * Build shuffle vectors that match PACKxx (SSE) instructions or
>>>  * VPERM (Altivec).
>>>  */
>>> @@ -325,8 +350,8 @@ lp_build_interleave2(struct gallivm_state *gallivm,
>>> }
>>>
>>> /**
>>> - * Interleave vector elements but with 256 bit,
>>> - * treats it as interleave with 2 concatenated 128 bit vectors.
>>> + * Interleave vector elements but with 256 (or 512) bit,
>>> + * treats it as interleave with 2 concatenated 128 (or 256) bit vectors.
>>>  *
>>>  * This differs to lp_build_interleave2 as that function would do the
>>> following (for lo):
>>>  * a0 b0 a1 b1 a2 b2 a3 b3, and this does not compile into an AVX
>>> unpack instruction.
>>> @@ -343,6 +368,14 @@ lp_build_interleave2(struct gallivm_state *gallivm,
>>>  *
>>>  * And interleave-hi would result in:
>>>  *   a2 b2 a3 b3 a6 b6 a7 b7
>>> + *
>>> + * For 512 bits, the following are true:
>>> + *
>>> + * Interleave-lo would result in (capital letters denote hex indices):
>>> + *   a0 b0 a1 b1 a4 b4 a5 b5 a8 b8 a9 b9 aC bC aD bD
>>> + *
>>> + * Interleave-hi would result in:
>>> + *   a2 b2 a3 b3 a6 b6 a7 b7 aA bA aB bB aE bE aF bF
>>>  */
>>> LLVMValueRef
>>> lp_build_interleave2_half(struct gallivm_state *gallivm,
>>> @@ -354,6 +387,9 @@ lp_build_interleave2_half(struct gallivm_state
>>> *gallivm,
>>>    if (type.length * type.width == 256) {
>>>       LLVMValueRef shuffle =
>>> lp_build_const_unpack_shuffle_half(gallivm, type.length, lo_hi);
>>>       return LLVMBuildShuffleVector(gallivm->builder, a, b, shuffle, "");
>>> +   } else if ((type.length == 16) && (type.width == 32)) {
>>> +      LLVMValueRef shuffle =
>>> lp_build_const_unpack_shuffle_16wide(gallivm, lo_hi);
>>> +      return LLVMBuildShuffleVector(gallivm->builder, a, b, shuffle,
>>> "");
>> This is not really "interleave_half", more like "interleave_quarter"...
>> That said, avx512 certainly follows the same rules as avx256, so 128bit
>> pieces are treated independently. So maybe this should be renamed like
>> "interleave_native" or something like that.
>> Also, I believe it is definitely a mistake to restrict this to dword
>> interleaves here. You should handle all type widths, just like the
>> 256bit case can handle all widths.
>> And I'm not sure through which paths you reach this, but I'm not sure
>> why you don't need the corresponding unpack2_native and pack2_native
>> adjustments - it should not really be a special case, avx512 should
>> generally handle things like this (if you'd want to extend the gallivm
>> code to use avx512...). For that matter, the commit log and shortlog is
>> confusing, because this isn't directly related to texture fetching.
>>
> 
> Thanks Roland,
> 
> This check-in is a prerequisite for some avx-512 changes I am about to
> check in to the swr driver.  I’ve hit this for piglit test
> vs-texelfetch-isampler1d and similar ones that try to fetch from a
> texture from a VS.  You are right, there are probably other cases where
> this gets hit, but that was the place where I’ve hit it first.  If you
> could recommend some tests that hit the other width cases, it would be
> great to fix them too; since didn’t know where those cases would be hit,
> I decided to make this “surgical” change.
Ok, so it is hit from the transpose_aos path. That should work alright,
since this only ever deals with dword elements.
I guess to see it hit from other paths you'd need some more proper
support for 512bit vectors, e.g. pack/unpack_native.

> 
> My follow-on changes depend on this change for correctness, and I’d like
> to get the the other changes in before the mesa 18.0 date (tomorrow).
>  AVX512 support is an ongoing effort on our side, so they will
> definitely be additional check-ins along the same lines.
> 
> Thoughts on checking in as-is for 18.0, with upcoming changes slowly
> creeping in to fix additional issues, and make things more general?
I can live with that, it's your driver which might break,
llvmpipe shouldn't hit it :-).
As long as you fix up the commit message (should really mention the
transpose),
Reviewed-by: Roland Scheidegger <sroland at vmware.com>


> Thanks,
> 
> George
> 
> 
>> Roland
>>
>>
>>
>>
>>
>>>    } else {
>>>       return lp_build_interleave2(gallivm, type, a, b, lo_hi);
>>>    }
>>>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwMGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=WSTeuSUPvmSGYKAu9XgzHNSXvuo4qh2MJ2USAEQygWQ&s=4yOW4nVoM0041Ves_N6YfrGtHDSNgprm6JdNpLmzMKI&e=>
>