[Mesa-dev] [PATCH] gallivm: fix [IU]MUL_HI regression harder

Wed Nov 9 16:43:33 UTC 2016

Am 09.11.2016 um 16:26 schrieb Nicolai Hähnle:
> On 09.11.2016 16:10, Roland Scheidegger wrote:
>> Am 09.11.2016 um 14:55 schrieb Nicolai Hähnle:
>>> On 09.11.2016 14:46, Roland Scheidegger wrote:
>>>> Reviewed-by: Roland Scheidegger <sroland at vmware.com>
>>>>
>>>> I'm curious though, is for radeonsi zext not equivalent to interleaving
>>>> the low 32bits of each number with zeros (and hence doing the a
>>>> uninterleave doesn't give you back the low respectively high bits)?
>>>
>>> radeonsi ends up calling lp_build_mul_32_lohi once for each component of
>>> the vector, with no vector types involved at all, it's all plain i32 and
>>> i64. This is better for the CodeGen backend in LLVM.
>>>
>>> The code was previously returning a <1 x i32> for each component, and
>>> that caused the later stages of opcode handling to get confused about
>>> what to do with the result.
>> So casting the result to a scalar would have fixed this too? This is
>> actually sort of a deficiency of lp_build_uninterleave1 - the
>> lp_build_xx code usually is built to avoid 1xn vector types and use
>> scalars in that case, but not quite everywhere.
> 
> I think casting to scalars could work as well. If that would be
> preferable for llvmpipe, I can give it shot.
That's not necessary, I was just curious why it didn't work. The code is
going to be unused mostly on x86, and even if it's used (on x86 or
elsewhere) the difference in quality in the generated code is probably
pretty minimal (and it can go either way).
Albeit at some point probably the vec/scalar lp_build_xx handling should
be made consistent everywhere.

Roland


> 
> Nicolai
> 
>>
>> Roland
>>
>>
>>>
>>> Cheers,
>>> Nicolai
>>>
>>>>
>>>> Roland
>>>>
>>>> Am 09.11.2016 um 12:46 schrieb Nicolai Hähnle:
>>>>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>>>
>>>>> The fix in commit 88f791db75e9f065bac8134e0937e1b76600aa36 was
>>>>> insufficient
>>>>> for radeonsi because the vector case was not handled properly. It
>>>>> seems
>>>>> piglit only covers the scalar case, unfortunately.
>>>>>
>>>>> Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*
>>>>> ---
>>>>>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 20 ++++++++++++--------
>>>>>  1 file changed, 12 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>>>> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>>>> index 43ad238..5553cb1 100644
>>>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>>>> @@ -1230,42 +1230,46 @@ lp_build_mul_32_lohi_cpu(struct
>>>>> lp_build_context *bld,
>>>>>   * Emits generic code.
>>>>>   */
>>>>>  LLVMValueRef
>>>>>  lp_build_mul_32_lohi(struct lp_build_context *bld,
>>>>>                       LLVMValueRef a,
>>>>>                       LLVMValueRef b,
>>>>>                       LLVMValueRef *res_hi)
>>>>>  {
>>>>>     struct gallivm_state *gallivm = bld->gallivm;
>>>>>     LLVMBuilderRef builder = gallivm->builder;
>>>>> -   LLVMValueRef tmp;
>>>>> +   LLVMValueRef tmp, shift, res_lo;
>>>>>     struct lp_type type_tmp;
>>>>> -   LLVMTypeRef wide_type, cast_type;
>>>>> +   LLVMTypeRef wide_type, narrow_type;
>>>>>
>>>>>     type_tmp = bld->type;
>>>>> +   narrow_type = lp_build_vec_type(gallivm, type_tmp);
>>>>>     type_tmp.width *= 2;
>>>>>     wide_type = lp_build_vec_type(gallivm, type_tmp);
>>>>> -   type_tmp = bld->type;
>>>>> -   type_tmp.length *= 2;
>>>>> -   cast_type = lp_build_vec_type(gallivm, type_tmp);
>>>>> +   shift = lp_build_const_vec(gallivm, type_tmp, 32);
>>>>>
>>>>>     if (bld->type.sign) {
>>>>>        a = LLVMBuildSExt(builder, a, wide_type, "");
>>>>>        b = LLVMBuildSExt(builder, b, wide_type, "");
>>>>>     } else {
>>>>>        a = LLVMBuildZExt(builder, a, wide_type, "");
>>>>>        b = LLVMBuildZExt(builder, b, wide_type, "");
>>>>>     }
>>>>>     tmp = LLVMBuildMul(builder, a, b, "");
>>>>> -   tmp = LLVMBuildBitCast(builder, tmp, cast_type, "");
>>>>> -   *res_hi = lp_build_uninterleave1(gallivm, bld->type.length * 2,
>>>>> tmp, 1);
>>>>> -   return lp_build_uninterleave1(gallivm, bld->type.length * 2, tmp,
>>>>> 0);
>>>>> +
>>>>> +   res_lo = LLVMBuildTrunc(builder, tmp, narrow_type, "");
>>>>> +
>>>>> +   /* Since we truncate anyway, LShr and AShr are equivalent. */
>>>>> +   tmp = LLVMBuildLShr(builder, tmp, shift, "");
>>>>> +   *res_hi = LLVMBuildTrunc(builder, tmp, narrow_type, "");
>>>>> +
>>>>> +   return res_lo;
>>>>>  }
>>>>>
>>>>>
>>>>>  /* a * b + c */
>>>>>  LLVMValueRef
>>>>>  lp_build_mad(struct lp_build_context *bld,
>>>>>               LLVMValueRef a,
>>>>>               LLVMValueRef b,
>>>>>               LLVMValueRef c)
>>>>>  {
>>>>>
>>>>
>>