[Mesa-dev] [PATCH] gallivm: fix [IU]MUL_HI regression harder

Wed Nov 9 15:10:24 UTC 2016

Am 09.11.2016 um 14:55 schrieb Nicolai Hähnle:
> On 09.11.2016 14:46, Roland Scheidegger wrote:
>> Reviewed-by: Roland Scheidegger <sroland at vmware.com>
>>
>> I'm curious though, is for radeonsi zext not equivalent to interleaving
>> the low 32bits of each number with zeros (and hence doing the a
>> uninterleave doesn't give you back the low respectively high bits)?
> 
> radeonsi ends up calling lp_build_mul_32_lohi once for each component of
> the vector, with no vector types involved at all, it's all plain i32 and
> i64. This is better for the CodeGen backend in LLVM.
> 
> The code was previously returning a <1 x i32> for each component, and
> that caused the later stages of opcode handling to get confused about
> what to do with the result.
So casting the result to a scalar would have fixed this too? This is
actually sort of a deficiency of lp_build_uninterleave1 - the
lp_build_xx code usually is built to avoid 1xn vector types and use
scalars in that case, but not quite everywhere.

Roland


> 
> Cheers,
> Nicolai
> 
>>
>> Roland
>>
>> Am 09.11.2016 um 12:46 schrieb Nicolai Hähnle:
>>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>
>>> The fix in commit 88f791db75e9f065bac8134e0937e1b76600aa36 was
>>> insufficient
>>> for radeonsi because the vector case was not handled properly. It seems
>>> piglit only covers the scalar case, unfortunately.
>>>
>>> Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*
>>> ---
>>>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 20 ++++++++++++--------
>>>  1 file changed, 12 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> index 43ad238..5553cb1 100644
>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> @@ -1230,42 +1230,46 @@ lp_build_mul_32_lohi_cpu(struct
>>> lp_build_context *bld,
>>>   * Emits generic code.
>>>   */
>>>  LLVMValueRef
>>>  lp_build_mul_32_lohi(struct lp_build_context *bld,
>>>                       LLVMValueRef a,
>>>                       LLVMValueRef b,
>>>                       LLVMValueRef *res_hi)
>>>  {
>>>     struct gallivm_state *gallivm = bld->gallivm;
>>>     LLVMBuilderRef builder = gallivm->builder;
>>> -   LLVMValueRef tmp;
>>> +   LLVMValueRef tmp, shift, res_lo;
>>>     struct lp_type type_tmp;
>>> -   LLVMTypeRef wide_type, cast_type;
>>> +   LLVMTypeRef wide_type, narrow_type;
>>>
>>>     type_tmp = bld->type;
>>> +   narrow_type = lp_build_vec_type(gallivm, type_tmp);
>>>     type_tmp.width *= 2;
>>>     wide_type = lp_build_vec_type(gallivm, type_tmp);
>>> -   type_tmp = bld->type;
>>> -   type_tmp.length *= 2;
>>> -   cast_type = lp_build_vec_type(gallivm, type_tmp);
>>> +   shift = lp_build_const_vec(gallivm, type_tmp, 32);
>>>
>>>     if (bld->type.sign) {
>>>        a = LLVMBuildSExt(builder, a, wide_type, "");
>>>        b = LLVMBuildSExt(builder, b, wide_type, "");
>>>     } else {
>>>        a = LLVMBuildZExt(builder, a, wide_type, "");
>>>        b = LLVMBuildZExt(builder, b, wide_type, "");
>>>     }
>>>     tmp = LLVMBuildMul(builder, a, b, "");
>>> -   tmp = LLVMBuildBitCast(builder, tmp, cast_type, "");
>>> -   *res_hi = lp_build_uninterleave1(gallivm, bld->type.length * 2,
>>> tmp, 1);
>>> -   return lp_build_uninterleave1(gallivm, bld->type.length * 2, tmp,
>>> 0);
>>> +
>>> +   res_lo = LLVMBuildTrunc(builder, tmp, narrow_type, "");
>>> +
>>> +   /* Since we truncate anyway, LShr and AShr are equivalent. */
>>> +   tmp = LLVMBuildLShr(builder, tmp, shift, "");
>>> +   *res_hi = LLVMBuildTrunc(builder, tmp, narrow_type, "");
>>> +
>>> +   return res_lo;
>>>  }
>>>
>>>
>>>  /* a * b + c */
>>>  LLVMValueRef
>>>  lp_build_mad(struct lp_build_context *bld,
>>>               LLVMValueRef a,
>>>               LLVMValueRef b,
>>>               LLVMValueRef c)
>>>  {
>>>
>>