[Mesa-dev] [PATCH] util: Fix fallback iround handling of integral inputs

Mon Jul 21 14:21:45 PDT 2014

Am 21.07.2014 17:53, schrieb Richard Sandiford:
> lp_build_iround has a fallback case that tries to emulate a round-to-nearest
> float->int conversion by adding 0.5 and using a truncating fptosi.  For odd
> numbers in the range [2^23, 2^24], which are already integral, this has
> the effect of adding 1 to the integer, since the result of adding 0.5 is
> exactly half way between two representable values and the tie goes to even.
> This includes the important special case of (float)0xffffff, which is the
> maximum depth in a z24s8 format.  I.e. calling iround on (float)0xffffff
> gives 0x1000000 rather than 0xffffff when the fallback is used.
> 
> This patch only uses the x+0.5 trick if the ulp of the input is fractional.
> This still doesn't give ties-to-even semantics, but that doesn't seem to
> matter much in practice.
> 
> Signed-off-by: Richard Sandiford <rsandifo at linux.vnet.ibm.com>
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 43 ++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index 9ef8628..82ddb5a 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -2181,25 +2181,36 @@ lp_build_iround(struct lp_build_context *bld,
>        res = lp_build_round_arch(bld, a, LP_BUILD_ROUND_NEAREST);
>     }
>     else {
> -      LLVMValueRef half;
> +      struct lp_type int_type = lp_int_type(type);
> +      LLVMTypeRef vec_type = bld->vec_type;
> +      unsigned mantissa = lp_mantissa(type);
> +      unsigned expbits = type.width - mantissa - 1;
> +      unsigned bias = (1 << (expbits - 1)) - 1;
> +      LLVMValueRef mask = lp_build_const_int_vec(bld->gallivm, type,
> +                               (unsigned long long)1 << (type.width - 1));
> +      /* Smallest value with an integral ulp */
> +      LLVMValueRef abslimit = lp_build_const_int_vec(bld->gallivm, type,
> +                               (bias + mantissa) << mantissa);
> +      LLVMValueRef half, sign, absa, fractulp;
>  
>        half = lp_build_const_vec(bld->gallivm, type, 0.5);
>  
> -      if (type.sign) {
> -         LLVMTypeRef vec_type = bld->vec_type;
> -         LLVMValueRef mask = lp_build_const_int_vec(bld->gallivm, type,
> -                                    (unsigned long long)1 << (type.width - 1));
> -         LLVMValueRef sign;
> -
> -         /* get sign bit */
> -         sign = LLVMBuildBitCast(builder, a, int_vec_type, "");
> -         sign = LLVMBuildAnd(builder, sign, mask, "");
> -
> -         /* sign * 0.5 */
> -         half = LLVMBuildBitCast(builder, half, int_vec_type, "");
> -         half = LLVMBuildOr(builder, sign, half, "");
> -         half = LLVMBuildBitCast(builder, half, vec_type, "");
> -      }
> +      assert(type.sign);
You can't assert here. It is quite legal to have floats without sign
type - used to skip things like here when we know the input is positive.
There might not be all that many callers using such build contexts,
though some quick glance identified at least two
(lp_build_coord_repeat_npot_linear_int,
lp_build_clamped_float_to_unsigned_norm). Though I would have thought
the latter is where the bug you're seeing is coming from...
Also, I'm not a fan of making this more complex in general.
lp_build_itrunc is most often used for converting texture coords, or
other stuff like srgb conversion, and none of these callers care about
this, so maybe should distinguish the cases.

> +
> +      /* get sign bit */
> +      sign = LLVMBuildBitCast(builder, a, int_vec_type, "");
> +      sign = LLVMBuildAnd(builder, sign, mask, "");
> +
> +      /* get "ulp is fractional" */
> +      absa = lp_build_abs(bld, a);
> +      absa = LLVMBuildBitCast(builder, absa, int_vec_type, "");
> +      fractulp = lp_build_compare(bld->gallivm, int_type, PIPE_FUNC_LESS, absa, abslimit);
> +
> +      /* (sign * 0.5) & intulp */
> +      half = LLVMBuildBitCast(builder, half, int_vec_type, "");
> +      half = LLVMBuildOr(builder, sign, half, "");
> +      half = LLVMBuildAnd(builder, half, fractulp, "");
> +      half = LLVMBuildBitCast(builder, half, vec_type, "");
>  
>        res = LLVMBuildFAdd(builder, a, half, "");
>     }
> 

Though if the caller is lp_build_round, it actually seems not too
complicated - the abs() from both will be the same, and the comparison
could be made (nearly) the same too (lp_build_round and friends are
using 2^24, though my comments are wrong there and it could just use
2^23 instead).
Or would something like adding "largest float smaller than 0.5" first,
then add the remainder to 0.5 later work too? I guess though if you have
type.sign that's not just another "fadd" but also another "or".

Oh and this is not "util" code as the commit message mentions.

Roland