[Mesa-dev] [PATCH] gallivm: Use nextafterf(0.5, 0.0) as rounding constant

Wed Nov 28 17:20:18 UTC 2018

Am 28.11.18 um 07:37 schrieb Matt Turner:
> The common truncf(x + 0.5) fails for the floating-point value just less
> than 0.5 (nextafterf(0.5, 0.0)). nextafterf(0.5, 0.0) + 0.5, after
> rounding is 1.0, thus truncf does not produce the desired value.
> 
> The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before
> truncating. This works for all values.

Reviewed-by: Roland Scheidegger <sroland at vmware.com>

Although this will still do round-to-nearest-away-from-zero, instead of
nearest-even which we probably want.
That said, I don't think it matters anywhere - d3d10 round will require
round-to-nearest-even, and the shader round might fall back to this, but
I don't think we really care in this case. GL in general doesn't care
anyway of course. (I don't think you can easily implement
round-to-nearest-even with a fallback.)

> ---
> I noticed this while investigating https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.gentoo.org%2F665570&data=02%7C01%7Csroland%40vmware.com%7C63d1da48159f4cbe95e308d654fbf3d8%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636789838438856537&sdata=yzXELCbFCR0IsjzrObWN%2B%2BdzVAOV0HQFkSn%2FxxWesWI%3D&reserved=0 but it
> does not fix it.
> 
> Roland, do you have a suggestion for how to make lp_build_iround() work
> on non-SSE/non-Altivec platforms? I notice that if I unconditionally
> return TRUE from arch_rounding_available() and make
> lp_build_round_arch() take the SSE4.1 path (that emits llvm.nearbyint)
> it passes on ARM64.
> 
> I noticed there's some hack in lp_test_arit.c:test_unary:
> 
>    if (test->ref == &nearbyintf && length == 2 &&
>        ref != roundf(testval)) {
>       /* FIXME: The generic (non SSE) path in lp_build_iround, which is
>        * always taken for length==2 regardless of native round support,
>        * does not round to even. */
>       expected_pass = FALSE;
>    }
> 
> It'd be nice to get rid of that.. but maybe we can somehow use it to
> just mark all the round tests as expected fail on other platforms if no
> real fix is forthcoming?

Actually I think arch_rounding_available() would not need to check for
vector size (but we should check for type width instead) for the sse41
case? I think llvm should be able to handle any vector size reasonably,
type legalization should take care of it, which would eliminate the
length-2 problem for sse41 (at some point we actually used sse
intrinsics, not the llvm ones.)

As a side note, perhaps we could use the llvm intrinsics on altivec too
instead nowadays, I don't know if they do the right thing there, but I
can't see why they wouldn't, but someone would need to test it.

The only problem with using the llvm intrinsics is you really really
really don't want to do it if the cpu doesn't natively support it. In
this case llvm (at least used to) emit (scalar of course) calls to math
library, which is completely horrendous (in particular since we don't
really care that much about the exact rounding), if it even works. Hence
despite using llvm intrinsics we need to know if the cpu actually
supports it.
So ideally for arm we'd actually detect cpu features like for other
archs (I don't think all arm chips can do rounding ops, even if they
have neon, although perhaps all arm64 ones can) and follow similar
logic. But noone ever submitted a arm-specific patch for better llvmpipe
support...

Roland

> 
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index f348833206b..c050bfdb936 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -2477,7 +2477,7 @@ lp_build_iround(struct lp_build_context *bld,
>     else {
>        LLVMValueRef half;
>  
> -      half = lp_build_const_vec(bld->gallivm, type, 0.5);
> +      half = lp_build_const_vec(bld->gallivm, type, nextafterf(0.5, 0.0));
>  
>        if (type.sign) {
>           LLVMTypeRef vec_type = bld->vec_type;
>