[Mesa-dev] [PATCH] gallivm: use llvm.nearbyint instead of llvm.round

Wed Apr 13 13:41:22 UTC 2016

Am 13.04.2016 um 12:12 schrieb Jose Fonseca:
> On 13/04/16 06:51, Jose Fonseca wrote:
>> On 13/04/16 04:00, sroland at vmware.com wrote:
>>> From: Roland Scheidegger <sroland at vmware.com>
>>>
>>> We used to use sse roundps intrinsic directly, but switched to use the
>>> llvm
>>> intrinsics for rounding with e4f01da15d8c6ce3e8c77ff3ff3d2ce2574a3f7b.
>>> However, llvm semantics follows standard math lib round function
>>> which is
>>> specced to do roundNearestAwayFromZero but we really want
>>> roundNearestEven
>>> (moreoever, using round generates atrocious code since the cpu can't
>>> do it
>>> directly and it results in scalar calls to libm __roundf).
>>> So, use llvm.nearbyint instead, which does exactly the right thing,
>>> and even
>>> has the advantage of being available with llvm 3.3 too. (I've
>>> verified it
>>> actually generates a roundps instruction with llvm 3.3.)
>>>
>>> This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909
>>
>> Thanks Roland.
>>
>> I should've cought that.  I think I forget to actually look at
>> llvm.round's assembly as I only found that intrisic much later.  I
>> presumed it was roundps.
>>
>>> ---
>>>   src/gallium/auxiliary/gallivm/lp_bld_arit.c | 6 +-----
>>>   1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> index 0c43617..74f1683 100644
>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> @@ -1863,11 +1863,7 @@ lp_build_round_arch(struct lp_build_context *bld,
>>>
>>>         switch (mode) {
>>>         case LP_BUILD_ROUND_NEAREST:
>>> -         if (HAVE_LLVM >= 0x0304) {
>>> -            intrinsic_root = "llvm.round";
>>> -         } else {
>>> -            return lp_build_nearest_sse41(bld, a);
>>
>> You can remove lp_build_nearest_sse41 too while you're at it.
>>
>>> -         }
>>> +         intrinsic_root = "llvm.nearbyint";
>>>            break;
>>>         case LP_BUILD_ROUND_FLOOR:
>>>            intrinsic_root = "llvm.floor";
>>>
>>
>> We should add a test case to lp_test_arit too.  Like this:
>>
>> diff --git a/src/gallium/drivers/llvmpipe/lp_test_arit.c
>> b/src/gallium/drivers/llvmpipe/lp_test_arit.c
>> index a0f2db7..ba831f3 100644
>> --- a/src/gallium/drivers/llvmpipe/lp_test_arit.c
>> +++ b/src/gallium/drivers/llvmpipe/lp_test_arit.c
>> @@ -218,6 +218,7 @@ const float round_values[] = {
>>         -10.0, -1, 0.0, 12.0,
>>         -1.49, -0.25, 1.25, 2.51,
>>         -0.99, -0.01, 0.01, 0.99,
>> +      -1.5, -0.5, 0.5, 1.5,
>>         1.401298464324817e-45f, // smallest denormal
>>         -1.401298464324817e-45f,
>>         1.62981451e-08f,
>> @@ -283,7 +284,7 @@ unary_tests[] = {
>>      {"sin", &lp_build_sin, &sinf, sincos_values,
>> Elements(sincos_values), 20.0 },
>>      {"cos", &lp_build_cos, &cosf, sincos_values,
>> Elements(sincos_values), 20.0 },
>>      {"sgn", &lp_build_sgn, &sgnf, exp2_values, Elements(exp2_values),
>> 20.0 },
>> -   {"round", &lp_build_round, &roundf, round_values,
>> Elements(round_values), 24.0 },
>> +   {"round", &lp_build_round, &nearbyintf, round_values,
>> Elements(round_values), 24.0 },
>>      {"trunc", &lp_build_trunc, &truncf, round_values,
>> Elements(round_values), 24.0 },
>>      {"floor", &lp_build_floor, &floorf, round_values,
>> Elements(round_values), 24.0 },
>>      {"ceil", &lp_build_ceil, &ceilf, round_values,
>> Elements(round_values), 24.0 },
>>
>>
>> It seems nearbyintf is available on MSVC too.
>>
>> It's just not clear whether MSVC implementation follows round-to-even:
>> the nearbyint doc page is missing from MSDN, and rintf page explicitly
>> states that rint(2.5) is 3 per,
>> https://msdn.microsoft.com/en-us/library/dn465165.aspx
> 
> I've verified this works --
> https://ci.appveyor.com/project/jrfonseca/mesa/build/20 --, so I went
> ahead and commited your change, plus the lp_test_arit one.
> 

Ok. That msdn page is funny ;-). It is true that things are (as stated
there as well) according to current rounding mode, but that's pretty
much always round-to-nearest-even (current rounding mode is of course
switchable, but I think most architectures can't do anything else than
nearest-even for their nearest mode).

(It is actually not really obvious that tgsi "round" needs to be
round-to-nearest-even - glsl has both round and roundEven instructions,
for round other tie-breakers are allowed too. But we just translate it
all the same since there doesn't seem to be any need for another
roundNearest instruction since everybody can do round-to-nearest-even.
d3d10 OTOH only has round_ne.)

Roland