[Mesa-dev] IROUND, math errors, etc.

Fri Jul 31 10:06:12 PDT 2015

On 31.07.2015 at 18:44, Matt Turner wrote:
> On Fri, Jul 31, 2015 at 7:13 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>> CC mesa-dev.
>>
>> This looks good to me. I am starting to wonder though why we don't just
>> use lrintf() and let the compiler sort it out (for x86 too).
>> Though actually some quick experiments show that:
>> - llvm's clang will always use libm lrintf call. Which then will do
>> (x86_64) cvtss2si %xmm0,%rax as expected. Meaning the cost is probably
>> twice as high as it could be due to the unnecessary library call.
>> - gcc will also use the same library call. Unless you specify
>> -fno-math-errno (or some more aggressive math optimizing stuff), in
>> which case it will do the cvtss2si on its own. Which is fairly stupid,
>> because this function doesn't set errno in any case, so it could be used
>> independent of -fno-math-errno.
>>
>> Speaking of -fno-math-errno, why don't we use that in mesa? I know the
>> fast math stuff can be problematic, but no one is _ever_ interested in
>> math error numbers.
>>
>> Speaking of which, I'm not really sure why IROUND isn't doing the same.
>> Yes it rounds away from zero, but I doubt that matters - would probably
>> be better to match whatever rounding is used in hw (GL doesn't seem to
>> specify tie-breaker rules for round to nearest afaict).
>>
>> FWIW IROUND along with even the 64bit sibling IROUND64 (and IROUND_POS)
>> is not even really correct in any case. There exist floats where f +
>> 0.5f will round up to the next integer incorrectly, e.g. the largest
>> float smaller than 0.5f (0.49999997f or so): if you add 0.5f, the exact
>> sum lies right between the largest float smaller than 1.0f and 1.0f
>> itself, and thus the hw will use the tie-breaker rule (round to nearest
>> even for your typical hw with typical rounding mode set), making this
>> 1.0. Thus the rounded integer will be 1, which is just plain wrong no
>> matter the round-to-nearest tie-breaker rule.
>> There are ways to fix it (the obvious one is to add 0.5 as double), but
>> I don't think we should even try that, and assume lrintf can do a decent
>> job on hw we care about (compiler not doing its job right is a pity but
>> might not be too bad even if it uses lib call).
> 
> I've actually got a branch to get rid of F_TO_I (and I want to remove
> IROUND as well) in favor of libm rounding functions.
> 
> I agree that we don't care about errno and traps and such, so I tried
> a few things to get the code we want from rintf, etc. I tried marking
> a wrapper around rintf with __attribute__((optimize("-ffast-math")))
> but just today a gcc developer confirmed that this cannot work because
> when the function is inlined it loses the optimization attribute. I'll
> do some tests with -fno-math-errno and friends.
> 
> I'll finish this branch up very soon.
> 

Well, for F_TO_I and IROUND we really want lrintf, not rintf. lrintf is
much easier on the (x86) cpu: rintf requires sse4.1 for a trivial
implementation, whereas lrintf needs just sse, so at least on x86_64 you
always get it, which is quite a benefit for "standard" compilation. I
assume though that gcc only inlining it with -fno-math-errno is really a
gcc bug: the function doesn't set errno, and exceptions will happen just
the same whether you use the libm call or do it directly, since both
depend on the current exception settings. I don't know why clang didn't
want to inline it at all no matter what, though; that seems kind of silly.

Thanks for looking into it.

Roland