[Mesa-dev] Problems with accuracy of coeffs_init + attribs_update

Thu Oct 29 12:37:18 PDT 2015

On 29.10.2015 at 20:33, Oded Gabbay wrote:
> On Thu, Oct 29, 2015 at 9:25 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> On 29.10.2015 at 20:10, Oded Gabbay wrote:
>>> On Thu, Oct 29, 2015 at 9:02 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>> On Thu, Oct 29, 2015 at 2:44 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>>> However, I would hate to keep the situation as is, meaning the test
>>>>> passes on x86-64 and fails on ppc64le.
>>>>
>>>> Sounds like it'd actually be a difference between AVX and SSE4.2 as
>>>> well -- what happens if you run on your x86_64 chip with
>>>> LP_NATIVE_VECTOR_WIDTH=128 ? It fails for me on my HSW chip, looking
>>>> at the results visually it's mostly good but there's a sprinkling of
>>>> red pixels.
>>>>
>>>>   -ilia
>>>
>>> It fails on my Haswell chip - definitely a sprinkling of red pixels.
>>> Also, the error seems to be greater than 5e-7. Even with 1.6e-6 as
>>> the failure threshold, it still fails, while on ppc64le it passes.
>>> Only when I raised it to 2e-6 did the test pass.
>>>
>>> As I said and Roland explained, the calculation method is inherently
>>> less accurate in the two-stage path, although I don't know why the
>>> deviation is a bit larger on SSE4.2 than on VMX.
>>>
>>
>> Does that have fma, and does it auto-fuse mul/add to fma? Although I don't
>> think it should right now... Other than that, I'm not sure why the
>> results would be different. On x86 we explicitly disable denorms,
>> which could cause different results, but for this example I don't think
>> that should be an issue.
>>
>> Roland
>>
>>
> 
> You asked about ppc64le?
> If so, it does have fma instructions in its ISA, both for scalar floating
> point and for vector. However, if I look at the LLVM IR, I don't see
> any fma intrinsic, and if I look at the disassembly, I don't see any
> multiply-add instructions.
> 
> So in theory it exists but it isn't used in the tests I mentioned.
OK, that's as expected then. It just isn't always obvious whether
auto-fusing is allowed or not (unsafe math of course always allows
it). Maybe there are some more differences in the FPU environments I
can't remember...
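
To illustrate why auto-fusing matters, here is a minimal C sketch (not llvmpipe code; mul_then_add, mul_add_fused and demo_fusing are hypothetical names). It assumes IEEE single precision with FLT_EVAL_METHOD == 0, as on x86-64 SSE. A separate mul and add round the product to float first, while a fused multiply-add rounds only once. The fused path is emulated in double so no libm fmaf() call is needed; that is exact for these particular inputs, but in general it can double-round and is not a drop-in fmaf().

```c
#include <stdio.h>

/* Unfused: the product is rounded to float before the add.
 * Storing to a volatile forces that rounding and keeps the
 * compiler from contracting the expression into an fma. */
static float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;
    return p + c;
}

/* Fused (emulated): a float*float product is always exact in double.
 * For the inputs below the double add is exact too, so the only
 * rounding left is the final conversion to float. */
static float mul_add_fused(float a, float b, float c)
{
    double exact = (double)a * (double)b + (double)c;
    return (float)exact;
}

/* Hypothetical demo: a = b = 1 + 2^-12, c = -(1 + 2^-11).
 * The exact value of a*b + c is 2^-24, but a*b rounded to float
 * is a tie that rounds to even, i.e. to exactly 1 + 2^-11. */
static void demo_fusing(void)
{
    float a = 0x1.001p0f;   /* 1 + 2^-12 */
    float c = -0x1.002p0f;  /* -(1 + 2^-11) */
    printf("mul then add: %g\n", (double)mul_then_add(a, a, c));
    printf("fused:        %g\n", (double)mul_add_fused(a, a, c));
}
```

Calling demo_fusing() prints 0 for the unfused path and 5.96046e-08 (2^-24) for the fused one. The difference is tiny, but it shows how the same IR can give different bits on targets that do or do not fuse mul/add pairs.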

Roland
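
A minimal C sketch of why a two-stage evaluation can deviate differently from a direct one. It assumes a hypothetical attribute plane a(x, y) = a0 + dadx*x + dady*y; it is not the actual llvmpipe coeffs_init/attribs_update code, and plane_t, eval_direct, eval_two_stage and max_errors are made-up names. The direct path evaluates the plane at each pixel; the two-stage path evaluates it at a block origin and then adds a per-pixel offset, so intermediate values are rounded in a different order. Both are compared with a double-precision reference that uses the same float coefficients.

```c
#include <stdio.h>

/* Hypothetical plane equation for one interpolated attribute:
 * a(x, y) = a0 + dadx * x + dady * y. */
typedef struct {
    float a0, dadx, dady;
} plane_t;

/* Direct: evaluate the plane at the pixel in float.  Each step is
 * stored to a volatile so it is rounded separately (no fma). */
static float eval_direct(plane_t p, float x, float y)
{
    volatile float tx = p.dadx * x;
    volatile float s = p.a0 + tx;
    volatile float ty = p.dady * y;
    return s + ty;
}

/* Two-stage: evaluate at the block origin (bx, by) first, then add
 * the offset for pixel (bx + dx, by + dy) inside the block. */
static float eval_two_stage(plane_t p, float bx, float by, float dx, float dy)
{
    float base = eval_direct(p, bx, by);
    volatile float ox = p.dadx * dx;
    volatile float oy = p.dady * dy;
    volatile float off = ox + oy;
    return base + off;
}

/* Reference in double, using the same float coefficients. */
static double eval_ref(plane_t p, double x, double y)
{
    return (double)p.a0 + (double)p.dadx * x + (double)p.dady * y;
}

static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Scan a w x h grid split into block x block tiles and record the
 * largest deviation of each float path from the double reference. */
static void max_errors(plane_t p, int w, int h, int block,
                       double *err_direct, double *err_two)
{
    *err_direct = 0.0;
    *err_two = 0.0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int bx = x - x % block;
            int by = y - y % block;
            double ref = eval_ref(p, x, y);
            double d1 = abs_diff(eval_direct(p, (float)x, (float)y), ref);
            double d2 = abs_diff(eval_two_stage(p, (float)bx, (float)by,
                                                (float)(x - bx), (float)(y - by)),
                                 ref);
            if (d1 > *err_direct) *err_direct = d1;
            if (d2 > *err_two) *err_two = d2;
        }
    }
}
```

For example, with plane_t p = {0.1f, 1.0f / 640, 1.0f / 480} and max_errors(p, 640, 480, 4, &e1, &e2), both errors stay well below 1e-5, but the two paths need not round each pixel the same way, so pixels close to a tight tolerance can land on different sides of it.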