[Mesa-dev] Problems with accuracy of coeffs_init + attribs_update

Thu Oct 29 12:33:07 PDT 2015

On Thu, Oct 29, 2015 at 9:25 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 29.10.2015 um 20:10 schrieb Oded Gabbay:
>> On Thu, Oct 29, 2015 at 9:02 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> On Thu, Oct 29, 2015 at 2:44 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>> However, I would hate to keep the situation as is, meaning the test
>>>> passes on x86-64 and fails on ppc64le.
>>>
>>> Sounds like it'd actually be a difference between AVX and SSE4.2 as
>>> well -- what happens if you run on your x86_64 chip with
>>> LP_NATIVE_VECTOR_WIDTH=128 ? It fails for me on my HSW chip, looking
>>> at the results visually it's mostly good but there's a sprinkling of
>>> red pixels.
>>>
>>>   -ilia
>>
>> It fails on my Haswell chip - definitely sprinkling of red pixels.
>> Also the error seems to be greater than 5e-7. Even with 1.6e-6 as
>> failure point, it still fails, while on ppc64le it passes.
>> Only when I went for 2e-6, the test passes.
>>
>> As I said and Roland explained, the calculation method is inherently
>> less accurate in the two-stages path. Although I don't know why on
>> SSE4.2 the deviation is a bit larger than on VMX
>>
>
> Does that have fma and does it auto-fuse mul/add to fma? Albeit I don't
> think it should right now... Other than that, I'm not sure why the
> results would be different - albeit on x86 we explicitly disable denorms
> which could cause different results, for this example I don't think this
> should be an issue.
>
> Roland
>
>

Were you asking about ppc64le?
If so, it does have fma instructions in its ISA, both scalar floating
point and vector. However, when I look at the LLVM IR I don't see
any fma intrinsic, and when I look at the disassembly I don't see any
multiply-add instructions.

So in theory it exists but it isn't used in the tests I mentioned.

     Oded