[Mesa-dev] [PATCH v2 3/5] r600: implement DDIV

Thu Jan 19 18:42:45 UTC 2017

On 19.01.2017 at 18:45, Ilia Mirkin wrote:
> On Thu, Jan 19, 2017 at 12:40 PM, Nicolai Hähnle
> <nicolai.haehnle at amd.com> wrote:
>> On 19.01.2017 17:50, Ilia Mirkin wrote:
>>>
>>> NVIDIA can't do division either (double or otherwise - but double is
>>> even worse since the rcp is only half there). It has some fancy
>>> functions which get things like division, sqrt, etc.
>>
>>
>> What does that mean, precisely? It might make sense to have a cap-bit after
>> all, and then if somebody can be bothered to do it, have the front-end
>> generate TGSI for the kind of Newton-Raphson code that LLVM implements for
>> GCN.
> 
> Well, there are 2 64-bit "special" (i.e. beyond
> add/mul/fma/conversion) operations available - "rcp64h" and "rsq64h".
> These operations consume a double, and produce the high 32 bits of the
> result. Everything else has to be done by hand. For rcp/rsq, that
> means newton-raphson steps, for sqrt and div, that can mean a lot
> more, depending on how accurate you want it.
> 
> IMHO this is something that should be left up to the drivers.
> 
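
To illustrate the "high 32 bits plus Newton-Raphson" approach described in the quote, here is a rough sketch in C. It is not taken from any driver: rcp_high32() is a hypothetical stand-in that fakes an "rcp64h"-style estimate by computing 1/x on the CPU and zeroing the low 32 bits, and the step count of two is an assumption that happens to suffice for this fake estimate. Real hardware may deliver a different estimate precision.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an "rcp64h"-style operation: returns a double
 * whose high 32 bits match 1/x and whose low 32 bits are zero. The CPU
 * division here only fakes the hardware estimate. */
static double rcp_high32(double x)
{
    double r = 1.0 / x;
    uint64_t bits;

    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFFFF00000000ULL;   /* keep sign, exponent, top 20 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Refine the estimate with Newton-Raphson: y' = y * (2 - x*y).
 * Each step roughly doubles the number of correct bits, so two steps
 * take a ~20-bit estimate to about double precision. No handling of
 * 0, Inf or NaN inputs is attempted here. */
static double rcp_newton(double x)
{
    double y = rcp_high32(x);

    for (int i = 0; i < 2; i++)
        y = y * (2.0 - x * y);
    return y;
}
```

Calling rcp_newton(3.0) should land within a few ulps of 1.0 / 3.0. Division and sqrt would need further work on top of this, as the quote says.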

One problem with Newton-Raphson steps for these things is that they tend to
screw up Inf results (which might become NaN instead), so doing it
generically might not be worth it if hw can do better.
(This is basically why llvmpipe doesn't use the SSE rcp instruction
anymore (for floats) and uses div instead: rcp is very fast, but its
accuracy is really low, so you have to add a Newton-Raphson step. But then
you also have to add fixups for special results, and at that point you
might just as well use the slow division... IIRC, though, AVX512 actually
has some fixup instructions which can easily replace certain values with
Infs or NaNs.)
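
A small sketch in C of the Inf problem, assuming the usual reciprocal refinement y' = y * (2 - x*y). The function names are made up for illustration, and the fixup shown is just one possible way to do it, not what llvmpipe or any driver actually does:

```c
#include <math.h>

/* One Newton-Raphson refinement of an estimate y ~= 1/x. */
static float rcp_refine(float x, float y)
{
    return y * (2.0f - x * y);
}

/* Same step with a hypothetical fixup for special results.
 * For x == 0 the estimate is Inf, for x == Inf it is 0; in both cases
 * x*y is 0 * Inf = NaN, so the refined value becomes NaN even though
 * the raw estimate was already the right answer. Keep the estimate then. */
static float rcp_refine_fixed(float x, float y)
{
    float r = rcp_refine(x, y);

    return (isinf(y) || y == 0.0f) ? y : r;
}
```

With x = 0 and an estimate of +Inf, rcp_refine() yields NaN while rcp_refine_fixed() keeps +Inf. The extra compare and select is the kind of overhead that eats into the advantage of the fast estimate.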

Roland