[Mesa-dev] [RFC 0/3] More precise rcp and rsq for fp64 on gk110
Boyan Ding
boyan.j.ding at gmail.com
Sun Mar 5 15:34:55 UTC 2017
Nvidia's ISA only provides rcp and rsq instructions that operate on the
upper 32 bits of a double-precision number, so extra steps must be taken
to achieve the required precision. This series implements more precise
algorithms using Newton-Raphson steps. Edge cases such as NaN and
denorms are fully taken into account. More details are covered in the
comments in the assembly code.
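
As a rough illustration of the refinement idea (not the gk110 assembly
from this series), the sketch below emulates a low-precision estimate by
keeping only the upper 32 bits of a double and then applies
Newton-Raphson steps. The helper names, the emulated estimate and the
default of two steps are assumptions made for illustration; edge cases
such as NaN, infinities, zero and denorms are deliberately not handled.

```python
import math
import struct


def upper32(x: float) -> float:
    """Keep only the upper 32 bits of a double (sign, exponent, top 20 mantissa bits)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits & 0xFFFFFFFF00000000))[0]


def rcp_estimate(x: float) -> float:
    # Hypothetical stand-in for the hardware approximation: it only sees
    # the upper 32 bits of x and only produces the upper 32 bits of 1/x.
    return upper32(1.0 / upper32(x))


def rsq_estimate(x: float) -> float:
    # Same assumption as above, for the reciprocal square root.
    return upper32(1.0 / math.sqrt(upper32(x)))


def rcp_refined(x: float, steps: int = 2) -> float:
    y = rcp_estimate(x)
    for _ in range(steps):
        # Newton-Raphson for f(y) = 1/y - x; equivalent to y * (2 - x*y).
        y = y + y * (1.0 - x * y)
    return y


def rsq_refined(x: float, steps: int = 2) -> float:
    y = rsq_estimate(x)
    for _ in range(steps):
        # Newton-Raphson for f(y) = 1/y^2 - x.
        y = y + 0.5 * y * (1.0 - x * y * y)
    return y
```

Each step roughly doubles the number of correct bits, so an estimate
that is good to about 20 bits needs two steps to get close to full
double precision in this emulation.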
I tested my implementation with some manually picked values that cover
every case, and with many randomly generated numbers. I didn't see a
difference of more than 2 ULP from the CPU implementation in tests of
more than 650 million random values on each of the two algorithms.
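
For readers unfamiliar with ULP comparisons, here is a minimal sketch of
how such a check could be written on the CPU side. It is not the test
program used for this series; the function names, the sampling range and
the seed are hypothetical, and NaN inputs are not handled.

```python
import random
import struct


def ordered_bits(x: float) -> int:
    """Map a double to an integer so that adjacent doubles differ by 1."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # Negative doubles are sign-magnitude; mirror them below zero so the
    # mapping is monotonic and +0.0 and -0.0 coincide.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))


def max_ulp_error(candidate, reference, samples=10000, seed=0):
    """Largest ULP distance between two functions over random inputs."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(samples):
        x = rng.uniform(1e-3, 1e3)  # assumed range, positive normals only
        worst = max(worst, ulp_distance(candidate(x), reference(x)))
    return worst
```

A call such as max_ulp_error(gpu_rcp, lambda x: 1.0 / x), where gpu_rcp
is a placeholder for whatever returns the GPU result, would then report
the worst observed error in ULP.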
The implementation is only available on gk110 for two reasons: it is
the only platform on which I can test, and I think it is easier to
maintain on one platform while a lot of changes might still take place.
Ideally, it should be ported to all platforms once it is thought to be
stable enough.
Boyan Ding (3):
gk110/ir: Add rcp f64 implementation
gk110/ir: Add rsq f64 implementation
gk110/ir: Use the new rcp/rsq f64 in library
src/gallium/drivers/nouveau/codegen/lib/gk110.asm | 219 ++++++++++++++++++++-
.../drivers/nouveau/codegen/lib/gk110.asm.h | 134 ++++++++++++-
.../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 32 +++
.../nouveau/codegen/nv50_ir_lowering_nvc0.h | 1 +
4 files changed, 382 insertions(+), 4 deletions(-)
--
2.12.0