[Mesa-dev] [RFC v2 0/3] More precise rcp and rsq for fp64 on gk110

Boyan Ding boyan.j.ding at gmail.com
Thu Mar 9 05:55:16 UTC 2017


This is the second version of the fp64 precision series, including fixes
as per Ilia's advice.

The first patch should be functionally equivalent to the previous
version. The changes mostly focus on code cleanup and rewording comments.
The second patch fixes a case where the original patch would generate
an inaccurate rsq for some small normal inputs. The third one is
unchanged.

I ran more tests on these two algorithms, comparing their results with
a CPU implementation. I have never seen more than a 1 ulp difference in
rcp. In rsq, there were some cases (~500 ppm) with a 2 ulp difference.
However, analysis with mpfr shows that all of those were 1 ulp errors on
both sides. So the precision should now satisfy the requirement.
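
For readers who want to reproduce this kind of comparison, here is a
minimal sketch of measuring the distance between two doubles in ulps.
It is not the harness used for this series. The names ordered_bits and
ulp_distance are hypothetical, and the sample values are made up for
illustration. They are not GPU output.

```python
import struct

def ordered_bits(x: float) -> int:
    """Map a double to an integer so that adjacent doubles differ by 1."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # IEEE 754 doubles are sign-magnitude; mirror the negative range so the
    # integer ordering matches the floating-point ordering (-0.0 maps to 0).
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (0 if identical)."""
    return abs(ordered_bits(a) - ordered_bits(b))

# Illustrative values only: two results that each sit 1 ulp away from a
# reference, on opposite sides, are 2 ulp away from each other.
reference = 1.5
result_a = 1.5 + 2.0 ** -52   # next double above 1.5
result_b = 1.5 - 2.0 ** -52   # next double below 1.5

dist_a = ulp_distance(result_a, reference)
dist_b = ulp_distance(result_b, reference)
dist_ab = ulp_distance(result_a, result_b)
```

This is the situation described above: a 2 ulp gap between two
implementations does not by itself mean that either one is more than
1 ulp from the reference value.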

The assembly uses an instruction format that has yet to be merged into
the upstream envytools assembler. I'll get that merged soon.

Boyan Ding (3):
  gk110/ir: Add rcp f64 implementation
  gk110/ir: Add rsq f64 implementation
  gk110/ir: Use the new rcp/rsq in library

 src/gallium/drivers/nouveau/codegen/lib/gk110.asm  | 219 ++++++++++++++++++++-
 .../drivers/nouveau/codegen/lib/gk110.asm.h        | 127 +++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |  32 +++
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h        |   1 +
 4 files changed, 375 insertions(+), 4 deletions(-)

-- 
2.12.0


