[Mesa-dev] [WIP] i965/fs: Initial support for arb_gpu_shader_fp64

Thu Oct 16 05:24:12 PDT 2014

Here is an initial basis for supporting double-precision floats on i965 hardware.
On IVB (Ivy Bridge) this gives the following results (details below):

piglit-run.py --include-tests "ARB_gpu_shader_fp64" tests/all.py /tmp/foo
[32/32] crash: 2, fail: 5, pass: 24, skip: 1

This sits on top of Dave Airlie's and Ilia Mirkin's frontend patches.

For now the only supported hardware is IVB, although I have initial work
to enable this on HSW and newer; it mostly comes down to how register
regions are specified. I will submit that as a follow-up. On IVB the
regions are specified as if each double were two packed floats, while on
newer hardware they are natural and correspond to 64-bit elements.
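
To make the IVB case concrete: a region <vstride; width, hstride> over
doubles can be re-expressed over 32-bit floats that touch exactly the
same bytes, provided the doubles in each row are contiguous. The sketch
below is a hypothetical helper (struct region and
df_region_as_float_pairs are illustrative names, not from this series);
it only checks the byte equivalence and does not model any hardware
limits on region parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region descriptor <vstride; width, hstride>, with strides
 * counted in elements of the region's own type. */
struct region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

/* Re-express a region over 64-bit elements as a region over 32-bit
 * elements covering the same bytes, treating every double as two
 * adjacent floats. Only rows of contiguous doubles (hstride 1) or a
 * single replicated scalar (<0;1,0>) can be described this way;
 * anything else is rejected. */
static bool
df_region_as_float_pairs(struct region df, struct region *f)
{
   assert(f);

   if (df.hstride == 1) {
      /* w contiguous doubles == 2w contiguous floats. */
      f->vstride = df.vstride * 2;
      f->width = df.width * 2;
      f->hstride = 1;
      return true;
   }

   if (df.vstride == 0 && df.width == 1 && df.hstride == 0) {
      /* Scalar double: every channel re-reads the same pair of floats. */
      f->vstride = 0;
      f->width = 2;
      f->hstride = 1;
      return true;
   }

   return false;
}
```

For example, <4;4,1> over doubles becomes <8;8,1> over floats, and a
replicated scalar double <0;1,0> becomes <0;2,1>. On HSW and newer no
such translation would be needed, since the regions there are natural
64-bit ones.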

I've chosen to teach the FS LIR level (fs_visitor) to deal with 64-bit
variables, and to augment hardware instruction emission (fs_generator and
brw_eu) to output two instructions whenever double-precision floats are
involved. The hardware can process only half as many channels when they
are 64 bits wide, hence two instructions are needed.
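
The splitting rule can be sketched as a simplified model. This is not
the actual fs_generator/brw_eu code: struct hw_inst and emit_logical_op
are hypothetical, and the only rule encoded is the one stated above,
namely that a 64-bit operation handles half as many channels per
instruction.

```c
#include <assert.h>

/* Hypothetical record of one emitted hardware instruction: the first
 * logical channel it covers and how many channels it executes. */
struct hw_inst {
   unsigned first_channel;
   unsigned exec_size;
};

/* Emit the hardware instructions for one logical operation over `width`
 * channels of `type_bytes`-wide elements. 64-bit channels halve the
 * number of channels one instruction can handle, so a double operation
 * is split into two halves. Returns the number of instructions written
 * to `out` (1 for 32-bit types, 2 for 64-bit types). */
static unsigned
emit_logical_op(unsigned width, unsigned type_bytes, struct hw_inst out[2])
{
   assert(type_bytes == 4 || type_bytes == 8);
   assert(width >= 2 && width % 2 == 0);

   const unsigned per_inst = type_bytes == 8 ? width / 2 : width;
   unsigned n = 0;

   for (unsigned ch = 0; ch < width; ch += per_inst) {
      out[n].first_channel = ch;
      out[n].exec_size = per_inst;
      n++;
   }
   return n;
}
```

A SIMD8 float operation yields one instruction covering channels 0..7,
while a SIMD8 double operation yields two, covering channels 0..3 and
4..7.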

There are a number of things that could be optimized; for now I have
mostly been interested in creating a safe basis.

fp64/execution/built-in-functions/const-fma-double: pass
fp64/execution/built-in-functions/fs-const-packDouble2x32: pass
fp64/execution/built-in-functions/fs-unpackDouble2x32-2: pass
fp64/preprocessor/define.vert: pass
fp64/execution/built-in-functions/fs-const-unpackDouble2x32: pass
fp64/compiler/implicit-conversions.vert: pass
fp64/execution/built-in-functions/glsl-double-const-expr-vector-insert: pass
fp64/execution/built-in-functions/fs-ldexp-dvec4: fail

This partially works, taking advantage of Ilia's lowering passes in
GLSL IR. The test has 8 checks and the first six pass. The last two,
with signs on the doubles, cause problems; I'm going to look into this
next.
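
A lowering of ldexp() on doubles can be pictured as bit manipulation on
the two 32-bit halves of each double. The sketch below is a generic C
illustration, not Ilia's GLSL IR pass: it assumes the lowering adds the
exponent argument to the 11-bit exponent field in the high word, and it
ignores zero, denormal, infinite and NaN inputs. It also shows where the
sign lives (bit 31 of the high word), which any such lowering has to
preserve. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a double into its low and high 32-bit words, like GLSL's
 * unpackDouble2x32(). */
static void
unpack_double_2x32(double d, uint32_t *lo, uint32_t *hi)
{
   uint64_t bits;
   memcpy(&bits, &d, sizeof bits);
   *lo = (uint32_t)bits;
   *hi = (uint32_t)(bits >> 32);
}

/* Inverse of the above, like GLSL's packDouble2x32(). */
static double
pack_double_2x32(uint32_t lo, uint32_t hi)
{
   uint64_t bits = ((uint64_t)hi << 32) | lo;
   double d;
   memcpy(&d, &bits, sizeof d);
   return d;
}

/* ldexp() by adding `exponent` to the biased exponent field (bits 20..30
 * of the high word). The sign (bit 31) and mantissa bits are kept as-is.
 * Only valid for normal inputs whose result is also normal. */
static double
ldexp_via_bits(double x, int exponent)
{
   uint32_t lo, hi;
   unpack_double_2x32(x, &lo, &hi);

   int biased = (int)((hi >> 20) & 0x7ff);
   assert(biased != 0 && biased != 0x7ff);   /* normal input only */
   biased += exponent;
   assert(biased > 0 && biased < 0x7ff);     /* result stays normal */

   hi = (hi & ~(0x7ffu << 20)) | ((uint32_t)biased << 20);
   return pack_double_2x32(lo, hi);
}
```

Because the mask clears only the exponent bits, negative inputs keep
their sign: ldexp_via_bits(-1.5, 3) gives -12.0.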

fp64/execution/built-in-functions/fs-packDouble2x32: pass
fp64/preprocessor/fs-output-double.frag: pass
fp64/execution/built-in-functions/fs-frexp-dvec4: fail
fp64/compiler/double-loop-iterator.vert: pass
fp64/execution/built-in-functions/glsl-double-conversion-constructor-02: pass
fp64/execution/built-in-functions/glsl-double-conversion-constructor-01: pass
fp64/execution/glsl-uniform-initializer-7: pass
fp64/execution/glsl-uniform-initializer-6: pass
fp64/execution/glsl-uniform-initializer-5: pass
fp64/execution/glsl-uniform-initializer-4: fail

This test multiplies a matrix by a vector, which is on my to-do list.

fp64/execution/glsl-uniform-initializer-3: pass
fp64/execution/glsl-uniform-initializer-2: pass
fp64/execution/glsl-uniform-initializer-1: pass
fp64/execution/glsl-uniform-initializer-8: pass
fp64/execution/built-in-functions/fs-fma-double: crash
fp64/execution/gs-fs-vs-double: crash

These two simply lack support for now.

fp64/preprocessor/define.frag: pass
fp64/execution/built-in-functions/fs-const-ldexp-double: pass
fp64/execution/built-in-functions/glsl-double-const-expr-vector-extract: pass
fp64/execution/vs-out-fs-in-double: fail
fp64/preprocessor/vs-input-double.vert: pass
fp64/execution/built-in-functions/fs-trunc-double-large: fail
fp64/execution/built-in-functions/fs-modf-double: skip
fp64/execution/built-in-functions/fs-unpackDouble2x32: pass

Topi Pohjolainen (25):
  i965: Lower DFRACEXP/DLDEXP
  mesa: Teach uniform update to take into account double precision
  i965: Determine size of double precision float register
  i965: Tell backend register about double precision type
  i965/fs: Prepare virtual registers for double precision floats
  i965/fs: Take double float into account in register offsets
  i965/fs: Prepare live interval analysis for double precision
  i965/fs: Add support for double precision uniform loading
  i965/fs: Generator support for converting double to float
  i965/fs: Double precision to single conversion support
  i965/fs: Prepare register allocator for double precision floats
  i965: Add helper telling if a register is scalar
  i965/fs: Add pack_double_2x32 virtual opcode
  i965/fs: Add support for ir_unop_pack_double_2x32
  i965/gen8: Add support for double precision constant operands
  i965/gen7: Add support for double precision constant operands
  i965/fs: Make generator to emit two instructions for double floats
  i965/fs: Generate two instructions for double precision comparison
  i965/fs: Collect results for double precision conditionals
  i965/gen7: Add support for loading double float scalars in 16-width
  i965/fs/gen7: Add generator support for loading double precision
    uniforms
  i965: Add helper telling if uniform is double and requires special
    load
  i965/fs: Lower double precision scalars into vectors
  i965/fs: Add unpack_double_2x32 virtual opcode
  i965/fs: Add support for ir_unop_unpack_double_2x32
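
Several of the patches above (register sizes, virtual registers,
register offsets, register allocation) follow from one piece of
arithmetic: a general register (GRF) on this hardware is 32 bytes, so a
per-channel double needs twice as many registers as a float at the same
dispatch width. The sketch below illustrates that arithmetic only;
GRF_BYTES, regs_for_value and component_reg_offset are hypothetical
names, and it assumes the FS backend's layout in which each vector
component occupies its own run of registers.

```c
#include <assert.h>

/* Size of one general register file (GRF) register on Gen hardware. */
#define GRF_BYTES 32

/* Number of GRF registers a per-channel (non-scalar) value occupies for
 * a given dispatch width and element size, rounded up to whole
 * registers. */
static unsigned
regs_for_value(unsigned dispatch_width, unsigned type_bytes)
{
   assert(type_bytes == 4 || type_bytes == 8);
   const unsigned bytes = dispatch_width * type_bytes;
   return (bytes + GRF_BYTES - 1) / GRF_BYTES;
}

/* Register offset of component n of a vector value, assuming the
 * components are laid out one after another, each taking a full
 * per-channel run of registers. */
static unsigned
component_reg_offset(unsigned n, unsigned dispatch_width, unsigned type_bytes)
{
   return n * regs_for_value(dispatch_width, type_bytes);
}
```

For example, a SIMD8 float fills one register, a SIMD8 double two and a
SIMD16 double four; under this layout the fourth component (n = 3) of a
SIMD16 dvec4 would start 12 registers in.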

 src/mesa/drivers/dri/i965/brw_defines.h            |   6 +
 src/mesa/drivers/dri/i965/brw_eu.h                 |   4 +
 src/mesa/drivers/dri/i965/brw_eu_emit.c            |  69 ++++++
 src/mesa/drivers/dri/i965/brw_fs.cpp               |  79 ++++++-
 src/mesa/drivers/dri/i965/brw_fs.h                 |  24 ++-
 .../dri/i965/brw_fs_channel_expressions.cpp        |   7 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp     | 235 ++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  29 +++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       | 102 ++++++++-
 src/mesa/drivers/dri/i965/brw_reg.h                |  19 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp           |  42 +++-
 src/mesa/drivers/dri/i965/brw_shader.h             |   2 +
 src/mesa/main/uniform_query.cpp                    |   2 +-
 13 files changed, 595 insertions(+), 25 deletions(-)

-- 
1.8.3.1