[Mesa-dev] [PATCH 00/59] Initial arb_gpu_shader_fp64 support to the i965 scalar backend

Fri Apr 29 11:28:57 UTC 2016

Hello,

This patch series continues adding arb_gpu_shader_fp64 support to the
Intel driver. Specifically, it targets the i965 scalar backend for
BDW+ hardware (vec4 is still under research and gen7 has its own
issues, which we intend to tackle after gen8).

This adds most of the fp64 scalar implementation. It starts by enabling
the various lowering passes in NIR for doubles and then adds all the
infrastructure required in the backend to operate on 64-bit floating
point data.
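
For readers unfamiliar with the pack/unpack lowering mentioned in the
patch titles, here is a minimal sketch of the underlying idea in plain
C: a 64-bit double is treated as a pair of 32-bit halves, following the
GLSL packDouble2x32()/unpackDouble2x32() convention (first component
is the low 32 bits). This is not code from the series; the function
names are hypothetical and it assumes the host uses IEEE 754 binary64
doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split a double into its low and high 32-bit
 * halves. Assumes double is a 64-bit IEEE 754 binary64 value. */
static void unpack_double_2x32(double d, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits;

    assert(sizeof(d) == sizeof(bits));
    memcpy(&bits, &d, sizeof(bits));   /* bit-exact copy, no conversion */
    *lo = (uint32_t)(bits & 0xffffffffu);
    *hi = (uint32_t)(bits >> 32);
}

/* Hypothetical helper: rebuild a double from its two 32-bit halves. */
static double pack_double_2x32(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;

    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

Unpacking and then packing a value returns the original bit pattern,
which is what lets a backend operate on the 32-bit halves separately.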

For reference, this series fixes 1009 fp64 piglit tests on BDW. The
fp64 totals look like this:

     pass:                  2523
     fail:                    46
     crash:                  447
     skip:                    16
     total:                 3032

A few things are still missing from this series to reach a perfect
fp64 pass rate:

1. Fixes to copy propagation. The fp64 code creates new code patterns
   that copy propagation isn't ready to handle yet, leading to
   incorrect results in some cases. We have 9 patches to fix copy
   propagation for fp64 that we intend to send separately after the
   main fp64 infrastructure has been reviewed.

2. ubo/ssbo/shared-variables. We will also send the patches for this in
   a separate series after this one.

3. A fix for the SIMD lowering pass to properly handle execmasking when
   transposing the results of split instructions back together. We have
   a local fix for this, but Curro hit the same problem while working
   on SIMD32 and has a better solution for it so we intend to use his
   solution when it is ready.

4. Spilling. We don't support spilling of DF registers yet and some
   piglit tests need this to compile. Jason had plans to work on the
   spilling code and address the needs of fp64 along the way.
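
As background for why doubles increase register pressure (and so make
spilling more relevant), the sketch below computes how many registers
a per-channel value needs. It is an illustration only, not code from
the series: the helper name is hypothetical and it assumes a 32-byte
(256-bit) GRF.

```c
#include <assert.h>

/* Assumption: one general register (GRF) holds 32 bytes. */
#define GRF_SIZE_BYTES 32u

/* Hypothetical helper: number of whole registers needed to hold one
 * value per channel, given the dispatch width (e.g. 8 or 16) and the
 * size of the type in bytes (4 for float, 8 for double). */
static unsigned regs_for_value(unsigned dispatch_width,
                               unsigned type_size_bytes)
{
    unsigned bytes = dispatch_width * type_size_bytes;

    assert(bytes > 0);
    /* Round up to a whole number of registers. */
    return (bytes + GRF_SIZE_BYTES - 1) / GRF_SIZE_BYTES;
}
```

Under these assumptions, regs_for_value(8, 4) is 1 for a SIMD8 float
and regs_for_value(8, 8) is 2 for a SIMD8 double, so each double
variable takes twice the register space of a float one.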

The series does not introduce any regressions in piglit on ILK, SNB,
HSW, BDW and SKL.

A branch with this series is available for testing here:

$ git clone -b i965-fp64-scalar-backend-part-1 https://github.com/Igalia/mesa.git

You will have to enable the extension with:

$ export MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader_fp64

The full scalar fp64 implementation, which also contains the fixes to
copy propagation, the ubo/ssbo patches and our local fix for the SIMD
lowering pass, is available here:

$ git clone -b i965-fp64 https://github.com/Igalia/mesa.git

And for the adventurous, there is also a work-in-progress branch that
adds scalar support for HSW here:

$ git clone -b i965-fp64-gen7 https://github.com/Igalia/mesa.git

Thanks,

Sam

Connor Abbott (33):
  i965: use double lowering pass
  i965: use pack/unpackDouble lowering
  i965/disasm: fix disasm of 3-src doubles
  i965/eu: allow doubles in math instructions
  i965: add brw_imm_df
  i965: add support for getting/setting DF immediates
  i965: add support for disassembling DF immediates
  i965/eu: add support for DF immediates
  i965: fix brw_negate_immediate() for doubles
  i965: fix is_zero(), is_one() and is_negative_one() for doubles
  i965: fixup uniform setup for doubles
  i965/fs: print writemask_all when it's enabled
  i965/fs: use the NIR bit size when creating registers
  i965/fs: don't propagate 64-bit immediates
  i965/fs: add support for printing double immediates
  i965/fs: always pass the bitsize to brw_type_for_nir_type()
  i965/fs: add a stride helper
  i965/fs: add PACK opcode
  i965/fs: add a pass for lowering PACK opcodes
  i965/fs/nir: translate double pack/unpack
  i965/fs: fix type_size() for doubles
  i965/fs: handle uniforms in byte_offset()
  i965/fs: use byte_offset() in offset() for uniforms
  i965/fs: fix assign_constant_locations() for doubles
  i965/fs: generalize SIMD16 interference workaround
  i965/fs: extend exec_size halving in the generator
  i965/fs: fix compares for doubles
  i965/fs: fix regs_read() for uniforms
  i965/fs: fix is_copy_payload() for doubles
  i965/fs: fix regs_written in LOAD_PAYLOAD for doubles
  i965/fs: fix dst width calculation in CSE
  i965/fs: add a pass for legalizing d2f
  i965/fs: add support for f2d and d2f

Iago Toral Quiroga (15):
  i965: fix brw_saturate_immediate() for doubles
  i965: fix brw_abs_immediate() for doubles
  i965: two-argument instructions can only use 32-bit immediates
  i965/fs: optimize pack double
  i965/fs: optimize unpack double
  i965/fs: handle fp64 opcodes in brw_do_channel_expressions
  i965/fs: We only support 32-bit integer ALU operations for now
  i965/fs: add null_reg_df
  i965/fs: implement fsign() for doubles
  i965/fs: implement d2b
  i965/fs: implement d2i and d2u
  i965/fs: implement i2d and u2d
  i965/fs: rename our lower_d2f pass to lower_d2x
  i965/fs/lower_simd_width: Fix registers written for split instructions
  i965/fs: recognize writes with a subreg_offset > 0 as partial

Samuel Iglesias Gonsálvez (7):
  i965: enable lrp lowering for doubles
  vc4: lower lrp when operating with double operands
  freedreno/ir3: lower lrp when operating with double operands
  i965/fs: align access to double-based uniforms in push constant buffer
  i965/fs: demote_pull_constants() did not take into account double
    types
  i965/fs: take into account doubles when calculating read_size for
    MOV_INDIRECT
  i965/fs: fix MOV_INDIRECT exec_size for doubles

Topi Pohjolainen (4):
  i965: Lower DFRACEXP/DLDEXP
  i965: Determine size of double precision float register
  i965: Tell backend register about double precision type
  i965/eu: Allow 3-src float ops with doubles

 src/gallium/drivers/freedreno/ir3/ir3_nir.c        |   1 +
 src/gallium/drivers/vc4/vc4_program.c              |   1 +
 src/mesa/drivers/dri/i965/Makefile.sources         |   2 +
 src/mesa/drivers/dri/i965/brw_compiler.c           |   2 +
 src/mesa/drivers/dri/i965/brw_compiler.h           |   8 +
 src/mesa/drivers/dri/i965/brw_defines.h            |   9 +
 src/mesa/drivers/dri/i965/brw_disasm.c             |   3 +-
 src/mesa/drivers/dri/i965/brw_eu_emit.c            |  60 +++--
 src/mesa/drivers/dri/i965/brw_fs.cpp               | 106 ++++++--
 src/mesa/drivers/dri/i965/brw_fs.h                 |   6 +-
 src/mesa/drivers/dri/i965/brw_fs_builder.h         |  15 +-
 .../dri/i965/brw_fs_channel_expressions.cpp        |  23 +-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |   3 +
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp           |   3 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp     |  16 +-
 src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp     |  75 ++++++
 src/mesa/drivers/dri/i965/brw_fs_lower_pack.cpp    |  59 +++++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp           | 287 ++++++++++++++++++---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  67 +++--
 src/mesa/drivers/dri/i965/brw_inst.h               |  25 ++
 src/mesa/drivers/dri/i965/brw_ir_fs.h              |  14 +-
 src/mesa/drivers/dri/i965/brw_link.cpp             |   1 +
 src/mesa/drivers/dri/i965/brw_nir.c                |  10 +
 src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp     |   7 +-
 src/mesa/drivers/dri/i965/brw_program.c            |   1 +
 src/mesa/drivers/dri/i965/brw_reg.h                |  10 +
 src/mesa/drivers/dri/i965/brw_shader.cpp           |  73 ++++--
 src/mesa/drivers/dri/i965/brw_shader.h             |   1 +
 src/mesa/drivers/dri/i965/brw_wm.c                 |   2 +
 src/mesa/drivers/dri/i965/gen6_constant_state.c    |  12 +-
 30 files changed, 773 insertions(+), 129 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
 create mode 100644 src/mesa/drivers/dri/i965/brw_fs_lower_pack.cpp

-- 
2.5.0