[Mesa-dev] [PATCH 00/59] Initial arb_gpu_shader_fp64 support to the i965 scalar backend

Mon May 2 22:24:30 UTC 2016

On 2016-05-01 22:47:40, Jordan Justen wrote:
> 7-10, 12-20, 36-43, 57-58:
> Reviewed-by: Jordan Justen <jordan.l.justen at intel.com>

34-35 Reviewed-by: Jordan Justen <jordan.l.justen at intel.com>

> 
> I also sent questions about 56 & 59.
> 
> On 2016-04-29 04:28:57, Samuel Iglesias Gonsálvez wrote:
> > Hello,
> > 
> > This patch series continues adding arb_gpu_shader_fp64 support to the
> > Intel driver. Specifically, it targets the i965 scalar backend for
> > BDW+ hardware (vec4 is still under research and gen7 has its own
> > issues, which we intend to tackle after gen8).
> > 
> > This adds most of the fp64 scalar implementation. It starts by enabling
> > the various lowering passes in NIR for doubles and then adds all the
> > infrastructure required in the backend to operate on 64-bit floating
> > point data.
> > 
> > For reference, this series fixes 1009 fp64 piglit tests on BDW. The fp64
> > totals look like this:
> > 
> >      pass:                  2523
> >      fail:                    46
> >      crash:                  447
> >      skip:                    16
> >      total:                 3032
> > 
> > There are a few missing things in this series to achieve a perfect fp64
> > pass rate:
> > 
> > 1. Fixes to copy propagation. The fp64 code creates new code patterns
> >    that copy propagation isn't ready to handle yet, leading to
> >    incorrect results in some cases. We have 9 patches to fix copy
> >    propagation for fp64 that we intend to send separately after the
> >    main fp64 infrastructure has been reviewed.
> > 
> > 2. ubo/ssbo/shared-variables. We will also send the patches for this in
> >    a separate series after this one.
> > 
> > 3. A fix for the SIMD lowering pass to properly handle execmasking when
> >    transposing the results of split instructions back together. We have
> >    a local fix for this, but Curro hit the same problem while working
> >    on SIMD32 and has a better solution for it so we intend to use his
> >    solution when it is ready.
> > 
> > 4. Spilling. We don't support spilling of DF registers yet and some
> >    piglit tests need this to compile. Jason had plans to work on the
> >    spilling code and address the needs of fp64 along the way.
> > 
> > The series does not introduce any regressions in piglit on ILK, SNB,
> > HSW, BDW and SKL.
> > 
> > A branch with this series is available for testing here:
> > 
> > $ git clone -b i965-fp64-scalar-backend-part-1 https://github.com/Igalia/mesa.git
> > 
> > You will have to enable the extension with:
> > 
> > $ export MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader_fp64
> > 
> > The full scalar fp64 implementation, which also contains the fixes to
> > copy propagation, the ubo/ssbo support and our local fix for the SIMD
> > lowering pass, is available here:
> > 
> > $ git clone -b i965-fp64 https://github.com/Igalia/mesa.git
> > 
> > And for the adventurous, there is also a work-in-progress branch that
> > adds scalar support for HSW here:
> > 
> > $ git clone -b i965-fp64-gen7 https://github.com/Igalia/mesa.git
> > 
> > Thanks,
> > 
> > Sam
> > 
> > 
> > Connor Abbott (33):
> >   i965: use double lowering pass
> >   i965: use pack/unpackDouble lowering
> >   i965/disasm: fix disasm of 3-src doubles
> >   i965/eu: allow doubles in math instructions
> >   i965: add brw_imm_df
> >   i965: add support for getting/setting DF immediates
> >   i965: add support for disassembling DF immediates
> >   i965/eu: add support for DF immediates
> >   i965: fix brw_negate_immediate() for doubles
> >   i965: fix is_zero(), is_one() and is_negative_one() for doubles
> >   i965: fixup uniform setup for doubles
> >   i965/fs: print writemask_all when it's enabled
> >   i965/fs: use the NIR bit size when creating registers
> >   i965/fs: don't propagate 64-bit immediates
> >   i965/fs: add support for printing double immediates
> >   i965/fs: always pass the bitsize to brw_type_for_nir_type()
> >   i965/fs: add a stride helper
> >   i965/fs: add PACK opcode
> >   i965/fs: add a pass for lowering PACK opcodes
> >   i965/fs/nir: translate double pack/unpack
> >   i965/fs: fix type_size() for doubles
> >   i965/fs: handle uniforms in byte_offset()
> >   i965/fs: use byte_offset() in offset() for uniforms
> >   i965/fs: fix assign_constant_locations() for doubles
> >   i965/fs: generalize SIMD16 interference workaround
> >   i965/fs: extend exec_size halving in the generator
> >   i965/fs: fix compares for doubles
> >   i965/fs: fix regs_read() for uniforms
> >   i965/fs: fix is_copy_payload() for doubles
> >   i965/fs: fix regs_written in LOAD_PAYLOAD for doubles
> >   i965/fs: fix dst width calculation in CSE
> >   i965/fs: add a pass for legalizing d2f
> >   i965/fs: add support for f2d and d2f
> > 
> > Iago Toral Quiroga (15):
> >   i965: fix brw_saturate_immediate() for doubles
> >   i965: fix brw_abs_immediate() for doubles
> >   i965: two-argument instructions can only use 32-bit immediates
> >   i965/fs: optimize pack double
> >   i965/fs: optimize unpack double
> >   i965/fs: handle fp64 opcodes in brw_do_channel_expressions
> >   i965/fs: We only support 32-bit integer ALU operations for now
> >   i965/fs: add null_reg_df
> >   i965/fs: implement fsign() for doubles
> >   i965/fs: implement d2b
> >   i965/fs: implement d2i and d2u
> >   i965/fs: implement i2d and u2d
> >   i965/fs: rename our lower_d2f pass to lower_d2x
> >   i965/fs/lower_simd_width: Fix registers written for split instructions
> >   i965/fs: recognize writes with a subreg_offset > 0 as partial
> > 
> > Samuel Iglesias Gonsálvez (7):
> >   i965: enable lrp lowering for doubles
> >   vc4: lower lrp when operating with double operands
> >   freedreno/ir3: lower lrp when operating with double operands
> >   i965/fs: align access to double-based uniforms in push constant buffer
> >   i965/fs: demote_pull_constants() did not take into account double
> >     types
> >   i965/fs: take into account doubles when calculating read_size for
> >     MOV_INDIRECT
> >   i965/fs: fix MOV_INDIRECT exec_size for doubles
> > 
> > Topi Pohjolainen (4):
> >   i965: Lower DFRACEXP/DLDEXP
> >   i965: Determine size of double precision float register
> >   i965: Tell backend register about double precision type
> >   i965/eu: Allow 3-src float ops with doubles
> > 
> >  src/gallium/drivers/freedreno/ir3/ir3_nir.c        |   1 +
> >  src/gallium/drivers/vc4/vc4_program.c              |   1 +
> >  src/mesa/drivers/dri/i965/Makefile.sources         |   2 +
> >  src/mesa/drivers/dri/i965/brw_compiler.c           |   2 +
> >  src/mesa/drivers/dri/i965/brw_compiler.h           |   8 +
> >  src/mesa/drivers/dri/i965/brw_defines.h            |   9 +
> >  src/mesa/drivers/dri/i965/brw_disasm.c             |   3 +-
> >  src/mesa/drivers/dri/i965/brw_eu_emit.c            |  60 +++--
> >  src/mesa/drivers/dri/i965/brw_fs.cpp               | 106 ++++++--
> >  src/mesa/drivers/dri/i965/brw_fs.h                 |   6 +-
> >  src/mesa/drivers/dri/i965/brw_fs_builder.h         |  15 +-
> >  .../dri/i965/brw_fs_channel_expressions.cpp        |  23 +-
> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |   3 +
> >  src/mesa/drivers/dri/i965/brw_fs_cse.cpp           |   3 +-
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp     |  16 +-
> >  src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp     |  75 ++++++
> >  src/mesa/drivers/dri/i965/brw_fs_lower_pack.cpp    |  59 +++++
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp           | 287 ++++++++++++++++++---
> >  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  67 +++--
> >  src/mesa/drivers/dri/i965/brw_inst.h               |  25 ++
> >  src/mesa/drivers/dri/i965/brw_ir_fs.h              |  14 +-
> >  src/mesa/drivers/dri/i965/brw_link.cpp             |   1 +
> >  src/mesa/drivers/dri/i965/brw_nir.c                |  10 +
> >  src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp     |   7 +-
> >  src/mesa/drivers/dri/i965/brw_program.c            |   1 +
> >  src/mesa/drivers/dri/i965/brw_reg.h                |  10 +
> >  src/mesa/drivers/dri/i965/brw_shader.cpp           |  73 ++++--
> >  src/mesa/drivers/dri/i965/brw_shader.h             |   1 +
> >  src/mesa/drivers/dri/i965/brw_wm.c                 |   2 +
> >  src/mesa/drivers/dri/i965/gen6_constant_state.c    |  12 +-
> >  30 files changed, 773 insertions(+), 129 deletions(-)
> >  create mode 100644 src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
> >  create mode 100644 src/mesa/drivers/dri/i965/brw_fs_lower_pack.cpp
> > 
> > -- 
> > 2.5.0
> > 
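
For readers unfamiliar with the pack/unpack patches listed above: the
operations being lowered correspond to GLSL's packDouble2x32() and
unpackDouble2x32(), which move between one 64-bit double and a pair of
32-bit words (low word first). The following is only a host-side sketch of
those semantics in plain C. It is not code from the series, and the
function names are hypothetical ones chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a Mesa function): split a double into its
 * low and high 32-bit halves, matching unpackDouble2x32() ordering
 * (low word = 32 least significant bits). */
static void unpack_double_2x32(double d, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret bits without aliasing UB */
    *lo = (uint32_t)(bits & 0xffffffffu);
    *hi = (uint32_t)(bits >> 32);
}

/* Hypothetical helper (not a Mesa function): rebuild a double from two
 * 32-bit halves, matching packDouble2x32() ordering. */
static double pack_double_2x32(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Small self-check, assuming IEEE 754 binary64 doubles on the host:
 * 1.0 is 0x3FF0000000000000, so the low word is 0 and the high word
 * is 0x3ff00000. */
static void pack_unpack_self_test(void)
{
    uint32_t lo, hi;
    unpack_double_2x32(1.0, &lo, &hi);
    assert(lo == 0x00000000u);
    assert(hi == 0x3ff00000u);
    assert(pack_double_2x32(lo, hi) == 1.0);
}
```

Calling pack_unpack_self_test() from a test harness exercises the round
trip. How the backend actually represents the two halves in registers is
determined by the patches themselves and is not shown here.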