[Mesa-dev] [PATCH v2 00/30] Finishing arb_gpu_shader_fp64 support to the i965 scalar backend

Fri May 13 09:56:28 UTC 2016

On Thu, 2016-05-12 at 13:35 +0200, Samuel Iglesias Gonsálvez wrote:
> Hi,
> 
> this version includes all the feedback received on v1 plus a few new
> patches (22-27) that deal with 64-bit URB reads/writes, which were
> missing in v1. Below is a list of patches that still need to get the Rb:
> 
> [PATCH v2 02/30] i965/fs: Fix propagation of copies with strided source.
> [PATCH v2 05/30] i965/fs: Simplify and fix register offset calculation
> [PATCH v2 06/30] i965/fs: Reindent register offset calculation of
> [PATCH v2 07/30] i965/fs: fix copy propagation of partially invalidated
> [PATCH v2 11/30] i965/fs: add shuffle_32bit_load_result_to_64bit_data
> [PATCH v2 14/30] i965/fs: fix pull constant load component selection for
> [PATCH v2 18/30] i965/fs: support doubles with SSBO loads
> [PATCH v2 19/30] i965/fs: add shuffle_64bit_data_for_32bit_write helper
> [PATCH v2 20/30] i965/fs: support doubles with ssbo stores
> [PATCH v2 21/30] i965/fs: support doubles with shared variable stores
> [PATCH v2 22/30] i965/vec4: handle doubles in type_size_vec4()
> [PATCH v2 23/30] i965/fs: fix number of output components for doubles
> [PATCH v2 24/30] i965/fs: fix nir_intrinsic_store_output for doubles
> [PATCH v2 25/30] i965/tcs/scalar: fix load input for doubles
> [PATCH v2 26/30] i965/tcs/scalar: fix store output for doubles
> [PATCH v2 27/30] i965/tes/scalar: Fix load input for doubles

I've just sent a v3 for patches 19 and 21. The former gets rid of the
temporary, as Curro suggested, since in this case we really don't want
to do the shuffling in-place. The latter fixes a related bug where we
were doing in-place shuffling before a write, which we shouldn't do.
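
To illustrate why the shuffle before a write should not happen in-place,
here is a minimal standalone sketch. It is not the Mesa helper: the
names (kChannels, Regs, shuffle_for_32bit_write) are hypothetical, and
the layouts are an assumption made for illustration, namely that 64-bit
data is held per channel as interleaved low/high dwords while the 32-bit
write wants all low dwords followed by all high dwords.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical SIMD width, chosen only for this illustration.
constexpr std::size_t kChannels = 8;

// One 64-bit value per channel, stored as 2 * kChannels dwords.
using Regs = std::array<std::uint32_t, 2 * kChannels>;

// Assumed source layout:      lo0 hi0 lo1 hi1 ... lo7 hi7
// Assumed destination layout: lo0 lo1 ... lo7 hi0 hi1 ... hi7
//
// The result goes into a separate destination, so the source keeps its
// 64-bit layout and remains valid for any later use of the value.
Regs shuffle_for_32bit_write(const Regs &src)
{
    Regs dst{};
    for (std::size_t ch = 0; ch < kChannels; ch++) {
        dst[ch] = src[2 * ch];                 // low dword of channel ch
        dst[kChannels + ch] = src[2 * ch + 1]; // high dword of channel ch
    }
    return dst;
}
```

Writing the result back over src would leave the original value in the
write layout, so anything that reads it afterwards as 64-bit data would
see garbage.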

I think we have addressed all the other comments too, including moving
the shuffling functions to brw_fs_nir.cpp. I also went ahead and made
the do_untyped_vector_read helper static to brw_fs_nir.cpp (instead of a
fs_visitor method) since Curro's reasoning for the shuffling functions
applies to this helper just as much.

All these changes have been merged into our
i965-fp64-scalar-backend-part2-to-push branch for review / testing.

I think that at this point we only need the thumbs-up for those two v3
patches, and to see whether Curro has more feedback, since I believe he
has not had time to go through all the patches yet. If Curro does not
find anything major, we should be able to land this tomorrow.

> There is still some ongoing discussion about where to put the
> shuffling functions, but it does not make sense to postpone review of
> v2 because of that, so for now we kept them in brw_fs.cpp and, if we
> finally agree to move them to brw_fs_nir.cpp, we will do that before
> pushing.
> 
> We have not observed any piglit regressions in ILK, SNB, IVB, HSW, BDW
> or SKL compared against master's ba3f0b6.
> 
> This series enables fp64 for gen8+ only and requires scalar GS, TCS and
> TES so these gens can do fp64 in these stages via the scalar backend,
> as the vec4 backend is not ready yet. Support to enable the scalar
> backend by default for all 3 stages has already landed in master so we
> should be all set in this regard.
> 
> As usual, a branch with the series is available for testing here:
> $ git clone -b i965-fp64-scalar-backend-part2-to-push https://github.com/Igalia/mesa.git
> 
> All the new fp64 tests we wrote have also landed in piglit, except for
> patch [0]. We have a branch available with that test included here:
> 
> $ git clone -b arb_gpu_shader_fp64 https://github.com/Igalia/piglit.git
> 
> Thanks,
> 
> Sam
> 
> [0] https://lists.freedesktop.org/archives/piglit/2016-May/019761.html
> 
> Francisco Jerez (5):
>   i965/fs: Fix propagation of copies with strided source.
>   i965/fs: Simplify and fix register offset calculation of
>     try_copy_propagate().
>   i965/fs: Reindent register offset calculation of try_copy_propagate().
>   i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.
>   i965/fs: Fix and document component().
> 
> Iago Toral Quiroga (25):
>   i965/fs: fix subreg_offset overflow in byte_offset()
>   i965/fs: Fix copy propagation of load payload for double operands
>   i965/fs: disallow type change in copy-propagation if types have
>     different sizes
>   i965/fs: fix copy propagation of partially invalidated entries
>   i965/fs: fix copy propagation from load payload
>   i965/fs: fix copy/constant propagation regioning checks
>   i965/fs: add shuffle_32bit_load_result_to_64bit_data helper
>   i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles
>   i965/fs: fix pull constant load component selection for doubles
>   i965/fs: support doubles with UBO loads
>   i965/fs: Add do_untyped_vector_read helper
>   i965/fs: support double with shared variable loads
>   i965/fs: support doubles with SSBO loads
>   i965/fs: add shuffle_64bit_data_for_32bit_write helper
>   i965/fs: support doubles with ssbo stores
>   i965/fs: support doubles with shared variable stores
>   i965/vec4: handle doubles in type_size_vec4()
>   i965/fs: fix number of output components for doubles
>   i965/fs: fix nir_intrinsic_store_output for doubles
>   i965/tcs/scalar: fix load input for doubles
>   i965/tcs/scalar: fix store output for doubles
>   i965/tes/scalar: Fix load input for doubles
>   i965: Enable ARB_gpu_shader_fp64 for gen8+
>   docs: Mark ARB_gpu_shader_fp64 as done for i965/gen8+
>   i965: Expose OpenGL 4.0 for gen8+
> 
>  docs/GL3.txt                                       |   2 +-
>  src/mesa/drivers/dri/i965/brw_fs.cpp               | 173 ++++++--
>  src/mesa/drivers/dri/i965/brw_fs.h                 |  16 +
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 136 +++---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp           | 459 +++++++++++++++++----
>  src/mesa/drivers/dri/i965/brw_ir_fs.h              |  17 +-
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp     |   9 +-
>  src/mesa/drivers/dri/i965/intel_extensions.c       |   5 +-
>  src/mesa/drivers/dri/i965/intel_screen.c           |   2 +-
>  9 files changed, 621 insertions(+), 198 deletions(-)
>