[Mesa-dev] [PATCH v4 28/44] i965/fs: Use untyped_surface_read for 16-bit load_ssbo (v2)

Tue Dec 5 23:14:34 UTC 2017

On 05/12/17 23:47, Jason Ekstrand wrote:
> On Tue, Dec 5, 2017 at 1:36 PM, Jose Maria Casanova Crespo
> <jmcasanova at igalia.com> wrote:
> 
>     SSBO loads were using byte_scattered read messages as they allow
>     reading 16-bit size components. byte_scattered messages can only
>     operate one component at a time so we needed to emit as many messages
>     as components.
> 
>     But for 16-bit vec2 and vec4, whose size is a multiple of 32 bits, we
>     can use the untyped_surface_read message to read pairs of 16-bit
>     components using only one message. Once each pair is read, it is
>     unshuffled to return the proper 16-bit components. The vec3 case is
>     handled like vec4, but the 4th component is ignored.
> 
>     16-bit scalars are read using one byte_scattered_read message.
> 
>     v2: Removed use of stride = 2 on sources (Jason Ekstrand)
>         Rework optimization using unshuffle 16 reads (Chema Casanova)
>     v3: Use W and D types instead of HF and F in shuffle to avoid rounding
>         errors (Jason Ekstrand)
>         Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
> 
>     CC: Jason Ekstrand <jason at jlekstrand.net>
>     ---
>      src/intel/compiler/brw_fs_nir.cpp | 29 ++++++++++++++++++++++-------
>      1 file changed, 22 insertions(+), 7 deletions(-)
> 
>     diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
>     index e11e75e6332..8deec082d59 100644
>     --- a/src/intel/compiler/brw_fs_nir.cpp
>     +++ b/src/intel/compiler/brw_fs_nir.cpp
>     @@ -2303,16 +2303,31 @@ do_untyped_vector_read(const fs_builder &bld,
>                             unsigned num_components)
>      {
>         if (type_sz(dest.type) <= 2) {
>     -      fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
>     -      bld.MOV(read_offset, offset_reg);
>     -      for (unsigned i = 0; i < num_components; i++) {
>     -         fs_reg read_reg =
>     -            emit_byte_scattered_read(bld, surf_index, read_offset,
>     +      assert(dest.stride == 1);
>     +
>     +      if (num_components > 1) {
>     +         /* Pairs of 16-bit components can be read with untyped read, for 16-bit
>     +          * vec3 4th component is ignored.
>     +          */
>     +         fs_reg read_result =
>     +            emit_untyped_read(bld, surf_index, offset_reg,
>     +                              1 /* dims */, DIV_ROUND_UP(num_components, 2),
>     +                              BRW_PREDICATE_NONE);
>     +         shuffle_32bit_load_result_to_16bit_data(bld,
>     +               retype(dest, BRW_REGISTER_TYPE_W),
>     +               retype(read_result, BRW_REGISTER_TYPE_D),
>     +               num_components);
>     +      } else {
>     +         assert(num_components == 1);
>     +         /* scalar 16-bit are read using one byte_scattered_read message */
>     +         fs_reg read_result =
>     +            emit_byte_scattered_read(bld, surf_index, offset_reg,
>                                           1 /* dims */, 1,
>                                           type_sz(dest.type) * 8 /* bit_size */,
>                                           BRW_PREDICATE_NONE);
>     -         bld.MOV(offset(dest, bld, i), subscript(read_reg, dest.type, 0));
>     -         bld.ADD(read_offset, read_offset, brw_imm_ud(type_sz(dest.type)));
>     +         read_result.type = dest.type;
>     +         read_result.stride = 2;
>     +         bld.MOV(dest, read_result);
> 
> 
> If read_reg has a 32-bit type, you could use subscript here.  Meh.

Fixed locally.

> Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>

Thanks for the reviews. This was the last pending review to address
before being ready to land this part of the series.

I'm waiting to confirm with Jenkins that I fixed a regression it found
in the 16-bit load_ubo implementation with the sampler.

>  
> 
>            }
>         } else if (type_sz(dest.type) == 4) {
>            fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
>     --
>     2.11.0
> 
>
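
For anyone reading the thread later, the pair-read-then-unshuffle idea from
the commit message can be sketched on the CPU as below. This is only an
illustration, not the Mesa code: dwords_for_16bit_components() and
unshuffle_16bit() are hypothetical names, the data is modelled for a single
invocation rather than in the per-channel register layout the backend uses,
and it assumes the lower-addressed 16-bit component sits in the low half of
each 32-bit dword.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of 32-bit dwords needed to cover n 16-bit components.
// Same arithmetic as DIV_ROUND_UP(num_components, 2) in the patch.
constexpr unsigned dwords_for_16bit_components(unsigned n)
{
   return (n + 1) / 2;
}

// Hypothetical helper: split the dwords returned by one 32-bit read into
// 16-bit components. Assumes component 2k is in the low half and component
// 2k+1 in the high half of dword k. For a vec3 the high half of the second
// dword (the would-be 4th component) is simply never copied out.
std::vector<uint16_t>
unshuffle_16bit(const std::vector<uint32_t> &dwords, unsigned num_components)
{
   assert(dwords.size() >= dwords_for_16bit_components(num_components));

   std::vector<uint16_t> out(num_components);
   for (unsigned i = 0; i < num_components; i++) {
      const uint32_t dw = dwords[i / 2];
      // Shift the wanted half down and truncate to 16 bits. Working on
      // integers keeps the bit pattern intact, which is the reason for
      // using W and D types rather than HF and F in the shuffle.
      out[i] = static_cast<uint16_t>(dw >> (16 * (i % 2)));
   }
   return out;
}
```

With this model a 16-bit vec3 needs two dwords, so one read replaces the
three byte_scattered reads the old loop emitted, and the unused upper half
of the second dword is dropped.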