[Mesa-dev] [PATCH] i965/fs: do not depend on std140 alignment rules for UBO loads

Francisco Jerez currojerez at riseup.net
Fri May 20 18:42:21 UTC 2016


Iago Toral Quiroga <itoral at igalia.com> writes:

> The previous implementation relied on the std140 alignment rules to
> avoid handling misalignment in the case where we are loading more than
> 2 double components from a vector, which requires to emit a second load
> message.
>
> This alternative implementation deals with misalignment and is more
> flexible going forward. All credit for this goes to Curro, since he
> not only suggested this but also wrote the implementation in the
> mailing list.
> ---
>
> Curro, maybe I should make you the author and me the reviewer then? :)
>
It's up to you, feel free to add either my:

Signed-off-by: Francisco Jerez <currojerez at riseup.net>

along with your R-b, or:

Reviewed-by: Francisco Jerez <currojerez at riseup.net>

>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 59 +++++++-------------------------
>  1 file changed, 13 insertions(+), 46 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index ebcc92a..35a6aed 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3583,9 +3583,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>                                 nir->info.num_ubos - 1);
>        }
>  
> -      /* Number of 32-bit slots in the type */
> -      unsigned type_slots = MAX2(1, type_sz(dest.type) / 4);
> -
>        nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]);
>        if (const_offset == NULL) {
>           fs_reg base_offset = retype(get_nir_src(instr->src[1]),
> @@ -3603,55 +3600,25 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>            * we let CSE deal with duplicate loads. Here we see a vector access
>            * and we have to split it if necessary.
>            */
> -         fs_reg packed_consts = vgrf(glsl_type::float_type);
> -         packed_consts.type = dest.type;
> +         const unsigned type_size = type_sz(dest.type);
> +         const fs_reg packed_consts = bld.vgrf(BRW_REGISTER_TYPE_F);
> +         for (unsigned c = 0; c < instr->num_components;) {
> +            const unsigned base = const_offset->u32[0] + c * type_size;
>  
> -         unsigned const_offset_aligned = const_offset->u32[0] & ~15;
> +            /* Number of usable components in the next 16B-aligned load */
> +            const unsigned count = MIN2(instr->num_components - c,
> +                                        (16 - base % 16) / type_size);
>  
> -         /* A vec4 only contains half of a dvec4, if we need more than 2
> -          * components of a dvec4 we will have to issue another load for
> -          * components z and w.
> -          */
> -         int num_components;
> -         if (type_slots == 1)
> -            num_components = instr->num_components;
> -         else
> -            num_components = MIN2(2, instr->num_components);
> -
> -         /* The computation of num_components doesn't take into account
> -          * misalignment, which should be okay according to std140 vector
> -          * alignment rules.
> -          */
> -         assert(const_offset->u32[0] % 16 +
> -                type_sz(dest.type) * num_components <= 16);
> -
> -         int remaining_components = instr->num_components;
> -         while (remaining_components > 0) {
> -            /* Read the vec4 from a 16-byte aligned offset */
> -            struct brw_reg const_offset_reg = brw_imm_ud(const_offset_aligned);
>              bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
> -                     retype(packed_consts, BRW_REGISTER_TYPE_F),
> -                     surf_index, const_offset_reg);
> -
> -            const fs_reg consts = byte_offset(packed_consts, (const_offset->u32[0] % 16));
> -            unsigned dest_offset = instr->num_components - remaining_components;
> +                     packed_consts, surf_index, brw_imm_ud(base & ~15));
>  
> -            /* XXX: This doesn't update the sub-16B offset across iterations of
> -             * the loop, which should work for std140 vector alignment rules.
> -             */
> -            assert(dest_offset == 0 || const_offset->u32[0] % 16 == 0);
> +            const fs_reg consts =
> +               retype(byte_offset(packed_consts, base & 15), dest.type);
>  
> -            for (int i = 0; i < num_components; i++)
> -               bld.MOV(offset(dest, bld, i + dest_offset), component(consts, i));
> +            for (unsigned d = 0; d < count; d++)
> +               bld.MOV(offset(dest, bld, c + d), component(consts, d));
>  
> -            /* If this is a large enough 64-bit load, we will need to emit
> -             * another message
> -             */
> -            remaining_components -= num_components;
> -            assert(remaining_components == 0 ||
> -                   (remaining_components <= 2 && type_slots == 2));
> -            num_components = remaining_components;
> -            const_offset_aligned += 16;
> +            c += count;
>           }
>        }
>        break;
> -- 
> 1.9.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160520/d80ee545/attachment.sig>


More information about the mesa-dev mailing list