[Mesa-dev] [PATCH v3 19/43] i965/fs: Support push constants of 16-bit types

Mon Oct 30 23:26:41 UTC 2017

On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo <
jmcasanova at igalia.com> wrote:

> We enable the use of 16-bit values in push constants
> modifying the assign_constant_locations function to work
> with 16-bit types.
>
> The API to access buffers in Vulkan use multiples of 4-byte for
> offsets and sizes. Current accountability of uniforms based on 4-byte
> slots will work for 16-bit values if they are allowed to use 32-bit
> slots. For that, we replace the division by 4 by a DIV_ROUND_UP, so
> 2-byte elements will use 1 slot instead of 0.
>

I'm fairly sure this doesn't actually work correctly.  That said, I'm also
fairly sure the current code is broken for 64 bits.  In particular, let's
suppose we have something like this:

layout(push_constant) {
    struct {
        int i;
        int pad1;
        float f;
        double d;
        int pad2;
    } arr[2];
}

main()
{
    out_color = vec4(arr[arr[0].i].f, float(arr[arr[0].i].d),
arr[arr[1].i].f, float(arr[arr[1].i].d));
}

I'm pretty sure the current code will explode because it will see a single
contiguous chunk that's neither 32 nor 64-bit.  If that particular shader
doesn't break it, I'm sure some permutation of it will.  Things only get
worse if we throw in 16-bit.

Ultimately, I think the solution is to throw away our current scheme of
trying to separate things out by bit size and move to a scheme where we
work with everything in bytes in assign_constant_locations and trust in the
contiguous[] array to determine where to split things up.  We would
probably want to continue only re-arranging things 4 bytes at a time.

That said, I don't think this patch makes anything any worse...

> We aligns the 16-bit locations after assigning the 32-bit
> ones.
> ---
>  src/intel/compiler/brw_fs.cpp | 30 +++++++++++++++++++++++-------
>  1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index a1d49a63be..8da16145dc 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -1909,8 +1909,9 @@ set_push_pull_constant_loc(unsigned uniform, int
> *chunk_start,
>     if (!contiguous) {
>        /* If bitsize doesn't match the target one, skip it */
>        if (*max_chunk_bitsize != target_bitsize) {
> -         /* FIXME: right now we only support 32 and 64-bit accesses */
> -         assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8);
> +         assert(*max_chunk_bitsize == 4 ||
> +                *max_chunk_bitsize == 8 ||
> +                *max_chunk_bitsize == 2);
>           *max_chunk_bitsize = 0;
>           *chunk_start = -1;
>           return;
> @@ -1987,8 +1988,9 @@ fs_visitor::assign_constant_locations()
>           int constant_nr = inst->src[i].nr + inst->src[i].offset / 4;
>
>           if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) {
> -            assert(inst->src[2].ud % 4 == 0);
> -            unsigned last = constant_nr + (inst->src[2].ud / 4) - 1;
> +            assert(type_sz(inst->src[i].type) == 2 ?
> +                   (inst->src[2].ud % 2 == 0) : (inst->src[2].ud % 4 ==
> 0));
> +            unsigned last = constant_nr + DIV_ROUND_UP(inst->src[2].ud,
> 4) - 1;
>              assert(last < uniforms);
>
>              for (unsigned j = constant_nr; j < last; j++) {
> @@ -2000,8 +2002,8 @@ fs_visitor::assign_constant_locations()
>              bitsize_access[last] = MAX2(bitsize_access[last],
> type_sz(inst->src[i].type));
>           } else {
>              if (constant_nr >= 0 && constant_nr < (int) uniforms) {
> -               int regs_read = inst->components_read(i) *
> -                  type_sz(inst->src[i].type) / 4;
> +               int regs_read = DIV_ROUND_UP(inst->components_read(i) *
> +                                            type_sz(inst->src[i].type),
> 4);
>                 for (int j = 0; j < regs_read; j++) {
>                    is_live[constant_nr + j] = true;
>                    bitsize_access[constant_nr + j] =
> @@ -2062,7 +2064,7 @@ fs_visitor::assign_constant_locations()
>
>     }
>
> -   /* Then push the rest of uniforms */
> +   /* Then push the 32-bit uniforms */
>     const unsigned uniform_32_bit_size = type_sz(BRW_REGISTER_TYPE_F);
>     for (unsigned u = 0; u < uniforms; u++) {
>        if (!is_live[u])
> @@ -2081,6 +2083,20 @@ fs_visitor::assign_constant_locations()
>                                   stage_prog_data);
>     }
>
> +   const unsigned uniform_16_bit_size = type_sz(BRW_REGISTER_TYPE_HF);
> +   for (unsigned u = 0; u < uniforms; u++) {
> +      if (!is_live[u])
> +         continue;
> +
> +      set_push_pull_constant_loc(u, &chunk_start, &max_chunk_bitsize,
> +                                 contiguous[u], bitsize_access[u],
> +                                 uniform_16_bit_size,
> +                                 push_constant_loc, pull_constant_loc,
> +                                 &num_push_constants, &num_pull_constants,
> +                                 max_push_components, max_chunk_size,
> +                                 stage_prog_data);
> +   }
> +
>     /* Add the CS local thread ID uniform at the end of the push constants
> */
>     if (thread_local_id_index >= 0)
>        push_constant_loc[thread_local_id_index] = num_push_constants++;
> --
> 2.13.6
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171030/673e7a42/attachment-0001.html>