[Mesa-dev] [PATCH v3 19/43] i965/fs: Support push constants of 16-bit types
Jason Ekstrand
jason at jlekstrand.net
Mon Oct 30 23:26:41 UTC 2017
On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo <
jmcasanova at igalia.com> wrote:
> We enable the use of 16-bit values in push constants
> modifying the assign_constant_locations function to work
> with 16-bit types.
>
> The API to access buffers in Vulkan use multiples of 4-byte for
> offsets and sizes. Current accountability of uniforms based on 4-byte
> slots will work for 16-bit values if they are allowed to use 32-bit
> slots. For that, we replace the division by 4 by a DIV_ROUND_UP, so
> 2-byte elements will use 1 slot instead of 0.
>
I'm fairly sure this doesn't actually work correctly. That said, I'm also
fairly sure the current code is broken for 64 bits. In particular, let's
suppose we have something like this:
layout(push_constant) {
struct {
int i;
int pad1;
float f;
double d;
int pad2;
} arr[2];
}
main()
{
out_color = vec4(arr[arr[0].i].f, float(arr[arr[0].i].d),
arr[arr[1].i].f, float(arr[arr[1].i].d));
}
I'm pretty sure the current code will explode because it will see a single
contiguous chunk that's neither 32 nor 64-bit. If that particular shader
doesn't break it, I'm sure some permutation of it will. Things only get
worse if we throw in 16-bit.
Ultimately, I think the solution is to throw away our current scheme of
trying to separate things out by bit size and move to a scheme where we
work with everything in bytes in assign_constant_locations and trust in the
contiguous[] array to determine where to split things up. We would
probably want to continue only re-arranging things 4 bytes at a time.
That said, I don't think this patch makes anything any worse...
> We aligns the 16-bit locations after assigning the 32-bit
> ones.
> ---
> src/intel/compiler/brw_fs.cpp | 30 +++++++++++++++++++++++-------
> 1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index a1d49a63be..8da16145dc 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -1909,8 +1909,9 @@ set_push_pull_constant_loc(unsigned uniform, int
> *chunk_start,
> if (!contiguous) {
> /* If bitsize doesn't match the target one, skip it */
> if (*max_chunk_bitsize != target_bitsize) {
> - /* FIXME: right now we only support 32 and 64-bit accesses */
> - assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8);
> + assert(*max_chunk_bitsize == 4 ||
> + *max_chunk_bitsize == 8 ||
> + *max_chunk_bitsize == 2);
> *max_chunk_bitsize = 0;
> *chunk_start = -1;
> return;
> @@ -1987,8 +1988,9 @@ fs_visitor::assign_constant_locations()
> int constant_nr = inst->src[i].nr + inst->src[i].offset / 4;
>
> if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) {
> - assert(inst->src[2].ud % 4 == 0);
> - unsigned last = constant_nr + (inst->src[2].ud / 4) - 1;
> + assert(type_sz(inst->src[i].type) == 2 ?
> + (inst->src[2].ud % 2 == 0) : (inst->src[2].ud % 4 ==
> 0));
> + unsigned last = constant_nr + DIV_ROUND_UP(inst->src[2].ud,
> 4) - 1;
> assert(last < uniforms);
>
> for (unsigned j = constant_nr; j < last; j++) {
> @@ -2000,8 +2002,8 @@ fs_visitor::assign_constant_locations()
> bitsize_access[last] = MAX2(bitsize_access[last],
> type_sz(inst->src[i].type));
> } else {
> if (constant_nr >= 0 && constant_nr < (int) uniforms) {
> - int regs_read = inst->components_read(i) *
> - type_sz(inst->src[i].type) / 4;
> + int regs_read = DIV_ROUND_UP(inst->components_read(i) *
> + type_sz(inst->src[i].type),
> 4);
> for (int j = 0; j < regs_read; j++) {
> is_live[constant_nr + j] = true;
> bitsize_access[constant_nr + j] =
> @@ -2062,7 +2064,7 @@ fs_visitor::assign_constant_locations()
>
> }
>
> - /* Then push the rest of uniforms */
> + /* Then push the 32-bit uniforms */
> const unsigned uniform_32_bit_size = type_sz(BRW_REGISTER_TYPE_F);
> for (unsigned u = 0; u < uniforms; u++) {
> if (!is_live[u])
> @@ -2081,6 +2083,20 @@ fs_visitor::assign_constant_locations()
> stage_prog_data);
> }
>
> + const unsigned uniform_16_bit_size = type_sz(BRW_REGISTER_TYPE_HF);
> + for (unsigned u = 0; u < uniforms; u++) {
> + if (!is_live[u])
> + continue;
> +
> + set_push_pull_constant_loc(u, &chunk_start, &max_chunk_bitsize,
> + contiguous[u], bitsize_access[u],
> + uniform_16_bit_size,
> + push_constant_loc, pull_constant_loc,
> + &num_push_constants, &num_pull_constants,
> + max_push_components, max_chunk_size,
> + stage_prog_data);
> + }
> +
> /* Add the CS local thread ID uniform at the end of the push constants
> */
> if (thread_local_id_index >= 0)
> push_constant_loc[thread_local_id_index] = num_push_constants++;
> --
> 2.13.6
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171030/673e7a42/attachment-0001.html>
More information about the mesa-dev
mailing list