<div dir="ltr">Hi Karol,<div>Thank you for reviewing! I'll go ahead and push the changes you need from</div><div><span style="font-size:12.8px">nir_lower_</span><wbr style="font-size:12.8px"><span style="font-size:12.8px">system_values.c </span><span style="font-size:12.8px">to </span><span style="font-size:12.8px">master.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Thank you,</span></div><div><span style="font-size:12.8px">Pam</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 28, 2018 at 5:50 AM, Karol Herbst <span dir="ltr"><<a href="mailto:kherbst@redhat.com" target="_blank">kherbst@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
if the changes inside "src/compiler/nir/nir_lower_<wbr>system_values.c" are<br>
extracted into a seperate patch, this patch with the equal changes<br>
would be<br>
<br>
Reviewed-by: Karol Herbst <<a href="mailto:kherbst@redhat.com">kherbst@redhat.com</a>><br>
<br>
I would need that for a nir to codegen pass for Nouveau and maybe it<br>
will help other drivers implementing this extension as well. I don't<br>
think it would hurt to extract those, right?<br>
<br>
Thanks!<br>
<div><div class="h5"><br>
On Thu, Jun 7, 2018 at 5:34 PM, Plamena Manolova<br>
<<a href="mailto:plamena.n.manolova@gmail.com">plamena.n.manolova@gmail.com</a>> wrote:<br>
> This patch adds the implementation of ARB_compute_variable_group_<wbr>size<br>
> for i965. We do this by storing the group size in a buffer surface,<br>
> similarly to the work group number.<br>
><br>
> v2: Fix some indentation inconsistencies (Jordan, Ilia)<br>
>     Do DIV_ROUND_UP correctly in brw_nir_lower_cs_intrinsics.c (Jordan)<br>
>     Use alphabetical order in features.txt (Matt)<br>
>     Set the extension constants properly in brw_context.c<br>
><br>
> Signed-off-by: Plamena Manolova <<a href="mailto:plamena.n.manolova@gmail.com">plamena.n.manolova@gmail.com</a>><br>
> ---<br>
>  docs/features.txt                                |  2 +-<br>
>  docs/relnotes/18.2.0.html                        |  1 +<br>
>  src/compiler/nir/nir_lower_<wbr>system_values.c       | 13 ++++<br>
>  src/intel/compiler/brw_<wbr>compiler.h                |  2 +<br>
>  src/intel/compiler/brw_fs.cpp                    | 45 ++++++++----<br>
>  src/intel/compiler/brw_fs_nir.<wbr>cpp                | 20 ++++++<br>
>  src/intel/compiler/brw_nir_<wbr>lower_cs_intrinsics.c | 88 +++++++++++++++++-------<br>
>  src/mesa/drivers/dri/i965/brw_<wbr>compute.c          | 25 ++++++-<br>
>  src/mesa/drivers/dri/i965/brw_<wbr>context.c          |  6 ++<br>
>  src/mesa/drivers/dri/i965/brw_<wbr>context.h          |  1 +<br>
>  src/mesa/drivers/dri/i965/brw_<wbr>cs.c               |  4 ++<br>
>  src/mesa/drivers/dri/i965/brw_<wbr>wm_surface_state.c | 27 +++++++-<br>
>  src/mesa/drivers/dri/i965/<wbr>intel_extensions.c     |  1 +<br>
>  13 files changed, 193 insertions(+), 42 deletions(-)<br>
><br>
> diff --git a/docs/features.txt b/docs/features.txt<br>
> index ed4050cf98..81b6663288 100644<br>
> --- a/docs/features.txt<br>
> +++ b/docs/features.txt<br>
> @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve<br>
><br>
>    GL_ARB_bindless_texture                               DONE (nvc0, radeonsi)<br>
>    GL_ARB_cl_event                                       not started<br>
> -  GL_ARB_compute_variable_group_<wbr>size                    DONE (nvc0, radeonsi)<br>
> +  GL_ARB_compute_variable_group_<wbr>size                    DONE (i965, nvc0, radeonsi)<br>
>    GL_ARB_ES3_2_compatibility                            DONE (i965/gen8+)<br>
>    GL_ARB_fragment_shader_<wbr>interlock                      DONE (i965)<br>
>    GL_ARB_gpu_shader_int64                               DONE (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe)<br>
> diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html<br>
> index 0db37b620d..7475a56633 100644<br>
> --- a/docs/relnotes/18.2.0.html<br>
> +++ b/docs/relnotes/18.2.0.html<br>
> @@ -52,6 +52,7 @@ Note: some of the new features are only available with certain drivers.<br>
><br>
>  <ul><br>
>  <li>GL_ARB_fragment_shader_<wbr>interlock on i965</li><br>
> +<li>GL_ARB_compute_variable_<wbr>group_size on i965</li><br>
>  </ul><br>
><br>
>  <h2>Bug fixes</h2><br>
> diff --git a/src/compiler/nir/nir_lower_<wbr>system_values.c b/src/compiler/nir/nir_lower_<wbr>system_values.c<br>
> index 487da04262..7ab005b000 100644<br>
> --- a/src/compiler/nir/nir_lower_<wbr>system_values.c<br>
> +++ b/src/compiler/nir/nir_lower_<wbr>system_values.c<br>
> @@ -57,6 +57,14 @@ convert_block(nir_block *block, nir_builder *b)<br>
>            *    gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID"<br>
>            */<br>
><br>
> +         /*<br>
> +          * If the local work group size is variable we can't lower the global<br>
> +          * invocation id here.<br>
> +          */<br>
> +         if (b->shader->info.cs.local_<wbr>size_variable) {<br>
> +            break;<br>
> +         }<br>
> +<br>
>           nir_const_value local_size;<br>
>           memset(&local_size, 0, sizeof(local_size));<br>
>           local_size.u32[0] = b->shader->info.cs.local_size[<wbr>0];<br>
> @@ -102,6 +110,11 @@ convert_block(nir_block *block, nir_builder *b)<br>
>        }<br>
><br>
>        case SYSTEM_VALUE_LOCAL_GROUP_SIZE: {<br>
> +         /* If the local work group size is variable we can't lower it here */<br>
> +         if (b->shader->info.cs.local_<wbr>size_variable) {<br>
> +            break;<br>
> +         }<br>
> +<br>
>           nir_const_value local_size;<br>
>           memset(&local_size, 0, sizeof(local_size));<br>
>           local_size.u32[0] = b->shader->info.cs.local_size[<wbr>0];<br>
> diff --git a/src/intel/compiler/brw_<wbr>compiler.h b/src/intel/compiler/brw_<wbr>compiler.h<br>
> index 8b4e6fe2e2..f54952c28f 100644<br>
> --- a/src/intel/compiler/brw_<wbr>compiler.h<br>
> +++ b/src/intel/compiler/brw_<wbr>compiler.h<br>
> @@ -759,6 +759,7 @@ struct brw_cs_prog_data {<br>
>     unsigned threads;<br>
>     bool uses_barrier;<br>
>     bool uses_num_work_groups;<br>
> +   bool uses_variable_group_size;<br>
><br>
>     struct {<br>
>        struct brw_push_const_block cross_thread;<br>
> @@ -771,6 +772,7 @@ struct brw_cs_prog_data {<br>
>         * surface indices the CS-specific surfaces<br>
>         */<br>
>        uint32_t work_groups_start;<br>
> +      uint32_t work_group_size_start;<br>
>        /** @} */<br>
>     } binding_table;<br>
>  };<br>
> diff --git a/src/intel/compiler/brw_fs.<wbr>cpp b/src/intel/compiler/brw_fs.<wbr>cpp<br>
> index d67c0a4192..28730af47b 100644<br>
> --- a/src/intel/compiler/brw_fs.<wbr>cpp<br>
> +++ b/src/intel/compiler/brw_fs.<wbr>cpp<br>
> @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler *compiler, void *log_data,<br>
>                 int shader_time_index,<br>
>                 char **error_str)<br>
>  {<br>
> -   prog_data->local_size[0] = src_shader->info.cs.local_<wbr>size[0];<br>
> -   prog_data->local_size[1] = src_shader->info.cs.local_<wbr>size[1];<br>
> -   prog_data->local_size[2] = src_shader->info.cs.local_<wbr>size[2];<br>
> -   unsigned local_workgroup_size =<br>
> -      src_shader->info.cs.local_<wbr>size[0] * src_shader->info.cs.local_<wbr>size[1] *<br>
> -      src_shader->info.cs.local_<wbr>size[2];<br>
> -<br>
> -   unsigned min_dispatch_width =<br>
> -      DIV_ROUND_UP(local_workgroup_<wbr>size, compiler->devinfo->max_cs_<wbr>threads);<br>
> -   min_dispatch_width = MAX2(8, min_dispatch_width);<br>
> -   min_dispatch_width = util_next_power_of_two(min_<wbr>dispatch_width);<br>
> -   assert(min_dispatch_width <= 32);<br>
> +   unsigned min_dispatch_width;<br>
> +<br>
> +   if (!src_shader->info.cs.local_<wbr>size_variable) {<br>
> +      unsigned local_workgroup_size =<br>
> +         src_shader->info.cs.local_<wbr>size[0] * src_shader->info.cs.local_<wbr>size[1] *<br>
> +         src_shader->info.cs.local_<wbr>size[2];<br>
> +<br>
> +      min_dispatch_width =<br>
> +         DIV_ROUND_UP(local_workgroup_<wbr>size, compiler->devinfo->max_cs_<wbr>threads);<br>
> +      min_dispatch_width = MAX2(8, min_dispatch_width);<br>
> +      min_dispatch_width = util_next_power_of_two(min_<wbr>dispatch_width);<br>
> +      assert(min_dispatch_width <= 32);<br>
> +<br>
> +      prog_data->local_size[0] = src_shader->info.cs.local_<wbr>size[0];<br>
> +      prog_data->local_size[1] = src_shader->info.cs.local_<wbr>size[1];<br>
> +      prog_data->local_size[2] = src_shader->info.cs.local_<wbr>size[2];<br>
> +      prog_data->uses_variable_<wbr>group_size = false;<br>
> +   } else {<br>
> +      /*<br>
> +       * If the local work group size is variable we have to use a dispatch<br>
> +       * width of 32 here, since at this point we don't know the actual size of<br>
> +       * the workload.<br>
> +       */<br>
> +      min_dispatch_width = 32;<br>
> +      prog_data->uses_variable_<wbr>group_size = true;<br>
> +   }<br>
><br>
>     fs_visitor *v8 = NULL, *v16 = NULL, *v32 = NULL;<br>
>     cfg_t *cfg = NULL;<br>
> @@ -7324,7 +7338,12 @@ brw_compile_cs(const struct brw_compiler *compiler, void *log_data,<br>
>           }<br>
>        } else {<br>
>           cfg = v32->cfg;<br>
> -         cs_set_simd_size(prog_data, 32);<br>
> +         if (!src_shader->info.cs.local_<wbr>size_variable) {<br>
> +            cs_set_simd_size(prog_data, 32);<br>
> +         } else {<br>
> +            prog_data->simd_size = 32;<br>
> +            prog_data->threads = compiler->devinfo->max_cs_<wbr>threads;<br>
> +         }<br>
>           cs_fill_push_const_info(<wbr>compiler->devinfo, prog_data);<br>
>           promoted_constants = v32->promoted_constants;<br>
>        }<br>
> diff --git a/src/intel/compiler/brw_fs_<wbr>nir.cpp b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> index 166da0aa6d..c4948c2347 100644<br>
> --- a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> +++ b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br>
> @@ -3766,6 +3766,26 @@ fs_visitor::nir_emit_cs_<wbr>intrinsic(const fs_builder &bld,<br>
>        break;<br>
>     }<br>
><br>
> +   case nir_intrinsic_load_local_<wbr>group_size: {<br>
> +      const unsigned surface =<br>
> +         cs_prog_data->binding_table.<wbr>work_group_size_start;<br>
> +<br>
> +      fs_reg surf_index = brw_imm_ud(surface);<br>
> +      brw_mark_surface_used(prog_<wbr>data, surface);<br>
> +<br>
> +      /* Read the 3 GLuint components of gl_NumWorkGroups */<br>
> +      for (unsigned i = 0; i < 3; i++) {<br>
> +         fs_reg read_result =<br>
> +            emit_untyped_read(bld, surf_index,<br>
> +                              brw_imm_ud(i << 2),<br>
> +                              1 /* dims */, 1 /* size */,<br>
> +                              BRW_PREDICATE_NONE);<br>
> +         read_result.type = dest.type;<br>
> +         bld.MOV(dest, read_result);<br>
> +         dest = offset(dest, bld, 1);<br>
> +      }<br>
> +      break;<br>
> +   }<br>
>     default:<br>
>        nir_emit_intrinsic(bld, instr);<br>
>        break;<br>
> diff --git a/src/intel/compiler/brw_nir_<wbr>lower_cs_intrinsics.c b/src/intel/compiler/brw_nir_<wbr>lower_cs_intrinsics.c<br>
> index bfbdea0e8f..096e86db19 100644<br>
> --- a/src/intel/compiler/brw_nir_<wbr>lower_cs_intrinsics.c<br>
> +++ b/src/intel/compiler/brw_nir_<wbr>lower_cs_intrinsics.c<br>
> @@ -58,10 +58,12 @@ lower_cs_intrinsics_convert_<wbr>block(struct lower_intrinsics_state *state,<br>
>            *       cs_thread_local_id + subgroup_invocation;<br>
>            */<br>
>           nir_ssa_def *subgroup_id;<br>
> -         if (state->local_workgroup_size <= state->dispatch_width)<br>
> +         if ((state->local_workgroup_size <= state->dispatch_width) &&<br>
> +             !state->nir->info.cs.local_<wbr>size_variable) {<br>
>              subgroup_id = nir_imm_int(b, 0);<br>
> -         else<br>
> +         } else {<br>
>              subgroup_id = nir_load_subgroup_id(b);<br>
> +         }<br>
><br>
>           nir_ssa_def *thread_local_id =<br>
>              nir_imul(b, subgroup_id, nir_imm_int(b, state->dispatch_width));<br>
> @@ -84,43 +86,81 @@ lower_cs_intrinsics_convert_<wbr>block(struct lower_intrinsics_state *state,<br>
>            *        (gl_WorkGroupSize.x * gl_WorkGroupSize.y)) %<br>
>            *       gl_WorkGroupSize.z;<br>
>            */<br>
> -         unsigned *size = nir->info.cs.local_size;<br>
> -<br>
>           nir_ssa_def *local_index = nir_load_local_invocation_<wbr>index(b);<br>
> -<br>
> -         nir_const_value uvec3;<br>
> -         memset(&uvec3, 0, sizeof(uvec3));<br>
> -         uvec3.u32[0] = 1;<br>
> -         uvec3.u32[1] = size[0];<br>
> -         uvec3.u32[2] = size[0] * size[1];<br>
> -         nir_ssa_def *div_val = nir_build_imm(b, 3, 32, uvec3);<br>
> -         uvec3.u32[0] = size[0];<br>
> -         uvec3.u32[1] = size[1];<br>
> -         uvec3.u32[2] = size[2];<br>
> -         nir_ssa_def *mod_val = nir_build_imm(b, 3, 32, uvec3);<br>
> -<br>
> -         sysval = nir_umod(b, nir_udiv(b, local_index, div_val), mod_val);<br>
> +         if (!state->nir->info.cs.local_<wbr>size_variable) {<br>
> +            unsigned *size = nir->info.cs.local_size;<br>
> +<br>
> +            nir_const_value uvec3;<br>
> +            memset(&uvec3, 0, sizeof(uvec3));<br>
> +            uvec3.u32[0] = 1;<br>
> +            uvec3.u32[1] = size[0];<br>
> +            uvec3.u32[2] = size[0] * size[1];<br>
> +            nir_ssa_def *div_val = nir_build_imm(b, 3, 32, uvec3);<br>
> +            uvec3.u32[0] = size[0];<br>
> +            uvec3.u32[1] = size[1];<br>
> +            uvec3.u32[2] = size[2];<br>
> +            nir_ssa_def *mod_val = nir_build_imm(b, 3, 32, uvec3);<br>
> +<br>
> +            sysval = nir_umod(b, nir_udiv(b, local_index, div_val), mod_val);<br>
> +         } else {<br>
> +            nir_ssa_def *group_size_xyz = nir_load_local_group_size(b);<br>
> +            nir_ssa_def *group_size_x = nir_channel(b, group_size_xyz, 0);<br>
> +            nir_ssa_def *group_size_y = nir_channel(b, group_size_xyz, 1);<br>
> +            nir_ssa_def *group_size_z = nir_channel(b, group_size_xyz, 2);<br>
> +            nir_ssa_def *result[3];<br>
> +            result[0] = nir_umod(b, local_index, group_size_x);<br>
> +            result[1] = nir_umod(b, nir_udiv(b, local_index, group_size_x),<br>
> +               group_size_y);<br>
> +            result[2] = nir_umod(b, nir_udiv(b, local_index,<br>
> +               nir_umul_high(b, group_size_x, group_size_y)), group_size_z);<br>
> +<br>
> +            sysval = nir_vec(b, result, 3);<br>
> +         }<br>
>           break;<br>
>        }<br>
><br>
>        case nir_intrinsic_load_subgroup_<wbr>id:<br>
> -         if (state->local_workgroup_size > 8)<br>
> +         if (state->local_workgroup_size > 8 ||<br>
> +             state->nir->info.cs.local_<wbr>size_variable) {<br>
>              continue;<br>
> +        }<br>
><br>
>           /* For small workgroup sizes, we know subgroup_id will be zero */<br>
>           sysval = nir_imm_int(b, 0);<br>
>           break;<br>
><br>
>        case nir_intrinsic_load_num_<wbr>subgroups: {<br>
> -         unsigned local_workgroup_size =<br>
> -            nir->info.cs.local_size[0] * nir->info.cs.local_size[1] *<br>
> -            nir->info.cs.local_size[2];<br>
> -         unsigned num_subgroups =<br>
> -            DIV_ROUND_UP(local_workgroup_<wbr>size, state->dispatch_width);<br>
> -         sysval = nir_imm_int(b, num_subgroups);<br>
> +         if (!state->nir->info.cs.local_<wbr>size_variable) {<br>
> +            unsigned num_subgroups;<br>
> +            unsigned local_workgroup_size =<br>
> +               nir->info.cs.local_size[0] * nir->info.cs.local_size[1] *<br>
> +               nir->info.cs.local_size[2];<br>
> +            num_subgroups =<br>
> +               DIV_ROUND_UP(local_workgroup_<wbr>size, state->dispatch_width);<br>
> +            sysval = nir_imm_int(b, num_subgroups);<br>
> +         } else {<br>
> +            nir_ssa_def *group_size_xyz = nir_load_local_group_size(b);<br>
> +            nir_ssa_def *group_size_x = nir_channel(b, group_size_xyz, 0);<br>
> +            nir_ssa_def *group_size_y = nir_channel(b, group_size_xyz, 1);<br>
> +            nir_ssa_def *group_size_z = nir_channel(b, group_size_xyz, 2);<br>
> +            nir_ssa_def *group_size = nir_imul(b, group_size_x, nir_imul(b,<br>
> +               group_size_y, group_size_z));<br>
> +            nir_ssa_def *dispatch_width = nir_imm_int(b,<br>
> +               state->dispatch_width - 1);<br>
> +<br>
> +            sysval = nir_udiv(b, group_size, dispatch_width);<br>
> +         }<br>
>           break;<br>
>        }<br>
><br>
> +      case nir_intrinsic_load_global_<wbr>invocation_id: {<br>
> +         nir_ssa_def *group_id = nir_load_work_group_id(b);<br>
> +         nir_ssa_def *local_id = nir_load_local_invocation_id(<wbr>b);<br>
> +         nir_ssa_def *group_size = nir_load_local_group_size(b);<br>
> +<br>
> +         sysval = nir_iadd(b, nir_imul(b, group_id, group_size), local_id);<br>
> +         break;<br>
> +      }<br>
>        default:<br>
>           continue;<br>
>        }<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_compute.c b/src/mesa/drivers/dri/i965/<wbr>brw_compute.c<br>
> index de08fc3ac1..7949e0ff51 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_compute.c<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_compute.c<br>
> @@ -121,8 +121,11 @@ brw_emit_gpgpu_walker(struct brw_context *brw)<br>
>     }<br>
><br>
>     const unsigned simd_size = prog_data->simd_size;<br>
> -   unsigned group_size = prog_data->local_size[0] *<br>
> -      prog_data->local_size[1] * prog_data->local_size[2];<br>
> +   unsigned group_size = brw->compute.group_size != NULL ?<br>
> +      brw->compute.group_size[0] * brw->compute.group_size[1] *<br>
> +         brw->compute.group_size[2] : prog_data->local_size[0] *<br>
> +            prog_data->local_size[1] * prog_data->local_size[2];<br>
> +<br>
>     unsigned thread_width_max =<br>
>        (group_size + simd_size - 1) / simd_size;<br>
><br>
> @@ -229,6 +232,7 @@ brw_dispatch_compute(struct gl_context *ctx, const GLuint *num_groups) {<br>
><br>
>     brw->compute.num_work_groups_<wbr>bo = NULL;<br>
>     brw->compute.num_work_groups = num_groups;<br>
> +   brw->compute.group_size = NULL;<br>
>     ctx->NewDriverState |= BRW_NEW_CS_WORK_GROUPS;<br>
><br>
>     brw_dispatch_compute_common(<wbr>ctx);<br>
> @@ -248,6 +252,22 @@ brw_dispatch_compute_indirect(<wbr>struct gl_context *ctx, GLintptr indirect)<br>
>     brw->compute.num_work_groups_<wbr>bo = bo;<br>
>     brw->compute.num_work_groups_<wbr>offset = indirect;<br>
>     brw->compute.num_work_groups = indirect_group_counts;<br>
> +   brw->compute.group_size = NULL;<br>
> +   ctx->NewDriverState |= BRW_NEW_CS_WORK_GROUPS;<br>
> +<br>
> +   brw_dispatch_compute_common(<wbr>ctx);<br>
> +}<br>
> +<br>
> +static void<br>
> +brw_dispatch_compute_group_<wbr>size(struct gl_context *ctx,<br>
> +                                const GLuint *num_groups,<br>
> +                                const GLuint *group_size)<br>
> +{<br>
> +   struct brw_context *brw = brw_context(ctx);<br>
> +<br>
> +   brw->compute.num_work_groups_<wbr>bo = NULL;<br>
> +   brw->compute.num_work_groups = num_groups;<br>
> +   brw->compute.group_size = group_size;<br>
>     ctx->NewDriverState |= BRW_NEW_CS_WORK_GROUPS;<br>
><br>
>     brw_dispatch_compute_common(<wbr>ctx);<br>
> @@ -258,4 +278,5 @@ brw_init_compute_functions(<wbr>struct dd_function_table *functions)<br>
>  {<br>
>     functions->DispatchCompute = brw_dispatch_compute;<br>
>     functions-><wbr>DispatchComputeIndirect = brw_dispatch_compute_indirect;<br>
> +   functions-><wbr>DispatchComputeGroupSize = brw_dispatch_compute_group_<wbr>size;<br>
>  }<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_context.c b/src/mesa/drivers/dri/i965/<wbr>brw_context.c<br>
> index 9ced230ec1..25d354e155 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_context.c<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_context.c<br>
> @@ -766,6 +766,12 @@ brw_initialize_cs_context_<wbr>constants(struct brw_context *brw)<br>
>     ctx->Const.<wbr>MaxComputeWorkGroupSize[2] = max_invocations;<br>
>     ctx->Const.<wbr>MaxComputeWorkGroupInvocations = max_invocations;<br>
>     ctx->Const.<wbr>MaxComputeSharedMemorySize = 64 * 1024;<br>
> +<br>
> +   /* ARB_compute_variable_group_<wbr>size constants */<br>
> +   ctx->Const.<wbr>MaxComputeVariableGroupSize[0] = max_invocations;<br>
> +   ctx->Const.<wbr>MaxComputeVariableGroupSize[1] = max_invocations;<br>
> +   ctx->Const.<wbr>MaxComputeVariableGroupSize[2] = max_invocations;<br>
> +   ctx->Const.<wbr>MaxComputeVariableGroupInvocat<wbr>ions = max_invocations;<br>
>  }<br>
><br>
>  /**<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_context.h b/src/mesa/drivers/dri/i965/<wbr>brw_context.h<br>
> index 2613b9fda2..0fb533c369 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_context.h<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_context.h<br>
> @@ -931,6 +931,7 @@ struct brw_context<br>
>        struct brw_bo *num_work_groups_bo;<br>
>        GLintptr num_work_groups_offset;<br>
>        const GLuint *num_work_groups;<br>
> +      const GLuint *group_size;<br>
>     } compute;<br>
><br>
>     struct {<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_cs.c b/src/mesa/drivers/dri/i965/<wbr>brw_cs.c<br>
> index e3f8fc67a4..007273390b 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_cs.c<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_cs.c<br>
> @@ -43,6 +43,10 @@ assign_cs_binding_table_<wbr>offsets(const struct gen_device_info *devinfo,<br>
>     prog_data->binding_table.work_<wbr>groups_start = next_binding_table_offset;<br>
>     next_binding_table_offset++;<br>
><br>
> +   /* May not be used if the work group size is not variable. */<br>
> +   prog_data->binding_table.work_<wbr>group_size_start = next_binding_table_offset;<br>
> +   next_binding_table_offset++;<br>
> +<br>
>     brw_assign_common_binding_<wbr>table_offsets(devinfo, prog, &prog_data->base,<br>
>                                             next_binding_table_offset);<br>
>  }<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/<wbr>brw_wm_surface_state.c<br>
> index 73cae9ef7c..fa8851e2b4 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_wm_surface_state.c<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_wm_surface_state.c<br>
> @@ -1634,7 +1634,7 @@ const struct brw_tracked_state brw_wm_image_surfaces = {<br>
>  };<br>
><br>
>  static void<br>
> -brw_upload_cs_work_groups_<wbr>surface(struct brw_context *brw)<br>
> +brw_upload_cs_variable_<wbr>surfaces(struct brw_context *brw)<br>
>  {<br>
>     struct gl_context *ctx = &brw->ctx;<br>
>     /* _NEW_PROGRAM */<br>
> @@ -1671,6 +1671,29 @@ brw_upload_cs_work_groups_<wbr>surface(struct brw_context *brw)<br>
>                                      RELOC_WRITE);<br>
>        brw->ctx.NewDriverState |= BRW_NEW_SURFACES;<br>
>     }<br>
> +<br>
> +   if (prog && cs_prog_data->uses_variable_<wbr>group_size) {<br>
> +      const unsigned surf_idx =<br>
> +         cs_prog_data->binding_table.<wbr>work_group_size_start;<br>
> +      uint32_t *surf_offset = &brw->cs.base.surf_offset[<wbr>surf_idx];<br>
> +      struct brw_bo *bo;<br>
> +      uint32_t bo_offset;<br>
> +<br>
> +      bo = NULL;<br>
> +      brw_upload_data(&brw->upload,<br>
> +                     (void *)brw->compute.group_size,<br>
> +                      3 * sizeof(GLuint),<br>
> +                      sizeof(GLuint),<br>
> +                      &bo,<br>
> +                      &bo_offset);<br>
> +<br>
> +      brw_emit_buffer_surface_state(<wbr>brw, surf_offset,<br>
> +                                    bo, bo_offset,<br>
> +                                    ISL_FORMAT_RAW,<br>
> +                                    3 * sizeof(GLuint), 1,<br>
> +                                    RELOC_WRITE);<br>
> +      brw->ctx.NewDriverState |= BRW_NEW_SURFACES;<br>
> +   }<br>
>  }<br>
><br>
>  const struct brw_tracked_state brw_cs_work_groups_surface = {<br>
> @@ -1678,5 +1701,5 @@ const struct brw_tracked_state brw_cs_work_groups_surface = {<br>
>        .brw = BRW_NEW_CS_PROG_DATA |<br>
>               BRW_NEW_CS_WORK_GROUPS<br>
>     },<br>
> -   .emit = brw_upload_cs_work_groups_<wbr>surface,<br>
> +   .emit = brw_upload_cs_variable_<wbr>surfaces,<br>
>  };<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>intel_extensions.c b/src/mesa/drivers/dri/i965/<wbr>intel_extensions.c<br>
> index 5a9369d7b4..f213360ed8 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>intel_extensions.c<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>intel_extensions.c<br>
> @@ -258,6 +258,7 @@ intelInitExtensions(struct gl_context *ctx)<br>
>              ctx->Extensions.ARB_compute_<wbr>shader = true;<br>
>              ctx->Extensions.ARB_ES3_1_<wbr>compatibility =<br>
>                 devinfo->gen >= 8 || devinfo->is_haswell;<br>
> +            ctx->Extensions.ARB_compute_<wbr>variable_group_size = true;<br>
>           }<br>
><br>
>           if (can_do_predicate_writes(brw-><wbr>screen)) {<br>
> --<br>
> 2.11.0<br>
</div></div>> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</blockquote></div><br></div>