[Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker

Jordan Justen jordan.l.justen at intel.com
Wed Apr 29 17:00:27 PDT 2015


On 2015-04-27 19:02:38, Kenneth Graunke wrote:
> On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote:
> > Tested on Ivybridge, Haswell and Broadwell.
> > 
> > Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_compute.c | 39 ++++++++++++++++++++++++++++++++-
> >  src/mesa/drivers/dri/i965/brw_defines.h |  1 +
> >  2 files changed, 39 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
> > index baed701..06ef448 100644
> > --- a/src/mesa/drivers/dri/i965/brw_compute.c
> > +++ b/src/mesa/drivers/dri/i965/brw_compute.c
> > @@ -31,12 +31,49 @@
> >  #include "brw_draw.h"
> >  #include "brw_state.h"
> >  #include "intel_batchbuffer.h"
> > +#include "brw_defines.h"
> >  
> >  
> >  static void
> >  brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups)
> >  {
> > -   _mesa_problem(&brw->ctx, "TODO: implement brw_emit_gpgpu_walker");
> > +   const struct brw_cs_prog_data *prog_data = brw->cs.prog_data;
> > +
> > +   const unsigned simd_size = prog_data->simd_size;
> > +   unsigned group_size = prog_data->local_size[0] *
> > +      prog_data->local_size[1] * prog_data->local_size[2];
> > +   unsigned thread_width_max =
> > +      (group_size + simd_size - 1) / simd_size;
> > +
> > +   uint32_t right_mask = (1u << simd_size) - 1;
> > +   const unsigned right_non_aligned = group_size & (simd_size - 1);
> > +   if (right_non_aligned != 0)
> > +      right_mask >>= (simd_size - right_non_aligned);
> 
> I think this is equvalent to:
> 
> uint32_t right_mask = (1u << (simd_size - (group_size % simd_size))) - 1;
> 
> which might be a bit simpler...
> 
> > +   BEGIN_BATCH(dwords);
> > +   OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2));
> 
> I was going to suggest splitting this into separate Gen8+ and Gen7
> blocks, but now that I look at the code...these two are slightly
> different indirect handling, and the later one is just a DWord of MBZ,
> so...it's not really that different.  I think what you have is fine :)
> 
> > +   uint32_t dwords = brw->gen < 8 ? 11 : 15;
> > +   OUT_BATCH(0);
> > +   if (brw->gen >= 8) {
> > +      OUT_BATCH(0);
> > +      OUT_BATCH(0);
> > +   }
> > +   assert(thread_width_max <= brw->max_cs_threads);
> > +   OUT_BATCH(((simd_size == 8) ? 0 : 1) << 30 |
> 
> You might want to write this as ((simd_size / 8) - 1).  That will work
> for SIMD8/16/32.

Good idea, but I think simd_size / 16 will be needed, since we need 2
for SIMD32.

> Topi would probably suggest using SET_FIELD, i.e.
> 
> #define BRW_GPGPU_SIMD_SIZE_SHIFT     30
> #define BRW_GPGPU_SIMD_SIZE_MASK      INTEL_MASK(31, 30)
> 
> SET_FIELD((simd_size / 8) - 1, BRW_GPGPU_SIMD_SIZE)
> 
> It's probably a good idea here too.

Will do.

> > +             (thread_width_max - 1));
> 
> Don't you need to set the thread height/depth maximums as well?
> I'm not really sure how this works.

We flatten the 3-dims out above in group_size, and then
thread_width_max. So, this basically focuses getting it to execute the
correct number of times. When height is not used, we can set
bottom_mask to all 1's, and only use the right_mask.

In terms of GLSL's gl_LocalInvocationID, that is a whole separate
matter. (And a whole separate patch series! :)

> > +   OUT_BATCH(0);
> 
> It'd be nice to label the 0's, i.e.

Will do.

Thanks for the review!

-Jordan

>    OUT_BATCH(0);             /* Thread Group ID Starting X */
>    OUT_BATCH(num_groups[0]); /* Thread Group ID X Dimension */
> 
> With those changes, the whole series is:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> 
> I haven't verified that these execution masks are really what you want.
> You know more about this than I do. :)
> 
> > +   if (brw->gen >= 8)
> > +      OUT_BATCH(0);
> > +   OUT_BATCH(num_groups[0]);
> > +   OUT_BATCH(0);
> > +   if (brw->gen >= 8)
> > +      OUT_BATCH(0);
> > +   OUT_BATCH(num_groups[1]);
> > +   OUT_BATCH(0);
> > +   OUT_BATCH(num_groups[2]);
> > +   OUT_BATCH(right_mask);
> > +   OUT_BATCH(0xffffffff);
> > +   ADVANCE_BATCH();
> >  }
> >  
> >  
> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> > index 36f46af..cd25511 100644
> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode {
> >  
> >  #define MEDIA_VFE_STATE                         0x7000
> >  #define MEDIA_INTERFACE_DESCRIPTOR_LOAD         0x7002
> > +#define GPGPU_WALKER                            0x7105
> >  
> >  #endif
> > 


More information about the mesa-dev mailing list