[Mesa-dev] [PATCH 62/95] i965/vec4: Add a shuffle_64bit_data helper

Francisco Jerez currojerez at riseup.net
Wed Sep 14 05:24:09 UTC 2016


Iago Toral <itoral at igalia.com> writes:

> On Mon, 2016-09-12 at 14:19 -0700, Francisco Jerez wrote:
>> Iago Toral Quiroga <itoral at igalia.com> writes:
>> 
>> > 
>> > SIMD4x2 64bit data is stored in register space like this:
>> > 
>> > r0.0:DF  x0 y0 z0 w0
>> > r0.1:DF  x1 y1 z1 w1
>> > 
>> > When we need to write data such as this to memory using 32-bit
>> > write
>> > messages we need to shuffle it in this fashion:
>> > 
>> > r0.0:DF  x0 y0 x1 y1
>> > r0.1:DF  z0 w0 z1 w1
>> > 
>> > and emit two 32-bit write messages, one for r0.0 at base_offset
>> > and another one for r0.1 at base_offset+16.
>> > 
>> > We also need to do the inverse operation when we read using 32-bit
>> > messages
>> > to produce valid SIMD4x2 64bit data from the data read. We can
>> > achieve this
>> > by aplying the exact same shuffling to the data read, although we
>> > need to
>> > apply different channel enables since the layout of the data is
>> > reversed.
>> > 
>> > This helper implements the data shuffling logic and we will use it
>> > in
>> > various places where we read and write 64bit data from/to memory.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_vec4.h       |  5 ++
>> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 96
>> > ++++++++++++++++++++++++++++++
>> >  2 files changed, 101 insertions(+)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h
>> > b/src/mesa/drivers/dri/i965/brw_vec4.h
>> > index 26228d0..3337fc0 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
>> > @@ -327,6 +327,11 @@ public:
>> >  
>> >     src_reg setup_imm_df(double v);
>> >  
>> > +   vec4_instruction *shuffle_64bit_data(dst_reg dst, src_reg src,
>> > +                                        bool for_write,
>> > +                                        bblock_t *block = NULL,
>> > +                                        vec4_instruction *ref =
>> > NULL);
>> > +
>> >     virtual void emit_nir_code();
>> >     virtual void nir_setup_uniforms();
>> >     virtual void
>> > nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > index 450db92..346e822 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > @@ -2145,4 +2145,100 @@
>> > vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr)
>> >        dst_reg(VGRF, alloc.allocate(instr->def.bit_size / 32));
>> >  }
>> >  
>> > +/* SIMD4x2 64bit data is stored in register space like this:
>> > + *
>> > + * r0.0:DF  x0 y0 z0 w0
>> > + * r0.1:DF  x1 y1 z1 w1
>> > + *
>> > + * When we need to write data such as this to memory using 32-bit
>> > write
>> > + * messages we need to shuffle it in this fashion:
>> > + *
>> > + * r0.0:DF  x0 y0 x1 y1 (to be written at base offset)
>> > + * r0.0:DF  z0 w0 z1 w1 (to be written at base offset + 16)
>> > + *
>> > + * We need to do the inverse operation when we read using 32-bit
>> > messages,
>> > + * which we can do by applying the same exact shuffling on the 64-
>> > bit data
>> > + * read, only that because the data for each vertex is positioned
>> > differently
>> > + * we need to apply different channel enables.
>> > + *
>> > + * This function takes 64bit data and shuffles it as explained
>> > above.
>> > + *
>> > + * The @for_write parameter is used to specify if the shuffling is
>> > being done
>> > + * for proper SIMD4x2 64-bit data that needs to be shuffled prior
>> > to a 32-bit
>> > + * write message (for_write = true), or instead we are doing the
>> > inverse
>> > + * opperation and we have just read 64-bit data using a 32-bit
>> > messages that we
>> > + * need to shuffle to create valid SIMD4x2 64-bit data (for_write
>> > = false).
>> > + *
>> > + * If @block and @ref are non-NULL, then the shuffling is done
>> > after @ref,
>> > + * otherwise the instructions are emitted normally at the end. The
>> > function
>> > + * returns the last instruction inserted.
>> > + *
>> > + * Notice that @src and @dst cannot be the same register.
>> > + */
>> > +vec4_instruction *
>> > +vec4_visitor::shuffle_64bit_data(dst_reg dst, src_reg src, bool
>> > for_write,
>> > +                                 bblock_t *block, vec4_instruction
>> > *ref)
>> > +{
>> > +   assert(type_sz(src.type) == 8);
>> > +   assert(type_sz(dst.type) == 8);
>> > +   assert(!src.in_range(dst, 2));
>> > +   assert(dst.writemask == WRITEMASK_XYZW);
>> > +   assert(!ref == !block);
>> > +
>> > +   vec4_instruction *inst, *last;
>> > +   bool emit_before = ref != NULL;
>> > +
>> > +   #define EMIT(i) \
>> > +      if (!emit_before) { \
>> > +         emit(i); \
>> > +      } else { \
>> > +         ref->insert_after(block, i); \
>> > +         ref = i; \
>> > +      }  \
>> > +      last = i;
>> > +
>> > +   /* Resolve swizzle in src */
>> > +   if (src.swizzle != BRW_SWIZZLE_XYZW) {
>> > +      dst_reg data = dst_reg(this, glsl_type::dvec4_type);
>> > +      inst = MOV(data, src);
>> > +      EMIT(inst);
>> > +      src = src_reg(data);
>> > +   }
>> > +
>> > +   /* dst+0.XY = src+0.XY */
>> > +   dst.writemask = WRITEMASK_XY;
>> > +   inst = MOV(dst, src);
>> > +   inst->regs_written = 1;
>> > +   inst->exec_size = 4;
>> > +   EMIT(inst);
>> > +
>> > +   /* dst+0.ZW = src+1.XY */
>> > +   dst.writemask = WRITEMASK_ZW;
>> > +   inst = MOV(dst, swizzle(offset(src, 1), BRW_SWIZZLE_XYXY));
>> > +   inst->regs_written = 1;
>> > +   inst->exec_size = 4;
>> > +   inst->group = for_write ? 4 : 0;
>> > +   EMIT(inst);
>> > +
>> > +   /* dst+1.XY = src+0.ZW */
>> > +   dst.writemask = WRITEMASK_XY;
>> > +   inst = MOV(offset(dst, 1), swizzle(src, BRW_SWIZZLE_ZWZW));
>> > +   inst->regs_written = 1;
>> > +   inst->exec_size = 4;
>> > +   inst->group = for_write ? 0 : 4;
>> > +   EMIT(inst);
>> > +
>> > +   /* dst+1.ZW = src+1.ZW */
>> > +   dst.writemask = WRITEMASK_ZW;
>> > +   inst = MOV(offset(dst, 1), offset(src, 1));
>> > +   inst->regs_written = 1;
>> > +   inst->exec_size = 4;
>> > +   inst->group = 4;
>> > +   EMIT(inst);
>> > +
>> > +   #undef EMIT
>> > +
>> > +   return last;
>> > +}
>> > +
>> This would be a *lot* simpler if you used and took as argument a VEC4
>> builder instead of constructing the IR manually (and if you used the
>> writemask() helper).  It might need some small fixes for the VEC4
>> builder to initialize the new group and exec_size fields correctly
>> but
>> it seems worth doing just because of this function alone.
>
> Sure, I can do that. About using the writemask() helper, the problem
> with it is that it doesn't set the input writemask on the dst, instead
> it sets the result of "dst.writemask & mask", and when we are shuffling
> we really want to force the writemask independently of any previous
> writemask the dst register had.

Why?  If I give a register r with writemask XY to a function 'foo', I
wouldn't expect it to ignore the writemask I told it to use and instead
use a different writemask not contained in the original writemask.  If
you're doing that right now you likely have a bug.

> Probably the dst we get here has an XYZW writemask anyway since it
> would typically be a temporary vgrf we have just created to store the
> result of the shuffling operation. Maybe we should just add an assert
> to ensure that, and then we should be free to use the writemask()
> helper.
>

Looks like you have one already:

+   assert(dst.writemask == WRITEMASK_XYZW);

Question is do you need it only because you're bashing the writemasks
directly instead of using writemask(), or is there any other reason for
the assert?

>> > 
>> >  }
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160913/5086e8dd/attachment-0001.sig>


More information about the mesa-dev mailing list