[Mesa-dev] [PATCH 09/15] i965: Add a new representation for Broadwell shader instructions.

Eric Anholt eric at anholt.net
Mon Nov 18 12:27:18 PST 2013


Kenneth Graunke <kenneth at whitecape.org> writes:

> Broadwell significantly changes the EU instruction encoding.  Many of
> the fields got moved to different bit positions; some even got split
> in two.
>
> With so many changes, it was infeasible to continue using struct
> brw_instruction.  We needed a new representation.
>
> This new approach is a bit different: rather than a struct, I created a
> class that has four DWords, and helper functions that read/write various
> bits.  This has several advantages:
>
> 1. We can create several different names for the same bits.  For
>    example, conditional modifiers, SFID for SEND instructions, and the
>    MATH instruction's function opcode are all stored in bits 27:24.
>
>    In each situation, we can use the appropriate setter function:
>    set_sfid(), set_math_function(), or set_cond_modifier().  This
>    is much easier to follow.
>
> 2. Since the fields are expressed using the original 128-bit numbers,
>    the code to create the getter/setter functions follows the table in
>    the documentation very closely.
>
> To aid in debugging, I've enabled -fkeep-inline-functions when building
> gen8_instruction.cpp.  Otherwise, these functions cannot be called by
> gdb, making it insanely difficult to print out anything.

I dislike C++ creep.  I think the old structs worked OK, and there are
some minor downsides to access-everything-through-methods, like being
unable to just print the instruction in gdb and see fields all at once,
and having to do the build system hack to inline but also keep the
inlines.  But I'm feeling more and more alone on the team.

> diff --git a/src/mesa/drivers/dri/i965/gen8_instruction.cpp b/src/mesa/drivers/dri/i965/gen8_instruction.cpp
> new file mode 100644
> index 0000000..a6f8667
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/gen8_instruction.cpp

> +void
> +gen8_instruction::set_dst(struct brw_reg reg)
> +{
> +   /* MRFs haven't existed since Gen7, so we better not be using them. */
> +   if (reg.file == BRW_MESSAGE_REGISTER_FILE) {
> +      reg.file = BRW_GENERAL_REGISTER_FILE;
> +      reg.nr += GEN7_MRF_HACK_START;
> +   }
> +
> +   assert(reg.file != BRW_MESSAGE_REGISTER_FILE);

The comment doesn't match up with code just below to support it.  And I
think you could just drop the assert if you're going to still do the MRF
hack.  (Same stuff is in set_src0/set_src1).

> +void
> +gen8_instruction::validate_reg(struct brw_reg reg)
> +{
> +   int hstride_for_reg[] = {0, 1, 2, 4};
> +   int vstride_for_reg[] = {0, 1, 2, 4, 8, 16, 32, 64, 128, 256};
> +   int width_for_reg[] = {1, 2, 4, 8, 16};
> +   int execsize_for_reg[] = {1, 2, 4, 8, 16};
> +   int width, hstride, vstride, execsize;
> +
> +   if (reg.file == BRW_IMMEDIATE_VALUE) {
> +      /* TODO(gen8): check immediate vectors */
> +      return;
> +   }
> +
> +   if (reg.file == BRW_ARCHITECTURE_REGISTER_FILE)
> +      return;
> +
> +   assert(reg.hstride >= 0 && reg.hstride < Elements(hstride_for_reg));
> +   hstride = hstride_for_reg[reg.hstride];
> +
> +   if (reg.vstride == 0xf) {
> +      vstride = -1;
> +   } else {
> +      assert(reg.vstride >= 0 && reg.vstride < Elements(vstride_for_reg));
> +      vstride = vstride_for_reg[reg.vstride];
> +   }
> +
> +   assert(reg.width >= 0 && reg.width < Elements(width_for_reg));
> +   width = width_for_reg[reg.width];
> +
> +   assert(exec_size() >= 0 && exec_size() < Elements(execsize_for_reg));
> +   execsize = execsize_for_reg[exec_size()];
> +
> +   /* Restrictions from 3.3.10: Register Region Restrictions. */
> +   /* 3. */
> +   assert(execsize >= width);
> +
> +   /* 4. */
> +   if (execsize == width && hstride != 0) {
> +      assert(vstride == -1 || vstride == width * hstride);
> +   }
> +
> +   /* 5. */
> +   if (execsize == width && hstride == 0) {
> +      /* no restriction on vstride. */
> +   }
> +
> +   /* 6. */
> +   if (width == 1) {
> +      assert(hstride == 0);
> +   }
> +
> +   /* 7. */
> +   if (execsize == 1 && width == 1) {
> +      assert(hstride == 0);
> +      assert(vstride == 0);
> +   }
> +
> +   /* 8. */
> +   if (vstride == 0 && hstride == 0) {
> +      assert(width == 1);
> +   }

In my spec snapshot, these are 3.3.9.2 A-H now, but there don't seem to
be any changes for bdw.  We should really add checks for those other
region restrictions, though.  And maybe move all this to after the
instruction is all set up, since we do some tweaks of the strides and
things after validate_reg() in set_src0()/1.  Things for later.

> +void
> +gen8_instruction::set_src0(struct brw_reg reg)
> +{

> +   validate_reg(reg);
> +
> +   set_src0_reg_file(reg.file);
> +   set_src0_reg_type(reg.type);
> +   set_src0_abs(reg.abs);
> +   set_src0_negate(reg.negate);
> +
> +   assert(reg.address_mode == BRW_ADDRESS_DIRECT);
> +
> +   if (reg.file == BRW_IMMEDIATE_VALUE) {
> +      data[3] = reg.dw1.ud;
> +
> +      /* Required to set some fields in src1 as well: */
> +      set_src1_reg_file(0); /* arf */

Just use BRW_ARCHITECTURE_REGISTER_FILE?  (I realize this is copy and
paste from brw_eu_emit.c)

> +void
> +gen8_instruction::set_urb_message(enum brw_urb_write_flags flags,
> +                                  unsigned msg_length,
> +                                  unsigned response_length,
> +                                  unsigned offset,
> +                                  bool interleave)
> +{
> +   set_message_descriptor(BRW_SFID_URB, msg_length, response_length,
> +                          true, flags & BRW_URB_WRITE_EOT);
> +   set_src0(brw_vec8_grf(GEN7_MRF_HACK_START + 1, 0));
> +   if (flags & BRW_URB_WRITE_OWORD) {
> +      assert(msg_length == 2);
> +      set_urb_opcode(BRW_URB_OPCODE_WRITE_OWORD);
> +   } else {
> +      set_urb_opcode(BRW_URB_OPCODE_WRITE_HWORD);
> +   }
> +   set_urb_global_offset(offset);
> +   set_urb_interleave(interleave);
> +   set_urb_per_slot_offset(flags & BRW_URB_WRITE_PER_SLOT_OFFSET ? 1 : 0);

Pre-bdw, we also set the complete field, even though it's supposedly
ignored for URB_WRITE.  I think this is fine.

> +void
> +gen8_instruction::set_sampler_message(unsigned binding_table_index,
> +                                      unsigned sampler,
> +                                      unsigned msg_type,
> +                                      unsigned response_length,
> +                                      unsigned msg_length,
> +                                      bool header_present,
> +                                      unsigned simd_mode)
> +{
> +   set_message_descriptor(BRW_SFID_SAMPLER, msg_length, response_length,
> +                          header_present, false);
> +
> +   set_binding_table_index(binding_table_index);
> +   set_sampler(sampler);
> +   set_sampler_msg_type(msg_type);
> +   set_sampler_simd_mode(simd_mode);
> +}
> +
> +void
> +gen8_instruction::set_dp_message(enum brw_message_target sfid,
> +                                 unsigned binding_table_index,
> +                                 unsigned msg_type,
> +                                 unsigned msg_control,
> +                                 unsigned mlen,
> +                                 unsigned rlen,
> +                                 bool header_present,
> +                                 bool end_of_thread)
> +{
> +   /* Binding table index is from 0..255 */
> +   assert((binding_table_index & 0xff) == binding_table_index);
> +
> +   /* Message Type is only 5 bits */
> +   assert((msg_type & 0x1f) == msg_type);
> +
> +   /* Message Control is only 6 bits */
> +   assert((msg_control & 0x3f) == msg_control);
> +
> +   set_message_descriptor(sfid, mlen, rlen, header_present, end_of_thread);
> +   set_function_control(binding_table_index | msg_type << 14 | msg_control << 8);

The other code names each bitfield in the instruction methods.  Why use
set_function_control here?

> +}


> diff --git a/src/mesa/drivers/dri/i965/gen8_instruction.h b/src/mesa/drivers/dri/i965/gen8_instruction.h
> new file mode 100644
> index 0000000..f76c1fc
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/gen8_instruction.h
> @@ -0,0 +1,262 @@
> +/*
> + * Copyright © 2012 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +/** @file gen8_instruction.h
> + *
> + * A representation of a Gen8+ EU instruction, with helper methods to get
> + * and set various fields.  This is the actual hardware format.
> + */
> +
> +#pragma once
> +
> +extern "C" {
> +#include "main/macros.h"
> +} /* extern "C" */
> +
> +#include "brw_reg.h"
> +
> +struct gen8_instruction {
> +   uint32_t data[4];
> +
> +#define F(name, high, low) \
> +   inline void set_##name(unsigned v) { set_bits(high, low, v); } \
> +   inline unsigned name() const { return bits(high, low); }
> +
> +   /**
> +    * Direct addressing only:
> +    *  @{
> +    */
> +   F(src1_da_reg_nr,      108, 101);
> +   F(src0_da_reg_nr,       76,  69);
> +   F(dst_da1_hstride,      62,  61);
> +   F(dst_da_reg_nr,        60,  53);
> +   F(dst_da16_subreg_nr,   52,  52);
> +   F(dst_da1_subreg_nr,    52,  48);
> +   F(da16_writemask,       51,  48); /* Dst.ChanEn */
> +   /** @} */

There seems to be some confusion as to whether direct addressing bits go
in this group or down below in the other section.  I think they make
more sense inline with the rest, sorted like they are in my docs.

> +   F(src1_vert_stride,    120, 117)
> +   F(src1_da1_width,      116, 114)
> +   F(src1_da16_swiz_w,    115, 114)
> +   F(src1_da16_swiz_z,    113, 112)
> +   F(src1_da1_hstride,    113, 112)
> +   F(src1_address_mode,   111, 111)
> +   /** Src1.SrcMod @{ */
> +   F(src1_negate,         110, 110)
> +   F(src1_abs,            109, 109)
> +   /** @} */
> +   F(src1_da16_subreg_nr, 100, 100)
> +   F(src1_da1_subreg_nr,  100,  96)
> +   F(src1_da16_swiz_y,     99,  98)
> +   F(src1_da16_swiz_x,     97,  96)
> +   F(src1_reg_type,        94,  91)
> +   F(src1_reg_file,        90,  89)
> +   F(src0_vert_stride,     88,  85)
> +   F(src0_da1_width,       84,  82)
> +   F(src0_da16_swiz_w,     83,  82)
> +   F(src0_da16_swiz_z,     81,  80)
> +   F(src0_da1_hstride,     81,  80)
> +   F(src0_address_mode,    79,  79)
> +   /** Src0.SrcMod @{ */
> +   F(src0_negate,          78,  78)
> +   F(src0_abs,             77,  77)
> +   /** @} */
> +   F(src0_da16_subreg_nr,  68,  68)
> +   F(src0_da1_subreg_nr,   68,  64)
> +   F(src0_da16_swiz_y,     67,  66)
> +   F(src0_da16_swiz_x,     65,  64)
> +   F(dst_address_mode,     63,  63)
> +   F(src0_reg_type,        46,  43)
> +   F(src0_reg_file,        42,  41)
> +   F(dst_reg_type,         40,  37)
> +   F(dst_reg_file,         36,  35)
> +   F(mask_control,         34,  34)
> +   F(flag_reg_nr,          33,  33)
> +   F(flag_subreg_nr,       32,  32)
> +   F(saturate,             31,  31)
> +   F(branch_control,       30,  30)
> +   F(debug_control,        30,  30)
> +   F(cmpt_control,         29,  29)
> +   F(acc_wr_control,       28,  28)
> +   F(cond_modifier,        27,  24)
> +   F(exec_size,            23,  21)
> +   F(pred_inv,             20,  20)
> +   F(pred_control,         19,  16)
> +   F(thread_control,       15,  14)
> +   F(qtr_control,          13,  12)
> +   F(nib_control,          11,  11)
> +   F(no_dd_check,          10,  10)
> +   F(no_dd_clear,           9,   9)
> +   F(access_mode,           8,   8)
> +   /* Bit 7 is Reserve d (for future Opcode expansion) */
> +   F(opcode,                6,   0)

extra ' ' in reserved.

> +   /**
> +    * Three-source instructions:
> +    *  @{
> +    */
> +   F(src2_3src_reg_nr,    125, 118)
> +   F(src2_3src_subreg_nr, 117, 115)
> +   F(src2_3src_swizzle,   114, 107)
> +   F(src2_3src_rep_ctrl,  106, 106)
> +   F(src1_3src_reg_nr,    104,  97)
> +   F(src1_3src_subreg_hi,  96,  96)
> +   F(src1_3src_subreg_lo,  95,  94)

Why are these two split?

> +   F(src1_3src_swizzle,    93,  86)
> +   F(src1_3src_rep_ctrl,   85,  85)
> +   F(src0_3src_reg_nr,     83,  76)
> +   F(src0_3src_subreg_nr,  75,  73)
> +   F(src0_3src_swizzle,    72,  65)
> +   F(src0_3src_rep_ctrl,   64,  64)
> +   F(dst_3src_reg_nr,      63,  56)
> +   F(dst_3src_subreg_nr,   55,  53)
> +   F(dst_3src_writemask,   52,  49)
> +   F(dst_3src_type,        48,  46)
> +   F(src_3src_type,        45,  43)
> +   F(src2_3src_negate,     42,  42)
> +   F(src2_3src_abs,        41,  41)
> +   F(src1_3src_negate,     40,  40)
> +   F(src1_3src_abs,        39,  39)
> +   F(src0_3src_negate,     38,  38)
> +   F(src0_3src_abs,        37,  37)
> +   /** @} */
> +
> +   /**
> +    * Fields for SEND messages:
> +    *  @{
> +    */
> +   F(eot,                 127, 127)
> +   F(mlen,                124, 121)
> +   F(rlen,                120, 116)
> +   F(header_present,      115, 115)
> +   F(function_control,    114,  96)
> +   F(sfid,                 27,  24)
> +   F(math_function,        27,  24)
> +   /** @} */
> +
> +   /**
> +    * URB message function control bits:
> +    *  @{
> +    */
> +   F(urb_per_slot_offset, 113, 113)
> +   F(urb_interleave,      111, 111)
> +   F(urb_global_offset,   110, 100)
> +   F(urb_opcode,           99,  96)
> +   /** @} */
> +
> +   /**
> +    * Sampler message function control bits:
> +    *  @{
> +    */
> +   F(sampler_simd_mode,   114, 113)
> +   F(sampler_msg_type,    112, 108)
> +   F(sampler,             107, 104)
> +   F(binding_table_index, 103,  96)

It would be nice to see the message descriptor bits using their 32-bit
bit indices within the descriptor (like you see in the docs for
descriptors) instead of 128.  That seems like one of the ways this way
of handling setup could be better than the bitfield structs.  It would
just be another F()-like macro to take those, I think.

> +   /** @} */
> +#undef F
> +
> +   /**
> +    * Flow control instruction bits:
> +    *  @{
> +    */
> +   inline unsigned uip() const { return data[2]; }
> +   inline void set_uip(unsigned uip) { data[2] = uip; }
> +   inline unsigned jip() const { return data[3]; }
> +   inline void set_jip(unsigned jip) { data[3] = jip; }
> +   /** @} */
> +
> +   inline int src1_imm_d() const { return data[3]; }
> +   inline unsigned src1_imm_ud() const { return data[3]; }
> +   inline float src1_imm_f() const { fi_type ft; ft.u = data[3]; return ft.f; }
> +
> +   void set_dst(struct brw_reg reg);
> +   void set_src0(struct brw_reg reg);
> +   void set_src1(struct brw_reg reg);
> +
> +   void set_urb_message(enum brw_urb_write_flags, unsigned mlen, unsigned rlen,
> +                        unsigned offset, bool interleave);
> +
> +   void set_sampler_message(unsigned binding_table_index, unsigned sampler,
> +                            unsigned msg_type, unsigned rlen, unsigned mlen,
> +                            bool header_present, unsigned simd_mode);
> +
> +   void set_dp_message(enum brw_message_target sfid,
> +                       unsigned binding_table_index,
> +                       unsigned msg_type,
> +                       unsigned msg_control,
> +                       unsigned msg_length,
> +                       unsigned response_length,
> +                       bool header_present,
> +                       bool end_of_thread);
> +
> +private:
> +   inline unsigned bits(unsigned high, unsigned low) const;
> +   inline void set_bits(unsigned high, unsigned low, unsigned value);
> +
> +   void validate_reg(struct brw_reg reg);
> +
> +   void set_message_descriptor(enum brw_message_target sfid,
> +                               unsigned msg_length,
> +                               unsigned response_length,
> +                               bool header_present,
> +                               bool end_of_thread);
> +};
> +
> +/**
> + * Fetch a set of contiguous bits from the instruction.
> + *
> + * Bits indexes range from 0..127; fields may not cross 32-bit boundaries.
> + */
> +inline unsigned
> +gen8_instruction::bits(unsigned high, unsigned low) const
> +{
> +   /* We assume the field doesn't cross 32-bit boundaries. */
> +   const unsigned word = high / 32;
> +   assert(word == low / 32);
> +
> +   high %= 32;
> +   low %= 32;
> +
> +   const unsigned mask = (((1 << (high - low + 1)) - 1) << low);
> +
> +   return (data[word] & mask) >> low;
> +}
> +
> +/**
> + * Set bits in the instruction, with proper shifting and masking.
> + *
> + * Bits indexes range from 0..127; fields may not cross 32-bit boundaries.
> + */
> +inline void
> +gen8_instruction::set_bits(unsigned high, unsigned low, unsigned value)
> +{
> +   const unsigned word = high / 32;
> +   assert(word == low / 32);
> +
> +   high %= 32;
> +   low %= 32;
> +
> +   const unsigned mask = (((1 << (high - low + 1)) - 1) << low);
> +
> +   data[word] = (data[word] & ~mask) | ((value << low) & mask);

Might consider doing assert(value & (mask >> low) == value).  It's something we
think should always be true, right?  This is another way I could see
this code being better than bitfield structs.  (so it would catch
missing the "? 1 : 0" in flags & BRW_URB_WRITE_PER_SLOT_OFFSET ? 1 : 0,
for example).

The unpushed patches from 1-7 are:

Reviewed-by: Eric Anholt <eric at anholt.net>

For this one, I'd like to see the message descriptor bits changed and
the mrf hack comment/assert before I r-b.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 835 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131118/db1978e2/attachment-0001.pgp>


More information about the mesa-dev mailing list