[Mesa-dev] [PATCH 12/13] i965/fs: Use LD messages for pre-gen7 varying-index uniform loads

Mon Apr 1 12:14:06 PDT 2013

Kenneth Graunke <kenneth at whitecape.org> writes:

> On 03/20/2013 05:37 PM, Eric Anholt wrote:
>> This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
>> my GM45 forced to load all uniforms through the varying-index path), but we
>> get a whole vec4 at a time to reuse in the next commit.
>>
>> NOTE: This is a candidate for the 9.1 branch.
>> ---
>>   src/mesa/drivers/dri/i965/brw_fs.cpp      |   92 ++++++++++++++---------------
>>   src/mesa/drivers/dri/i965/brw_fs.h        |    3 +-
>>   src/mesa/drivers/dri/i965/brw_fs_cse.cpp  |    1 +
>>   src/mesa/drivers/dri/i965/brw_fs_emit.cpp |   55 +++++++++++------
>>   4 files changed, 84 insertions(+), 67 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 5d83e50..e504e3a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -238,57 +238,53 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
>>      exec_list instructions;
>>      fs_inst *inst;
>>
>> -   if (intel->gen >= 7) {
>> -      /* We have our constant surface use a pitch of 4 bytes, so our index can
>> -       * be any component of a vector, and then we load 4 contiguous
>> -       * components starting from that.
>> -       *
>> -       * We break down the const_offset to a portion added to the variable
>> -       * offset and a portion done using reg_offset, which means that if you
>> -       * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
>> -       * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
>> -       * CSE can later notice that those loads are all the same and eliminate
>> -       * the redundant ones.
>> -       */
>> -      fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
>> -      instructions.push_tail(ADD(vec4_offset,
>> -                                 varying_offset, const_offset & ~3));
>> -
>> -      fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type);
>> -      inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
>> -                                  vec4_result, surf_index, vec4_offset);
>> -      inst->regs_written = 4;
>> -      instructions.push_tail(inst);
>> -
>> -      vec4_result.reg_offset += const_offset & 3;
>> -      instructions.push_tail(MOV(dst, vec4_result));
>> -   } else {
>> -      fs_reg offset = fs_reg(this, glsl_type::uint_type);
>> -      instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
>> -
>> -      int base_mrf = 13;
>> -      bool header_present = true;
>> -
>> -      fs_reg mrf = fs_reg(MRF, base_mrf + header_present);
>> -      mrf.type = BRW_REGISTER_TYPE_D;
>> -
>> -      /* On gen6+ we want the dword offset passed in, but on gen4/5 we need a
>> -       * dword-aligned byte offset.
>> +   /* We have our constant surface use a pitch of 4 bytes, so our index can
>> +    * be any component of a vector, and then we load 4 contiguous
>> +    * components starting from that.
>> +    *
>> +    * We break down the const_offset to a portion added to the variable
>> +    * offset and a portion done using reg_offset, which means that if you
>> +    * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
>> +    * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
>> +    * CSE can later notice that those loads are all the same and eliminate
>> +    * the redundant ones.
>> +    */
>> +   fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
>> +   instructions.push_tail(ADD(vec4_offset,
>> +                              varying_offset, const_offset & ~3));
>> +
>> +   int scale = 1;
>> +   if (intel->gen == 4 && dispatch_width == 8) {
>> +      /* Pre-gen5, we can either use a SIMD8 message that requires (header,
>> +       * u, v, r) as parameters, or we can just use the SIMD16 message
>> +       * consisting of (header, u).  We choose the second, at the cost of a
>> +       * longer return length.
>
> Did you try just using the SIMD8 message and not bothering to set up the 
> v and r components?  Supposedly, they're ignored based on surface type. 
>   Although, there's an errata for SURFTYPE_1D which says 'v' needs to 
> exist, which is concerning.  But then again this is SURFTYPE_BUFFER...
>
> Then again, it's pre-Ironlake only, so I'm not super concerned...

Yeah, it's things like those errata that I was worried about.

>> +   enum opcode op;
>> +   if (intel->gen >= 7)
>> +      op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
>> +   else
>> +      op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
>> +   fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4 * scale), dst.type);
>> +   inst = new(mem_ctx) fs_inst(op, vec4_result, surf_index, vec4_offset);
>> +   inst->regs_written = 4 * scale;
>> +   instructions.push_tail(inst);
>> +
>> +   if (intel->gen < 7) {
>> +      inst->base_mrf = 13;
>> +      inst->header_present = true;
>
> Hmm?  Why not go headerless on Gen5-6?

Oh.  Yeah, I'll try to remember to fix that later.

>> +   if (intel->gen >= 5)
>> +      msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
>> +   else {
>> +      /* We always use the SIMD16 message so that we only have to load U, and
>> +       * not V/R/LOD.
>
> "not V or R."   The SIMD8 message doesn't actually have an LOD parameter 
> according to the G45 PRM.

Fixed, thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130401/cb210fd9/attachment.pgp>