[Mesa-dev] [PATCH 12/13] i965/fs: Use LD messages for pre-gen7 varying-index uniform loads
Eric Anholt
eric at anholt.net
Mon Apr 1 12:14:06 PDT 2013
Kenneth Graunke <kenneth at whitecape.org> writes:
> On 03/20/2013 05:37 PM, Eric Anholt wrote:
>> This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
>> my GM45 forced to load all uniforms through the varying-index path), but we
>> get a whole vec4 at a time to reuse in the next commit.
>>
>> NOTE: This is a candidate for the 9.1 branch.
>> ---
>> src/mesa/drivers/dri/i965/brw_fs.cpp | 92 ++++++++++++++---------------
>> src/mesa/drivers/dri/i965/brw_fs.h | 3 +-
>> src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 +
>> src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 55 +++++++++++------
>> 4 files changed, 84 insertions(+), 67 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 5d83e50..e504e3a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -238,57 +238,53 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
>> exec_list instructions;
>> fs_inst *inst;
>>
>> - if (intel->gen >= 7) {
>> - /* We have our constant surface use a pitch of 4 bytes, so our index can
>> - * be any component of a vector, and then we load 4 contiguous
>> - * components starting from that.
>> - *
>> - * We break down the const_offset to a portion added to the variable
>> - * offset and a portion done using reg_offset, which means that if you
>> - * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
>> - * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
>> - * CSE can later notice that those loads are all the same and eliminate
>> - * the redundant ones.
>> - */
>> - fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
>> - instructions.push_tail(ADD(vec4_offset,
>> - varying_offset, const_offset & ~3));
>> -
>> - fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type);
>> - inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
>> - vec4_result, surf_index, vec4_offset);
>> - inst->regs_written = 4;
>> - instructions.push_tail(inst);
>> -
>> - vec4_result.reg_offset += const_offset & 3;
>> - instructions.push_tail(MOV(dst, vec4_result));
>> - } else {
>> - fs_reg offset = fs_reg(this, glsl_type::uint_type);
>> - instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
>> -
>> - int base_mrf = 13;
>> - bool header_present = true;
>> -
>> - fs_reg mrf = fs_reg(MRF, base_mrf + header_present);
>> - mrf.type = BRW_REGISTER_TYPE_D;
>> -
>> - /* On gen6+ we want the dword offset passed in, but on gen4/5 we need a
>> - * dword-aligned byte offset.
>> + /* We have our constant surface use a pitch of 4 bytes, so our index can
>> + * be any component of a vector, and then we load 4 contiguous
>> + * components starting from that.
>> + *
>> + * We break down the const_offset to a portion added to the variable
>> + * offset and a portion done using reg_offset, which means that if you
>> + * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
>> + * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
>> + * CSE can later notice that those loads are all the same and eliminate
>> + * the redundant ones.
>> + */
>> + fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
>> + instructions.push_tail(ADD(vec4_offset,
>> + varying_offset, const_offset & ~3));
>> +
>> + int scale = 1;
>> + if (intel->gen == 4 && dispatch_width == 8) {
>> + /* Pre-gen5, we can either use a SIMD8 message that requires (header,
>> + * u, v, r) as parameters, or we can just use the SIMD16 message
>> + * consisting of (header, u). We choose the second, at the cost of a
>> + * longer return length.
>
> Did you try just using the SIMD8 message and not bothering to set up the
> v and r components? Supposedly, they're ignored based on surface type.
> Although, there's an errata for SURFTYPE_1D which says 'v' needs to
> exist, which is concerning. But then again this is SURFTYPE_BUFFER...
>
> Then again, it's pre-Ironlake only, so I'm not super concerned...
Yeah, it's things like those errata that I was worried about.
>> + enum opcode op;
>> + if (intel->gen >= 7)
>> + op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
>> + else
>> + op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
>> + fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4 * scale), dst.type);
>> + inst = new(mem_ctx) fs_inst(op, vec4_result, surf_index, vec4_offset);
>> + inst->regs_written = 4 * scale;
>> + instructions.push_tail(inst);
>> +
>> + if (intel->gen < 7) {
>> + inst->base_mrf = 13;
>> + inst->header_present = true;
>
> Hmm? Why not go headerless on Gen5-6?
Oh. Yeah, I'll try to remember to fix that later.
>> + if (intel->gen >= 5)
>> + msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
>> + else {
>> + /* We always use the SIMD16 message so that we only have to load U, and
>> + * not V/R/LOD.
>
> "not V or R." The SIMD8 message doesn't actually have an LOD parameter
> according to the G45 PRM.
Fixed, thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130401/cb210fd9/attachment.pgp>
More information about the mesa-dev
mailing list