[Mesa-dev] [PATCH] i965/vs: fix unit mismatch in base_offset.

Mon Oct 1 12:44:19 PDT 2012

On 09/30/2012 10:08 PM, Kenneth Graunke wrote:
> On 09/20/2012 07:06 PM, Paul Berry wrote:
>> Callers of emit_scratch_{read,write}() pass a base_offset that is
>> measured in bytes, but emit_scratch_{read,write}() was assuming it was
>> measured in multiples of a complete register.  As a result, a shader
>> using a scratch read/write offset greater than zero (in practice, a
>> shader containing more than one variable in scratch) would use too
>> large an offset, frequently exceeding the available scratch space.
>>
>> This patch corrects the mismatch by dividing base_offset by REG_SIZE
>> at the beginning of emit_scratch_{read,write}().
>>
>> NOTE: This is a candidate for stable release branches.
>> ---
>>
>> No test cases were failing because of this bug, but it was getting in
>> the way of Ken's efforts to implement register spilling for vertex
>> shaders.
>>
>> Ken: I've done a full piglit run on master to verify that it doesn't
>> regress anything, but I didn't check whether there are any remaining
>> regressions when I apply it to your "spilling" branch.  All I checked
>> was that it fixed the 3 regressions you told me about earlier today.
>>
>>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
>> index 682837f..60fab54 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
>> @@ -2443,12 +2443,19 @@ vec4_visitor::get_pull_constant_offset(vec4_instruction *inst,
>>  /**
>>   * Emits an instruction before @inst to load the value named by @orig_src
>>   * from scratch space at @base_offset to @temp.
>> + *
>> + * @base_offset is measured in bytes.
>>   */
>>  void
>>  vec4_visitor::emit_scratch_read(vec4_instruction *inst,
>>  				dst_reg temp, src_reg orig_src,
>>  				int base_offset)
>>  {
>> +   /* Convert base_offset from bytes to a register number, since that's what
>> +    * get_scratch_offset() expects.
>> +    */
>> +   base_offset /= REG_SIZE;
>> +
>>     int reg_offset = base_offset + orig_src.reg_offset;
>>     src_reg index = get_scratch_offset(inst, orig_src.reladdr, reg_offset);
>>  
>> @@ -2458,12 +2465,19 @@ vec4_visitor::emit_scratch_read(vec4_instruction *inst,
>>  /**
>>   * Emits an instruction after @inst to store the value to be written
>>   * to @orig_dst to scratch space at @base_offset, from @temp.
>> + *
>> + * @base_offset is measured in bytes.
>>   */
>>  void
>>  vec4_visitor::emit_scratch_write(vec4_instruction *inst,
>>  				 src_reg temp, dst_reg orig_dst,
>>  				 int base_offset)
>>  {
>> +   /* Convert base_offset from bytes to a register number, since that's what
>> +    * get_scratch_offset() expects.
>> +    */
>> +   base_offset /= REG_SIZE;
>> +
>>     int reg_offset = base_offset + orig_dst.reg_offset;
>>     src_reg index = get_scratch_offset(inst, orig_dst.reladdr, reg_offset);
> 
> Okay.  I've looked at this a bit more, and I think your patch is right,
> but there might be a bug pre-Gen6.
> 
> Here's my thinking.  In move_grf_array_access_to_scratch(), I see:
> 
>    c->last_scratch += this->virtual_grf_sizes[src->reg] * 8 * 4;
> 
> This is clearly in measured in bytes: 8 * 4 is the size of a vec8 (or
> simd4x2) register.  This is passed to emit_scratch_read/write.
> 
> So the base_offset argument to emit_scratch_read/write is measured in
> bytes.  Your patch then causes it to be divided by REG_SIZE (8*4)
> (ironically undoing the earlier scaling), at which point base_offset is
> again the number of simd4x2 registers, or OWords.  We add reg_offset,
> which is also in OWords.
> 
> We then pass that to get_scratch_offset.  Everything's dandy for Gen6+,
> we want OWords.  But for earlier hardware, we hit:
> 
>    /* Pre-gen6, the message header uses byte offsets instead of vec4
>     * (16-byte) offset units.
>     */
>    if (intel->gen < 6)
>       message_header_scale *= 16;
> 
> At which point I say...16 bytes???   We're converting from units of
> OWords/SIMD4x2 registers, not units of vec4s.  So I believe this should
> be 32 bytes (REG_SIZE aka 8*4).
> 
> Thoughts?

Ooops.  Paul pointed out two problems with my analysis:
- An OWord is the size of a vec4, not a vec8, since a float is actually
a DWord.  (Grumble...terminology...)
- message_header_scale is actually initalized to 2, which scales from
number of whole 4x2 registers (vec8s) to OWords (vec4s).  This also
should correctly scale by 32 on Gen4-5.

So I believe Paul's patch is correct and the Gen4-5 bug I was concerned
about does not actually exist.

I respun a version that just leaves the variable in number of registers,
rather than multiplying by 32 to get bytes and immediately dividing
again.  Neither of us have a strong preference.  I'll send mine out
momentarily.