[Mesa-dev] [PATCH 2/5] i965/fs: Fix register spilling for 16-wide.

Tue Oct 29 19:35:59 CET 2013

Paul Berry <stereotype441 at gmail.com> writes:

> On 21 October 2013 17:48, Eric Anholt <eric at anholt.net> wrote:
>
>> Things blew up when I enabled the debug register spill code without
>> disabling 16-wide, so I decided to just fix 16-wide spilling.
>>
>> We still don't generate 16-wide when register spilling happens as part of
>> allocation (since we expect it to be slower), but now we can experiment
>> with allowing it in some cases in the future.
>>
>
> Also, we'll need the ability to spill in 16-wide when we get to doing
> compute shaders, because there are certain conditions in compute shaders
> where SIMD8 isn't an option (basically, if a compute shader uses barriers
> or shared variables, all the threads within a thread group need to run
> simultaneously on one subslice; with the maximum possible size thread group
> (1024), that's only possible if we use SIMD16.  See Graphics BSpec:
> 3D-Media-GPGPU Engine > Media GPGPU Pipeline > Media GPGPU Pipeline >
> Programming the GPGPU Pipeline IVB+ > GPGPU Thread Limits [DevHSW+]).
>
>
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp    |  8 ++++----
>>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 15 ++++++++-------
>>  2 files changed, 12 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> index fa15f7b..6c8fb76 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> @@ -718,8 +718,8 @@ fs_generator::generate_spill(fs_inst *inst, struct
>> brw_reg src)
>>     brw_MOV(p,
>>            retype(brw_message_reg(inst->base_mrf + 1),
>> BRW_REGISTER_TYPE_UD),
>>            retype(src, BRW_REGISTER_TYPE_UD));
>> -   brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf), 1,
>> -                                inst->offset);
>> +   brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf),
>> +                                 inst->mlen, inst->offset);
>>  }
>>
>>  void
>> @@ -727,8 +727,8 @@ fs_generator::generate_unspill(fs_inst *inst, struct
>> brw_reg dst)
>>  {
>>     assert(inst->mlen != 0);
>>
>> -   brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf),
>> 1,
>> -                               inst->offset);
>> +   brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf),
>> +                                dispatch_width / 8, inst->offset);
>>  }
>>
>>  void
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> index 7826cd4..ed0ce0d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> @@ -540,7 +540,7 @@ fs_visitor::emit_unspill(fs_inst *inst, fs_reg dst,
>> uint32_t spill_offset,
>>        inst->insert_before(unspill_inst);
>>
>>        dst.reg_offset++;
>>
>
> I had to do a lot of digging to reassure myself that this didn't need to
> change to "dst.reg_offset += dispatch_width / 8;".  Would you mind
> clarifying the comment above the declaration of reg_offset in brw_fs.h?
> Currently it says:
>
>    /**
>     * For virtual registers, this is a hardware register offset from
>     * the start of the register block (for example, a constant index
>     * in an array access).
>     */
>
> which seems to imply that it's measured in increments of a single 256-bit
> register.  But it appears that for virtual registers, it's actually in
> increments of (dispatch_width/8) 256-bit registers.
>
>
>> -      spill_offset += REG_SIZE;
>> +      spill_offset += dispatch_width * sizeof(float);
>>     }
>>  }
>>
>> @@ -624,10 +624,11 @@ fs_visitor::choose_spill_reg(struct ra_graph *g)
>>  void
>>  fs_visitor::spill_reg(int spill_reg)
>>  {
>> +   int reg_size = dispatch_width * sizeof(float);
>>     int size = virtual_grf_sizes[spill_reg];
>>     unsigned int spill_offset = c->last_scratch;
>>     assert(ALIGN(spill_offset, 16) == spill_offset); /* oword read/write
>> req. */
>> -   c->last_scratch += size * REG_SIZE;
>> +   c->last_scratch += size * reg_size;
>>
>>     /* Generate spill/unspill instructions for the objects being
>>      * spilled.  Right now, we spill or unspill the whole thing to a
>> @@ -646,7 +647,7 @@ fs_visitor::spill_reg(int spill_reg)
>>              inst->src[i].reg_offset = 0;
>>
>>              emit_unspill(inst, inst->src[i],
>> -                         spill_offset + REG_SIZE *
>> inst->src[i].reg_offset,
>> +                         spill_offset + reg_size *
>> inst->src[i].reg_offset,
>>                           regs_read);
>>          }
>>        }
>> @@ -654,7 +655,7 @@ fs_visitor::spill_reg(int spill_reg)
>>        if (inst->dst.file == GRF &&
>>           inst->dst.reg == spill_reg) {
>>           int subset_spill_offset = (spill_offset +
>> -                                    REG_SIZE * inst->dst.reg_offset);
>> +                                    reg_size * inst->dst.reg_offset);
>>           inst->dst.reg = virtual_grf_alloc(inst->regs_written);
>>           inst->dst.reg_offset = 0;
>>
>> @@ -677,11 +678,11 @@ fs_visitor::spill_reg(int spill_reg)
>>             fs_inst *spill_inst = new(mem_ctx) fs_inst(FS_OPCODE_SPILL,
>>                                                        reg_null_f,
>> spill_src);
>>             spill_src.reg_offset++;
>> -           spill_inst->offset = subset_spill_offset + chan * REG_SIZE;
>> +           spill_inst->offset = subset_spill_offset + chan * reg_size;
>>             spill_inst->ir = inst->ir;
>>             spill_inst->annotation = inst->annotation;
>> -           spill_inst->base_mrf = 14;
>> -           spill_inst->mlen = 2; /* header, value */
>> +           spill_inst->mlen = 1 + dispatch_width / 8; /* header, value */
>> +           spill_inst->base_mrf = 16 - spill_inst->mlen;
>>
>
> This means that for SIMD16 spills, we'll use MRFs 13-15 (previously we used
> MRFs 14-15).  Do we know for sure that MRF 13 isn't used by any
> non-spill-related code?  If so, would you mind adding a quick comment to
> clarify that?

Thanks for catching this.  Not only do we not know that m13 isn't used,
we don't even know that m14 isn't used in 16-wide!  I think pre-regalloc
I need to look at the set of used MRFs (like we do for the hack range
currently), and if we spill while m13-15 are used then just fail() out.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 835 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131029/bf8602c1/attachment-0001.pgp>