[Mesa-dev] [PATCH 2/5] i965/fs: Fix register spilling for 16-wide.
Paul Berry
stereotype441 at gmail.com
Mon Oct 28 01:12:51 CET 2013
On 21 October 2013 17:48, Eric Anholt <eric at anholt.net> wrote:
> Things blew up when I enabled the debug register spill code without
> disabling 16-wide, so I decided to just fix 16-wide spilling.
>
> We still don't generate 16-wide when register spilling happens as part of
> allocation (since we expect it to be slower), but now we can experiment
> with allowing it in some cases in the future.
>
Also, we'll need the ability to spill in 16-wide when we get to doing
compute shaders, because there are certain conditions in compute shaders
where SIMD8 isn't an option (basically, if a compute shader uses barriers
or shared variables, all the threads within a thread group need to run
simultaneously on one subslice; with the maximum possible size thread group
(1024), that's only possible if we use SIMD16. See Graphics BSpec:
3D-Media-GPGPU Engine > Media GPGPU Pipeline > Media GPGPU Pipeline >
Programming the GPGPU Pipeline IVB+ > GPGPU Thread Limits [DevHSW+]).
> ---
> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 8 ++++----
> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 15 ++++++++-------
> 2 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index fa15f7b..6c8fb76 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -718,8 +718,8 @@ fs_generator::generate_spill(fs_inst *inst, struct
> brw_reg src)
> brw_MOV(p,
> retype(brw_message_reg(inst->base_mrf + 1),
> BRW_REGISTER_TYPE_UD),
> retype(src, BRW_REGISTER_TYPE_UD));
> - brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf), 1,
> - inst->offset);
> + brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf),
> + inst->mlen, inst->offset);
> }
>
> void
> @@ -727,8 +727,8 @@ fs_generator::generate_unspill(fs_inst *inst, struct
> brw_reg dst)
> {
> assert(inst->mlen != 0);
>
> - brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf),
> 1,
> - inst->offset);
> + brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf),
> + dispatch_width / 8, inst->offset);
> }
>
> void
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> index 7826cd4..ed0ce0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> @@ -540,7 +540,7 @@ fs_visitor::emit_unspill(fs_inst *inst, fs_reg dst,
> uint32_t spill_offset,
> inst->insert_before(unspill_inst);
>
> dst.reg_offset++;
>
I had to do a lot of digging to reassure myself that this didn't need to
change to "dst.reg_offset += dispatch_width / 8;". Would you mind
clarifying the comment above the declaration of reg_offset in brw_fs.h?
Currently it says:
/**
* For virtual registers, this is a hardware register offset from
* the start of the register block (for example, a constant index
* in an array access).
*/
which seems to imply that it's measured in increments of a single 256-bit
register. But it appears that for virtual registers, it's actually in
increments of (dispatch_width/8) 256-bit registers.
> - spill_offset += REG_SIZE;
> + spill_offset += dispatch_width * sizeof(float);
> }
> }
>
> @@ -624,10 +624,11 @@ fs_visitor::choose_spill_reg(struct ra_graph *g)
> void
> fs_visitor::spill_reg(int spill_reg)
> {
> + int reg_size = dispatch_width * sizeof(float);
> int size = virtual_grf_sizes[spill_reg];
> unsigned int spill_offset = c->last_scratch;
> assert(ALIGN(spill_offset, 16) == spill_offset); /* oword read/write
> req. */
> - c->last_scratch += size * REG_SIZE;
> + c->last_scratch += size * reg_size;
>
> /* Generate spill/unspill instructions for the objects being
> * spilled. Right now, we spill or unspill the whole thing to a
> @@ -646,7 +647,7 @@ fs_visitor::spill_reg(int spill_reg)
> inst->src[i].reg_offset = 0;
>
> emit_unspill(inst, inst->src[i],
> - spill_offset + REG_SIZE *
> inst->src[i].reg_offset,
> + spill_offset + reg_size *
> inst->src[i].reg_offset,
> regs_read);
> }
> }
> @@ -654,7 +655,7 @@ fs_visitor::spill_reg(int spill_reg)
> if (inst->dst.file == GRF &&
> inst->dst.reg == spill_reg) {
> int subset_spill_offset = (spill_offset +
> - REG_SIZE * inst->dst.reg_offset);
> + reg_size * inst->dst.reg_offset);
> inst->dst.reg = virtual_grf_alloc(inst->regs_written);
> inst->dst.reg_offset = 0;
>
> @@ -677,11 +678,11 @@ fs_visitor::spill_reg(int spill_reg)
> fs_inst *spill_inst = new(mem_ctx) fs_inst(FS_OPCODE_SPILL,
> reg_null_f,
> spill_src);
> spill_src.reg_offset++;
> - spill_inst->offset = subset_spill_offset + chan * REG_SIZE;
> + spill_inst->offset = subset_spill_offset + chan * reg_size;
> spill_inst->ir = inst->ir;
> spill_inst->annotation = inst->annotation;
> - spill_inst->base_mrf = 14;
> - spill_inst->mlen = 2; /* header, value */
> + spill_inst->mlen = 1 + dispatch_width / 8; /* header, value */
> + spill_inst->base_mrf = 16 - spill_inst->mlen;
>
This means that for SIMD16 spills, we'll use MRFs 13-15 (previously we used
MRFs 14-15). Do we know for sure that MRF 13 isn't used by any
non-spill-related code? If so, would you mind adding a quick comment to
clarify that?
> inst->insert_after(spill_inst);
> }
> }
> --
> 1.8.4.rc3
>
With those comments added, the patch is:
Reviewed-by: Paul Berry <stereotype441 at gmail.com>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131027/f13fb732/attachment-0001.html>
More information about the mesa-dev
mailing list