[Mesa-dev] [PATCH 3/5] i965/fs: Delay setup of uniform pull constants until after reg alloc.
Kenneth Graunke
kenneth at whitecape.org
Sat Feb 16 14:17:36 PST 2013
On 02/15/2013 10:26 PM, Eric Anholt wrote:
> This should fix the register allocation explosion on the GLES 3.0
> test on gen6.
Except that it doesn't. You can check by running it with
INTEL_DEVID_OVERRIDE=0x116.
A tiny change below makes it work though...
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 63 +++++++++++++++++++++++++-
> src/mesa/drivers/dri/i965/brw_fs.h | 1 +
> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 28 ++----------
> 3 files changed, 65 insertions(+), 27 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 35cdc6a..1f98b27 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1710,8 +1710,6 @@ fs_visitor::setup_pull_constants()
> dst, index, offset);
> pull->ir = inst->ir;
> pull->annotation = inst->annotation;
> - pull->base_mrf = 14;
> - pull->mlen = 1;
>
> inst->insert_before(pull);
>
> @@ -2447,6 +2445,66 @@ fs_visitor::insert_gen4_send_dependency_workarounds()
> }
> }
>
> +/**
> + * Turns the generic expression-style uniform pull constant load instruction
> + * into a hardware-specific series of instructions for loading a pull
> + * constant.
> + *
> + * The expression style allows the CSE pass before this to optimize out
> + * repeated loads from the same offset, and gives the pre-register-allocation
> + * scheduling full flexibility, while the conversion to native instructions
> + * allows the post-register-allocation scheduler the best information
> + * possible.
> + */
> +void
> +fs_visitor::lower_uniform_pull_constant_loads()
> +{
> + foreach_list(node, &this->instructions) {
> + fs_inst *inst = (fs_inst *)node;
> +
> + if (inst->opcode != FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD)
> + continue;
> +
> + if (intel->gen >= 7) {
> + fs_reg const_offset_reg = inst->src[1];
> + assert(const_offset_reg.file == IMM &&
> + const_offset_reg.type == BRW_REGISTER_TYPE_UD);
> + const_offset_reg.imm.u /= 16;
> + fs_reg payload = fs_reg(this, glsl_type::uint_type);
> + struct brw_reg g0 = retype(brw_vec8_grf(0, 0),
> + BRW_REGISTER_TYPE_UD);
> +
> + fs_inst *setup1 = MOV(payload, fs_reg(g0));
> + setup1->force_writemask_all = true;
> + /* We don't need the second half of this vgrf to be filled with g1
> + * in the 16-wide case, but if we use force_uncompressed then live
> + * variable analysis won't consider this a def!
> + */
> +
> + fs_inst *setup2 = new(mem_ctx) fs_inst(FS_OPCODE_SET_GLOBAL_OFFSET,
> + payload, payload,
> + const_offset_reg);
> +
> + setup1->ir = inst->ir;
> + setup1->annotation = inst->annotation;
> + inst->insert_before(setup1);
> + setup2->ir = inst->ir;
> + setup2->annotation = inst->annotation;
> + inst->insert_before(setup2);
> + inst->opcode = FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7;
> + inst->src[1] = payload;
> + } else {
> + /* Before register allocation, we didn't tell the scheduler about the
> + * MRF we use. We know it's safe to use this MRF because nothing
> + * else does except for register spill/unspill, which generates and
> + * uses its MRF within a single IR instruction.
> + */
> + inst->base_mrf = 14;
> + inst->mlen = 1;
> + }
> + }
> +}
> +
> void
> fs_visitor::dump_instruction(fs_inst *inst)
> {
> @@ -2746,6 +2804,7 @@ fs_visitor::run()
>
> remove_dead_constants();
>
> + lower_uniform_pull_constant_loads();
> schedule_instructions(false);
You need to do lower_uniform_pull_constant_loads() *after* the first
scheduling pass, or else you hit the same strongly-ordered scheduling
dependency problem we had before.
With tha change, the reg. alloc explosion is fixed. (I haven't actually
tested on a real SNB though.)
Assuming you fix that (and it passes piglit), this series gets a:
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
Also, nice work! I'm really glad to see CSE of UBO loads.
More information about the mesa-dev
mailing list