[Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
Matt Turner
mattst88 at gmail.com
Wed Apr 15 13:31:24 PDT 2015
On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> Instead of the complicated and broken-by-design pile of heuristics we had
> before, we now have a straightforward lowering:
>
> 1) All header sources are copied directly using force_writemask_all and,
> since they are guaranteed to be a single register, there are no
> force_sechalf issues.
>
> 2) All non-header sources are copied using the exact same force_sechalf
> and saturate modifiers as the LOAD_PAYLOAD operation itself.
Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD
today, and it is kind of confusing about what it means. Can't we have
fbwrites that write depth as well. I wouldn't think we wanted to
saturate that.
I don't think it buys us anything. If we just run copy propagation
after lower_load_payload() we'll get the code we want.
> 3) In order to accommodate older gens that need interleaved colors,
> lower_load_payload detects when the destination is a COMPR4 register
> and automatically interleaves the non-header sources. The
> lower_load_payload pass does the right thing here regardless of whether
> or not the hardware actually supports COMPR4.
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++++++++++++++-------------
> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 -
> 2 files changed, 80 insertions(+), 77 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index f8e26c0..bc45a38 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload()
> {
> bool progress = false;
>
> - int vgrf_to_reg[alloc.count];
> - int reg_count = 0;
> - for (unsigned i = 0; i < alloc.count; ++i) {
> - vgrf_to_reg[i] = reg_count;
> - reg_count += alloc.sizes[i];
> - }
> -
> - struct {
> - bool written:1; /* Whether this register has ever been written */
> - bool force_writemask_all:1;
> - bool force_sechalf:1;
> - } metadata[reg_count];
> - memset(metadata, 0, sizeof(metadata));
> -
> foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
> - if (inst->dst.file == GRF) {
> - const int dst_reg = vgrf_to_reg[inst->dst.reg] + inst->dst.reg_offset;
> - bool force_sechalf = inst->force_sechalf &&
> - !inst->force_writemask_all;
> - bool toggle_sechalf = inst->dst.width == 16 &&
> - type_sz(inst->dst.type) == 4 &&
> - !inst->force_writemask_all;
> - for (int i = 0; i < inst->regs_written; ++i) {
> - metadata[dst_reg + i].written = true;
> - metadata[dst_reg + i].force_sechalf = force_sechalf;
> - metadata[dst_reg + i].force_writemask_all = inst->force_writemask_all;
> - force_sechalf = (toggle_sechalf != force_sechalf);
> - }
> - }
> -
> if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
We should invert this condition and 'continue' so that we can avoid an
extra level of indentation which only makes the code harder to read.
> assert(inst->dst.file == MRF || inst->dst.file == GRF);
> +
> fs_reg dst = inst->dst;
>
> - for (int i = 0; i < inst->sources; i++) {
> - dst.width = inst->src[i].effective_width;
> - dst.type = inst->src[i].type;
> -
> - if (inst->src[i].file == BAD_FILE) {
> - /* Do nothing but otherwise increment as normal */
> - } else if (dst.file == MRF &&
> - dst.width == 8 &&
> - brw->has_compr4 &&
> - i + 4 < inst->sources &&
> - inst->src[i + 4].equals(horiz_offset(inst->src[i], 8))) {
> - fs_reg compr4_dst = dst;
> - compr4_dst.reg += BRW_MRF_COMPR4;
> - compr4_dst.width = 16;
> - fs_reg compr4_src = inst->src[i];
> - compr4_src.width = 16;
> - fs_inst *mov = MOV(compr4_dst, compr4_src);
> + /* Get rid of COMPR4. We'll add it back in if we need it */
> + if (dst.file == MRF && dst.reg & BRW_MRF_COMPR4)
> + dst.reg = dst.reg & ~BRW_MRF_COMPR4;
No point in checking whether the BRW_MRF_COMPR4 bit is set before
clearing it. Just clear it.
> +
> + dst.width = 8;
> + for (uint8_t i = 0; i < inst->header_size; i++) {
> + if (inst->src[i].file != BAD_FILE) {
> + fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD);
> + fs_reg mov_src = retype(inst->src[i], BRW_REGISTER_TYPE_UD);
> + mov_src.width = 8;
> + fs_inst *mov = MOV(mov_dst, mov_src);
> mov->force_writemask_all = true;
> inst->insert_before(block, mov);
> - /* Mark i+4 as BAD_FILE so we don't emit a MOV for it */
> - inst->src[i + 4].file = BAD_FILE;
> - } else {
> - fs_inst *mov = MOV(dst, inst->src[i]);
> - if (inst->src[i].file == GRF) {
> - int src_reg = vgrf_to_reg[inst->src[i].reg] +
> - inst->src[i].reg_offset;
> - mov->force_sechalf = metadata[src_reg].force_sechalf;
> - mov->force_writemask_all = metadata[src_reg].force_writemask_all;
> - } else {
> - /* We don't have any useful metadata for immediates or
> - * uniforms. Assume that any of the channels of the
> - * destination may be used.
> - */
> - assert(inst->src[i].file == IMM ||
> - inst->src[i].file == UNIFORM);
> - mov->force_writemask_all = true;
> - }
> + }
> + dst = offset(dst, 1);
> + }
>
> - if (dst.file == GRF) {
> - const int dst_reg = vgrf_to_reg[dst.reg] + dst.reg_offset;
> - const bool force_writemask = mov->force_writemask_all;
> - metadata[dst_reg].force_writemask_all = force_writemask;
> - metadata[dst_reg].force_sechalf = mov->force_sechalf;
> - if (dst.width * type_sz(dst.type) > 32) {
> - assert(!mov->force_sechalf);
> - metadata[dst_reg + 1].force_writemask_all = force_writemask;
> - metadata[dst_reg + 1].force_sechalf = !force_writemask;
> + dst.width = inst->exec_size;
> + if (inst->dst.file == MRF && (inst->dst.reg & BRW_MRF_COMPR4) &&
> + inst->exec_size > 8) {
> + /* In this case, the payload portion of the LOAD_PAYLOAD isn't
> + * a straightforward copy. Instead, the result of the
> + * LOAD_PAYLOAD is treated as interlaced and the first four
I'd prefer it if you changed 'interlaced' to 'interleaved', like
you've said elsewhere. I don't know... just something about the word
interlace that doesn't feel right for this. (Same thing applies to
other patches)
> + * non-header sources are unpacked as:
> + *
> + * m + 0: r0
> + * m + 1: g0
> + * m + 2: b0
> + * m + 3: a0
> + * m + 4: r1
> + * m + 5: g1
> + * m + 6: b1
> + * m + 7: a1
> + *
> + * This is used for gen <= 5 fb writes.
> + */
> + assert(inst->exec_size == 16);
> + assert(inst->header_size + 4 <= inst->sources);
> + for (uint8_t i = inst->header_size; i < inst->header_size + 4; i++) {
> + if (inst->src[i].file != BAD_FILE) {
> + if (brw->has_compr4) {
> + fs_reg compr4_dst = retype(dst, inst->src[i].type);
> + compr4_dst.reg |= BRW_MRF_COMPR4;
> +
> + fs_inst *mov = MOV(compr4_dst, inst->src[i]);
> + mov->force_writemask_all = inst->force_writemask_all;
> + mov->saturate = inst->saturate;
> + inst->insert_before(block, mov);
> + } else {
> + /* Platform doesn't have COMPR4. We have to fake it */
> + fs_reg mov_dst = retype(dst, inst->src[i].type);
> + mov_dst.width = 8;
> +
> + fs_inst *mov = MOV(mov_dst, half(inst->src[i], 0));
> + mov->force_writemask_all = inst->force_writemask_all;
> + mov->saturate = inst->saturate;
> + inst->insert_before(block, mov);
> +
> + mov = MOV(offset(mov_dst, 4), half(inst->src[i], 1));
> + mov->force_writemask_all = inst->force_writemask_all;
> + mov->saturate = inst->saturate;
> + mov->force_sechalf = true;
> + inst->insert_before(block, mov);
> }
> }
>
> - inst->insert_before(block, mov);
> + dst.reg++;
> }
>
> + dst.reg += 4;
> +
> + /* The COMPR4 code took care of the first 4 sources. We'll let
> + * the regular path handle any remaining sources. Yes, we are
> + * modifying the instruction but we're about to delete it so
> + * this really doesn't hurt anything.
> + */
> + inst->header_size += 4;
I mean, while the comment is a true statement, why is doing this any
better than just...
> + }
> +
> + for (uint8_t i = inst->header_size; i < inst->sources; i++) {
... changing this to inst->header_size + 4?
Seems like if we get to replace a four line comment (and some actual
code) with '+ 4' that's better.
> + if (inst->src[i].file != BAD_FILE) {
> + fs_inst *mov = MOV(retype(dst, inst->src[i].type),
> + inst->src[i]);
> + mov->force_writemask_all = inst->force_writemask_all;
> + mov->saturate = inst->saturate;
> + inst->insert_before(block, mov);
> + }
> dst = offset(dst, 1);
Here and above you could move the dst = ... into the for-loop
increment step, which would allow you to change the if statement to if
(BAD_FILE) continue and unindent the block.
No preference.
> }
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index b7eeb47..a0d8b798 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -3471,9 +3471,6 @@ fs_visitor::setup_color_payload(fs_reg *dst, fs_reg color, unsigned components,
> color.type, color.width);
> inst = emit(MOV(dst[len], offset(color, i)));
> inst->saturate = key->clamp_fragment_color;
> - } else if (color.width == 16) {
> - /* We need two BAD_FILE slots for a 16-wide color */
> - len++;
> }
> len++;
> }
> --
> 2.3.4
More information about the mesa-dev
mailing list