[Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
Jason Ekstrand
jason at jlekstrand.net
Thu Apr 2 06:57:06 PDT 2015
On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Jason Ekstrand <jason at jlekstrand.net> writes:
>
>> Instead of the complicated and broken-by-design pile of heuristics we had
>> before, we now have a straightforward lowering:
>>
>> 1) All header sources are copied directly using force_writemask_all and,
>> since they are guaranteed to be a single register, there are no
>> force_sechalf issues.
>>
>> 2) All non-header sources are copied using the exact same force_sechalf
>> and saturate modifiers as the LOAD_PAYLOAD operation itself.
>>
>> 3) In order to accommodate older gens that need interleaved colors,
>> lower_load_payload detects when the destination is a COMPR4 register
>> and automatically interleaves the non-header sources. The
>> lower_load_payload pass does the right thing here regardless of whether
>> or not the hardware actually supports COMPR4.
>
> I had a quick glance at the series and it seems to be going in the right
> direction.
>
> One thing I honestly don't like is the ad-hoc and IMHO premature
> treatment of payload headers, it still feels like the LOAD_PAYLOAD
> instruction has more complex semantics than necessary and the benefit is
> unclear to me.
>
> I suppose that your motivation was to avoid setting force_writemask_all
> in LOAD_PAYLOAD instructions with header. The optimizer should be able
> to cope with those flags and get rid of them from the resulting moves
> where they are redundant, and if it's not able to it looks like
> something that should be fixed anyway. The explicit handling of headers
> is responsible for much of the churn in this series and is likely to
> complicate users of LOAD_PAYLOAD and optimization passes that have to
> manipulate them.
Avoiding force_writemask_all is only half of the motivation and the
small half at that. A header source, more properly defined, is a
single physical register that, conceptually, applies to all channels.
Effectively, a header source (I should have stated this clearly) has
two properties:
1) It has force_writemask_all set
2) It is exactly one physical hardware register.
This second property is the more important of the two. Most of the
disaster of the previous LOAD_PAYLOAD implementation was that we did a
pile of guesswork and had a ill-conceved "effective width" think in
order to figure out how big the register actually was. Making the
user specify which sources are header sources eliminates that
guesswork. It also has the nice side-effect that we can do the right
force_writemask_all and we can properly handle COMPR4 for the the
user.
--Jason
> Thanks for looking into this BTW.
>
>> ---
>> src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++++++++++++++-------------
>> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 -
>> 2 files changed, 80 insertions(+), 77 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index f8e26c0..bc45a38 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload()
>> {
>> bool progress = false;
>>
>> - int vgrf_to_reg[alloc.count];
>> - int reg_count = 0;
>> - for (unsigned i = 0; i < alloc.count; ++i) {
>> - vgrf_to_reg[i] = reg_count;
>> - reg_count += alloc.sizes[i];
>> - }
>> -
>> - struct {
>> - bool written:1; /* Whether this register has ever been written */
>> - bool force_writemask_all:1;
>> - bool force_sechalf:1;
>> - } metadata[reg_count];
>> - memset(metadata, 0, sizeof(metadata));
>> -
>> foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
>> - if (inst->dst.file == GRF) {
>> - const int dst_reg = vgrf_to_reg[inst->dst.reg] + inst->dst.reg_offset;
>> - bool force_sechalf = inst->force_sechalf &&
>> - !inst->force_writemask_all;
>> - bool toggle_sechalf = inst->dst.width == 16 &&
>> - type_sz(inst->dst.type) == 4 &&
>> - !inst->force_writemask_all;
>> - for (int i = 0; i < inst->regs_written; ++i) {
>> - metadata[dst_reg + i].written = true;
>> - metadata[dst_reg + i].force_sechalf = force_sechalf;
>> - metadata[dst_reg + i].force_writemask_all = inst->force_writemask_all;
>> - force_sechalf = (toggle_sechalf != force_sechalf);
>> - }
>> - }
>> -
>> if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
>> assert(inst->dst.file == MRF || inst->dst.file == GRF);
>> +
>> fs_reg dst = inst->dst;
>>
>> - for (int i = 0; i < inst->sources; i++) {
>> - dst.width = inst->src[i].effective_width;
>> - dst.type = inst->src[i].type;
>> -
>> - if (inst->src[i].file == BAD_FILE) {
>> - /* Do nothing but otherwise increment as normal */
>> - } else if (dst.file == MRF &&
>> - dst.width == 8 &&
>> - brw->has_compr4 &&
>> - i + 4 < inst->sources &&
>> - inst->src[i + 4].equals(horiz_offset(inst->src[i], 8))) {
>> - fs_reg compr4_dst = dst;
>> - compr4_dst.reg += BRW_MRF_COMPR4;
>> - compr4_dst.width = 16;
>> - fs_reg compr4_src = inst->src[i];
>> - compr4_src.width = 16;
>> - fs_inst *mov = MOV(compr4_dst, compr4_src);
>> + /* Get rid of COMPR4. We'll add it back in if we need it */
>> + if (dst.file == MRF && dst.reg & BRW_MRF_COMPR4)
>> + dst.reg = dst.reg & ~BRW_MRF_COMPR4;
>> +
>> + dst.width = 8;
>> + for (uint8_t i = 0; i < inst->header_size; i++) {
>> + if (inst->src[i].file != BAD_FILE) {
>> + fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD);
>> + fs_reg mov_src = retype(inst->src[i], BRW_REGISTER_TYPE_UD);
>> + mov_src.width = 8;
>> + fs_inst *mov = MOV(mov_dst, mov_src);
>> mov->force_writemask_all = true;
>> inst->insert_before(block, mov);
>> - /* Mark i+4 as BAD_FILE so we don't emit a MOV for it */
>> - inst->src[i + 4].file = BAD_FILE;
>> - } else {
>> - fs_inst *mov = MOV(dst, inst->src[i]);
>> - if (inst->src[i].file == GRF) {
>> - int src_reg = vgrf_to_reg[inst->src[i].reg] +
>> - inst->src[i].reg_offset;
>> - mov->force_sechalf = metadata[src_reg].force_sechalf;
>> - mov->force_writemask_all = metadata[src_reg].force_writemask_all;
>> - } else {
>> - /* We don't have any useful metadata for immediates or
>> - * uniforms. Assume that any of the channels of the
>> - * destination may be used.
>> - */
>> - assert(inst->src[i].file == IMM ||
>> - inst->src[i].file == UNIFORM);
>> - mov->force_writemask_all = true;
>> - }
>> + }
>> + dst = offset(dst, 1);
>> + }
>>
>> - if (dst.file == GRF) {
>> - const int dst_reg = vgrf_to_reg[dst.reg] + dst.reg_offset;
>> - const bool force_writemask = mov->force_writemask_all;
>> - metadata[dst_reg].force_writemask_all = force_writemask;
>> - metadata[dst_reg].force_sechalf = mov->force_sechalf;
>> - if (dst.width * type_sz(dst.type) > 32) {
>> - assert(!mov->force_sechalf);
>> - metadata[dst_reg + 1].force_writemask_all = force_writemask;
>> - metadata[dst_reg + 1].force_sechalf = !force_writemask;
>> + dst.width = inst->exec_size;
>> + if (inst->dst.file == MRF && (inst->dst.reg & BRW_MRF_COMPR4) &&
>> + inst->exec_size > 8) {
>> + /* In this case, the payload portion of the LOAD_PAYLOAD isn't
>> + * a straightforward copy. Instead, the result of the
>> + * LOAD_PAYLOAD is treated as interlaced and the first four
>> + * non-header sources are unpacked as:
>> + *
>> + * m + 0: r0
>> + * m + 1: g0
>> + * m + 2: b0
>> + * m + 3: a0
>> + * m + 4: r1
>> + * m + 5: g1
>> + * m + 6: b1
>> + * m + 7: a1
>> + *
>> + * This is used for gen <= 5 fb writes.
>> + */
>> + assert(inst->exec_size == 16);
>> + assert(inst->header_size + 4 <= inst->sources);
>> + for (uint8_t i = inst->header_size; i < inst->header_size + 4; i++) {
>> + if (inst->src[i].file != BAD_FILE) {
>> + if (brw->has_compr4) {
>> + fs_reg compr4_dst = retype(dst, inst->src[i].type);
>> + compr4_dst.reg |= BRW_MRF_COMPR4;
>> +
>> + fs_inst *mov = MOV(compr4_dst, inst->src[i]);
>> + mov->force_writemask_all = inst->force_writemask_all;
>> + mov->saturate = inst->saturate;
>> + inst->insert_before(block, mov);
>> + } else {
>> + /* Platform doesn't have COMPR4. We have to fake it */
>> + fs_reg mov_dst = retype(dst, inst->src[i].type);
>> + mov_dst.width = 8;
>> +
>> + fs_inst *mov = MOV(mov_dst, half(inst->src[i], 0));
>> + mov->force_writemask_all = inst->force_writemask_all;
>> + mov->saturate = inst->saturate;
>> + inst->insert_before(block, mov);
>> +
>> + mov = MOV(offset(mov_dst, 4), half(inst->src[i], 1));
>> + mov->force_writemask_all = inst->force_writemask_all;
>> + mov->saturate = inst->saturate;
>> + mov->force_sechalf = true;
>> + inst->insert_before(block, mov);
>> }
>> }
>>
>> - inst->insert_before(block, mov);
>> + dst.reg++;
>> }
>>
>> + dst.reg += 4;
>> +
>> + /* The COMPR4 code took care of the first 4 sources. We'll let
>> + * the regular path handle any remaining sources. Yes, we are
>> + * modifying the instruction but we're about to delete it so
>> + * this really doesn't hurt anything.
>> + */
>> + inst->header_size += 4;
>> + }
>> +
>> + for (uint8_t i = inst->header_size; i < inst->sources; i++) {
>> + if (inst->src[i].file != BAD_FILE) {
>> + fs_inst *mov = MOV(retype(dst, inst->src[i].type),
>> + inst->src[i]);
>> + mov->force_writemask_all = inst->force_writemask_all;
>> + mov->saturate = inst->saturate;
>> + inst->insert_before(block, mov);
>> + }
>> dst = offset(dst, 1);
>> }
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> index b7eeb47..a0d8b798 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> @@ -3471,9 +3471,6 @@ fs_visitor::setup_color_payload(fs_reg *dst, fs_reg color, unsigned components,
>> color.type, color.width);
>> inst = emit(MOV(dst[len], offset(color, i)));
>> inst->saturate = key->clamp_fragment_color;
>> - } else if (color.width == 16) {
>> - /* We need two BAD_FILE slots for a 16-wide color */
>> - len++;
>> }
>> len++;
>> }
>> --
>> 2.3.4
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list