[Mesa-dev] [RFC] i965: Factor out descriptor building for indirect send messages

Pohjolainen, Topi topi.pohjolainen at intel.com
Wed Mar 11 10:34:49 PDT 2015


On Tue, Mar 10, 2015 at 11:07:26PM +0200, Francisco Jerez wrote:
> "Pohjolainen, Topi" <topi.pohjolainen at intel.com> writes:
> 
> > On Mon, Mar 09, 2015 at 12:43:08PM +0200, Francisco Jerez wrote:
> >> "Pohjolainen, Topi" <topi.pohjolainen at intel.com> writes:
> >> 
> >> > On Sat, Mar 07, 2015 at 04:15:08PM +0200, Francisco Jerez wrote:
> >> >> Topi Pohjolainen <topi.pohjolainen at intel.com> writes:
> >> >> 
> >> >> > The original patch from Curro was based on something that is not
> >> >> > present in the master yet. This patch tries to mimick the logic on
> >> >> > top master.
> >> >> > I wanted to see if could separate the building of the descriptor
> >> >> > instruction from building of the send instruction. This logic now
> >> >> > allows the caller to construct any kind of sequence of instructions
> >> >> > filling in the descriptor before giving it to the send instruction
> >> >> > builder.
> >> >> >
> >> >> > This is only compile tested. Curro, how would you feel about this
> >> >> > sort of approach? I apologise for muddying the waters again but I
> >> >> > wasn't entirely comfortable with the logic and wanted to try to
> >> >> > something else.
> >> >> >
> >> >> > I believe patch number five should go nicely on top of this as
> >> >> > the descriptor instruction could be followed by (or preceeeded by)
> >> >> > any additional instructions modifying the descriptor register
> >> >> > before the actual send instruction.
> >> >> >
> >> >> 
> >> >> Topi, the goal I had in mind with PATCH 01 was to refactor a commonly
> >> >> recurring pattern.  In terms of the functions defined in this patch my
> >> >> example from yesterday [1] would now look like:
> >> >> 
> >> >> |   if (index.file == BRW_IMMEDIATE_VALUE) {
> >> >> |
> >> >> |      uint32_t surf_index = index.dw1.ud;
> >> >> |
> >> >> |      brw_inst *send = brw_next_insn(p, BRW_OPCODE_SEND);
> >> >> |      brw_set_dest(p, send, retype(dst, BRW_REGISTER_TYPE_UW));
> >> >> |      brw_set_src0(p, send, offset);
> >> >> |      brw_set_sampler_message(p, send,
> >> >> |                              surf_index,
> >> >> |                              0, /* LD message ignores sampler unit */
> >> >> |                              GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
> >> >> |                              rlen,
> >> >> |                              mlen,
> >> >> |                              false, /* no header */
> >> >> |                              simd_mode,
> >> >> |                              0);
> >> >> |
> >> >> |      brw_mark_surface_used(prog_data, surf_index);
> >> >> |
> >> >> |   } else {
> >> >> |
> >> >> |      struct brw_reg addr = vec1(retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD));
> >> >> |
> >> >> |      brw_push_insn_state(p);
> >> >> |      brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> >> >> |      brw_set_default_access_mode(p, BRW_ALIGN_1);
> >> >> |
> >> >> |      /* a0.0 = surf_index & 0xff */
> >> >> |      brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
> >> >> |      brw_inst_set_exec_size(p->brw, insn_and, BRW_EXECUTE_1);
> >> >> |      brw_set_dest(p, insn_and, addr);
> >> >> |      brw_set_src0(p, insn_and, vec1(retype(index, BRW_REGISTER_TYPE_UD)));
> >> >> |      brw_set_src1(p, insn_and, brw_imm_ud(0x0ff));
> >> >> |
> >> >> |
> >> >> |      /* a0.0 |= <descriptor> */
> >> >> |      brw_inst *descr_inst = brw_build_indirect_message_descr(p, addr, addr);
> >> >> |      brw_set_sampler_message(p, descr_inst,
> >> >> |                              0 /* surface */,
> >> >> |                              0 /* sampler */,
> >> >> |                              GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
> >> >> |                              rlen /* rlen */,
> >> >> |                              mlen /* mlen */,
> >> >> |                              false /* header */,
> >> >> |                              simd_mode,
> >> >> |                              0);
> >> >> |
> >> >> |      /* dst = send(offset, a0.0) */
> >> >> |      brw_send_indirect_message(p, BRW_SFID_SAMPLER, dst, offset, addr);
> >> >> |
> >> >> |      brw_pop_insn_state(p);
> >> >> |
> >> >> |      /* visitor knows more than we do about the surface limit required,
> >> >> |       * so has already done marking.
> >> >> |       */
> >> >> |   }
> >> >
> >> > Which I think could also be written as follows. Or am I missing something
> >> > again?
> >> >
> >> > static brw_inst *
> >> > brw_build_surface_index_descr(struct brw_compile *p,
> >> >                               struct brw_reg dst, index)
> >> > {
> >> >    brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> >> >    brw_set_default_access_mode(p, BRW_ALIGN_1);
> >> >
> >> >    /* a0.0 = surf_index & 0xff */
> >> >    brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
> >> >    brw_inst_set_exec_size(p->brw, insn_and, BRW_EXECUTE_1);
> >> >    brw_set_dest(p, insn_and, addr);
> >> >    brw_set_src0(p, insn_and, vec1(retype(index, BRW_REGISTER_TYPE_UD)));
> >> >    brw_set_src1(p, insn_and, brw_imm_ud(0x0ff));
> >> >
> >> >    /* a0.0 |= <descriptor> */
> >> >    brw_inst *descr_inst = brw_build_indirect_message_descr(p, addr, addr);
> >> > }
> >> >
> >> > ...
> >> >    brw_inst *descr_inst;
> >> >    if (index.file == BRW_IMMEDIATE_VALUE) {
> >> >       descr = brw_next_insn(p, BRW_OPCODE_SEND);
> >> >       brw_set_dest(p, send, retype(dst, BRW_REGISTER_TYPE_UW));
> >> >       brw_set_src0(p, send, offset);
> >> >
> >> >       brw_mark_surface_used(prog_data, surf_index);
> >> >    } else {
> >> >       struct brw_reg addr = vec1(retype(brw_address_reg(0),
> >> >                                  BRW_REGISTER_TYPE_UD));
> >> >       brw_push_insn_state(p);
> >> >
> >> >       brw_build_surface_index_descr(p, addr, index);
> >> >       /* dst = send(offset, a0.0) */
> >> >       descr_inst = brw_send_indirect_message(p, BRW_SFID_SAMPLER,
> >> >                                              dst, offset, addr);
> >> >       brw_pop_insn_state(p);
> >> >    }
> >> >
> >> >    uint32_t surf_index = index.file == BRW_IMMEDIATE_VALUE ? index.dw1.ud : 0;
> >> >    brw_set_sampler_message(p, descr_inst,
> >> >                            surf_index,
> >> >                            0, /* LD message ignores sampler unit */
> >> >                            GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
> >> >                            rlen,
> >> >                            mlen,
> >> >                            false, /* no header */
> >> >                            simd_mode,
> >> >                            0);
> >> > ...
> >> 
> >> Yeah, something like that.  In any case you need logic to handle both
> >> the immediate and indirect cases explicitly by every caller, which seems
> >> like plenty of duplication to me, not sure what you are trying to
> >> achieve with this.
> >
> > Well, to me it seems that cases where address is an immediate and 
> > non-immediate already have quite fixed paths. I'm uneasy about changing the
> > final kick-off of the message sending to have two alternatives for these
> > - in each path only one final branch is really possible but this isn't that
> > obvious to the reader anymore as one needs to go and analyze the callers:
> >
> 
> No, that's not right, *all* users of brw_send_indirect_message() or
> brw_send_indirect_surface_message() will eventually have to handle
> *both* immediate and indirect descriptor cases, and the necessary logic
> is identical for *all* of them.

Some rational in the commit message would have helped me to understand the
ultimate goal, same as with generate_tex().
I'll continue with patch number five again. There are quite few things moving
around making it difficult to double check.

Also, this series is not based on master but something else. What is your
plan on landing this?

> 
> > Patch 1:
> >   - introduces brw_send_indirect_message() considering if address is
> >     immediate or not.
> >   - called by generate_tex() against address that is never an immediate
> 
> This is only temporary, I have a (not yet sent for review) patch that
> gets rid of explicit handling of immediate vs. indirect descriptors in
> the texturing and pull constant code completely.
> 
> >   - called by generate_uniform_pull_constant_load_gen7() in a branch which
> >     explicitly uses an immediate and another which explicitly uses
> >     non-immediate
> >
> I don't think that any immediate-only branch of
> generate_uniform_pull_constant_load_gen7() is calling
> brw_send_indirect_message() at this point.
> 
> > Patch 5:
> >   - introduces brw_send_indirect_surface_message() considering if address is
> >     immediate or not.
> >   - called by brw_untyped_atomic() and brw_untyped_surface_read() which
> >     both are only called against immediates
> >
> > Patch 11:
> >   - introduces brw_untyped_surface_write() which is only called against
> >     immediate
> >
> > Patch 12:
> >   - introduces brw_typed_atomic() which is only called against immediate
> >   - introduces brw_typed_surface_read() which is only called against immediate
> >   - introduces brw_typed_surface_write() which is only called against immediate
> >
> Typed and untyped surface opcodes are *all* called with both immediate
> and indirect surfaces.
> 
> >
> >
> > Then, in general I was hoping to have design pattern of this sort. Caller:
> >
> >    create_descriptor();
> >    add_property_1_to_descriptor();
> >    add_property_2_to_descriptor();
> >    ...
> >    send()
> >
> 
> I guess I find this acceptable usually, but sticking to this model in
> this case prevents abstraction, because the shared logic is only in the
> first and last statements of your example.

The property additions are shared as well, property_1 being the brw_OR()
in brw_send_indirect_message().

> 
> >
> > Instead of:
> >
> >   create_descriptor_and_send()
> >   {
> >      create();
> >      send();
> >   }
> >
> >   create_descriptor_with_1_and_send()
> >   {
> >      create_descriptor_and_send();
> >      add_property_1();
> >   }
> >
> 
> If create_descriptor_with_1_and_send() is pseudocode for
> brw_send_indirect_surface_message(), the only reason why it's there is
> to emit the AND instruction that masks out invalid bits from the surface
> index, which could otherwise lead to hangs.  I'm not planning on adding
> more nesting levels.

By that I meant brw_send_indirect_message() and
brw_send_indirect_surface_message() being create_descriptor_with_1_and_send().


More information about the mesa-dev mailing list