[Mesa-dev] [RFC] i965: Factor out descriptor building for indirect send messages
Francisco Jerez
currojerez at riseup.net
Thu Mar 12 06:13:48 PDT 2015
"Pohjolainen, Topi" <topi.pohjolainen at intel.com> writes:
> On Tue, Mar 10, 2015 at 11:07:26PM +0200, Francisco Jerez wrote:
>> "Pohjolainen, Topi" <topi.pohjolainen at intel.com> writes:
>>
>> > On Mon, Mar 09, 2015 at 12:43:08PM +0200, Francisco Jerez wrote:
>> >> "Pohjolainen, Topi" <topi.pohjolainen at intel.com> writes:
>> >>
>> >> > On Sat, Mar 07, 2015 at 04:15:08PM +0200, Francisco Jerez wrote:
>> >> >> Topi Pohjolainen <topi.pohjolainen at intel.com> writes:
>> >> >>
>> >> >> > The original patch from Curro was based on something that is not
>> >> >> > present in the master yet. This patch tries to mimick the logic on
>> >> >> > top master.
>> >> >> > I wanted to see if could separate the building of the descriptor
>> >> >> > instruction from building of the send instruction. This logic now
>> >> >> > allows the caller to construct any kind of sequence of instructions
>> >> >> > filling in the descriptor before giving it to the send instruction
>> >> >> > builder.
>> >> >> >
>> >> >> > This is only compile tested. Curro, how would you feel about this
>> >> >> > sort of approach? I apologise for muddying the waters again but I
>> >> >> > wasn't entirely comfortable with the logic and wanted to try to
>> >> >> > something else.
>> >> >> >
>> >> >> > I believe patch number five should go nicely on top of this as
>> >> >> > the descriptor instruction could be followed by (or preceeeded by)
>> >> >> > any additional instructions modifying the descriptor register
>> >> >> > before the actual send instruction.
>> >> >> >
>> >> >>
>> >> >> Topi, the goal I had in mind with PATCH 01 was to refactor a commonly
>> >> >> recurring pattern. In terms of the functions defined in this patch my
>> >> >> example from yesterday [1] would now look like:
>> >> >>
>> >> >> | if (index.file == BRW_IMMEDIATE_VALUE) {
>> >> >> |
>> >> >> | uint32_t surf_index = index.dw1.ud;
>> >> >> |
>> >> >> | brw_inst *send = brw_next_insn(p, BRW_OPCODE_SEND);
>> >> >> | brw_set_dest(p, send, retype(dst, BRW_REGISTER_TYPE_UW));
>> >> >> | brw_set_src0(p, send, offset);
>> >> >> | brw_set_sampler_message(p, send,
>> >> >> | surf_index,
>> >> >> | 0, /* LD message ignores sampler unit */
>> >> >> | GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
>> >> >> | rlen,
>> >> >> | mlen,
>> >> >> | false, /* no header */
>> >> >> | simd_mode,
>> >> >> | 0);
>> >> >> |
>> >> >> | brw_mark_surface_used(prog_data, surf_index);
>> >> >> |
>> >> >> | } else {
>> >> >> |
>> >> >> | struct brw_reg addr = vec1(retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD));
>> >> >> |
>> >> >> | brw_push_insn_state(p);
>> >> >> | brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>> >> >> | brw_set_default_access_mode(p, BRW_ALIGN_1);
>> >> >> |
>> >> >> | /* a0.0 = surf_index & 0xff */
>> >> >> | brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
>> >> >> | brw_inst_set_exec_size(p->brw, insn_and, BRW_EXECUTE_1);
>> >> >> | brw_set_dest(p, insn_and, addr);
>> >> >> | brw_set_src0(p, insn_and, vec1(retype(index, BRW_REGISTER_TYPE_UD)));
>> >> >> | brw_set_src1(p, insn_and, brw_imm_ud(0x0ff));
>> >> >> |
>> >> >> |
>> >> >> | /* a0.0 |= <descriptor> */
>> >> >> | brw_inst *descr_inst = brw_build_indirect_message_descr(p, addr, addr);
>> >> >> | brw_set_sampler_message(p, descr_inst,
>> >> >> | 0 /* surface */,
>> >> >> | 0 /* sampler */,
>> >> >> | GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
>> >> >> | rlen /* rlen */,
>> >> >> | mlen /* mlen */,
>> >> >> | false /* header */,
>> >> >> | simd_mode,
>> >> >> | 0);
>> >> >> |
>> >> >> | /* dst = send(offset, a0.0) */
>> >> >> | brw_send_indirect_message(p, BRW_SFID_SAMPLER, dst, offset, addr);
>> >> >> |
>> >> >> | brw_pop_insn_state(p);
>> >> >> |
>> >> >> | /* visitor knows more than we do about the surface limit required,
>> >> >> | * so has already done marking.
>> >> >> | */
>> >> >> | }
>> >> >
>> >> > Which I think could also be written as follows. Or am I missing something
>> >> > again?
>> >> >
>> >> > static brw_inst *
>> >> > brw_build_surface_index_descr(struct brw_compile *p,
>> >> > struct brw_reg dst, index)
>> >> > {
>> >> > brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>> >> > brw_set_default_access_mode(p, BRW_ALIGN_1);
>> >> >
>> >> > /* a0.0 = surf_index & 0xff */
>> >> > brw_inst *insn_and = brw_next_insn(p, BRW_OPCODE_AND);
>> >> > brw_inst_set_exec_size(p->brw, insn_and, BRW_EXECUTE_1);
>> >> > brw_set_dest(p, insn_and, addr);
>> >> > brw_set_src0(p, insn_and, vec1(retype(index, BRW_REGISTER_TYPE_UD)));
>> >> > brw_set_src1(p, insn_and, brw_imm_ud(0x0ff));
>> >> >
>> >> > /* a0.0 |= <descriptor> */
>> >> > brw_inst *descr_inst = brw_build_indirect_message_descr(p, addr, addr);
>> >> > }
>> >> >
>> >> > ...
>> >> > brw_inst *descr_inst;
>> >> > if (index.file == BRW_IMMEDIATE_VALUE) {
>> >> > descr = brw_next_insn(p, BRW_OPCODE_SEND);
>> >> > brw_set_dest(p, send, retype(dst, BRW_REGISTER_TYPE_UW));
>> >> > brw_set_src0(p, send, offset);
>> >> >
>> >> > brw_mark_surface_used(prog_data, surf_index);
>> >> > } else {
>> >> > struct brw_reg addr = vec1(retype(brw_address_reg(0),
>> >> > BRW_REGISTER_TYPE_UD));
>> >> > brw_push_insn_state(p);
>> >> >
>> >> > brw_build_surface_index_descr(p, addr, index);
>> >> > /* dst = send(offset, a0.0) */
>> >> > descr_inst = brw_send_indirect_message(p, BRW_SFID_SAMPLER,
>> >> > dst, offset, addr);
>> >> > brw_pop_insn_state(p);
>> >> > }
>> >> >
>> >> > uint32_t surf_index = index.file == BRW_IMMEDIATE_VALUE ? index.dw1.ud : 0;
>> >> > brw_set_sampler_message(p, descr_inst,
>> >> > surf_index,
>> >> > 0, /* LD message ignores sampler unit */
>> >> > GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
>> >> > rlen,
>> >> > mlen,
>> >> > false, /* no header */
>> >> > simd_mode,
>> >> > 0);
>> >> > ...
>> >>
>> >> Yeah, something like that. In any case you need logic to handle both
>> >> the immediate and indirect cases explicitly by every caller, which seems
>> >> like plenty of duplication to me, not sure what you are trying to
>> >> achieve with this.
>> >
>> > Well, to me it seems that cases where address is an immediate and
>> > non-immediate already have quite fixed paths. I'm uneasy about changing the
>> > final kick-off of the message sending to have two alternatives for these
>> > - in each path only one final branch is really possible but this isn't that
>> > obvious to the reader anymore as one needs to go and analyze the callers:
>> >
>>
>> No, that's not right, *all* users of brw_send_indirect_message() or
>> brw_send_indirect_surface_message() will eventually have to handle
>> *both* immediate and indirect descriptor cases, and the necessary logic
>> is identical for *all* of them.
>
> Some rational in the commit message would have helped me to understand the
> ultimate goal, same as with generate_tex().
> I'll continue with patch number five again. There are quite few things moving
> around making it difficult to double check.
>
OK, I'll write a short explanation in the commit message.
> Also, this series is not based on master but something else. What is your
> plan on landing this?
>
Right, it was based on my constant cache series, I've just rebased it on
top of master.
>>
>> > Patch 1:
>> > - introduces brw_send_indirect_message() considering if address is
>> > immediate or not.
>> > - called by generate_tex() against address that is never an immediate
>>
>> This is only temporary, I have a (not yet sent for review) patch that
>> gets rid of explicit handling of immediate vs. indirect descriptors in
>> the texturing and pull constant code completely.
>>
>> > - called by generate_uniform_pull_constant_load_gen7() in a branch which
>> > explicitly uses an immediate and another which explicitly uses
>> > non-immediate
>> >
>> I don't think that any immediate-only branch of
>> generate_uniform_pull_constant_load_gen7() is calling
>> brw_send_indirect_message() at this point.
>>
>> > Patch 5:
>> > - introduces brw_send_indirect_surface_message() considering if address is
>> > immediate or not.
>> > - called by brw_untyped_atomic() and brw_untyped_surface_read() which
>> > both are only called against immediates
>> >
>> > Patch 11:
>> > - introduces brw_untyped_surface_write() which is only called against
>> > immediate
>> >
>> > Patch 12:
>> > - introduces brw_typed_atomic() which is only called against immediate
>> > - introduces brw_typed_surface_read() which is only called against immediate
>> > - introduces brw_typed_surface_write() which is only called against immediate
>> >
>> Typed and untyped surface opcodes are *all* called with both immediate
>> and indirect surfaces.
>>
>> >
>> >
>> > Then, in general I was hoping to have design pattern of this sort. Caller:
>> >
>> > create_descriptor();
>> > add_property_1_to_descriptor();
>> > add_property_2_to_descriptor();
>> > ...
>> > send()
>> >
>>
>> I guess I find this acceptable usually, but sticking to this model in
>> this case prevents abstraction, because the shared logic is only in the
>> first and last statements of your example.
>
> The property additions are shared as well, property_1 being the brw_OR()
> in brw_send_indirect_message().
>
>>
>> >
>> > Instead of:
>> >
>> > create_descriptor_and_send()
>> > {
>> > create();
>> > send();
>> > }
>> >
>> > create_descriptor_with_1_and_send()
>> > {
>> > create_descriptor_and_send();
>> > add_property_1();
>> > }
>> >
>>
>> If create_descriptor_with_1_and_send() is pseudocode for
>> brw_send_indirect_surface_message(), the only reason why it's there is
>> to emit the AND instruction that masks out invalid bits from the surface
>> index, which could otherwise lead to hangs. I'm not planning on adding
>> more nesting levels.
>
> By that I meant brw_send_indirect_message() and
> brw_send_indirect_surface_message() being create_descriptor_with_1_and_send().
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150312/e0ed9d01/attachment-0001.sig>
More information about the mesa-dev
mailing list