[Mesa-dev] [PATCH v4 07/28] i965/fs: generalize the legalization d2x pass
Francisco Jerez
currojerez at riseup.net
Thu Mar 23 19:01:25 UTC 2017
Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> On Wed, 2017-03-22 at 15:47 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>>
>> > Generalize it to lower any unsupported narrower conversion.
>> >
>> > v2 (Curro):
>> > - Add supports_type_conversion()
>> > - Reuse existing intruction instead of cloning it.
>> > - Generalize d2x to narrower and equal size conversions.
>> >
>> > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>> > ---
>> > src/intel/compiler/brw_fs.cpp | 11 ++--
>> > src/intel/compiler/brw_fs_lower_d2x.cpp | 97
>> > ++++++++++++++++++++++++---------
>> > 2 files changed, 77 insertions(+), 31 deletions(-)
>> >
>> > diff --git a/src/intel/compiler/brw_fs.cpp
>> > b/src/intel/compiler/brw_fs.cpp
>> > index 6da23b17107..8612a83805d 100644
>> > --- a/src/intel/compiler/brw_fs.cpp
>> > +++ b/src/intel/compiler/brw_fs.cpp
>> > @@ -5694,11 +5694,6 @@ fs_visitor::optimize()
>> > OPT(dead_code_eliminate);
>> > }
>> >
>> > - if (OPT(lower_d2x)) {
>> > - OPT(opt_copy_propagation);
>> > - OPT(dead_code_eliminate);
>> > - }
>> > -
>> > OPT(lower_simd_width);
>> >
>> > /* After SIMD lowering just in case we had to unroll the EOT
>> > send. */
>> > @@ -5745,6 +5740,12 @@ fs_visitor::optimize()
>> > OPT(dead_code_eliminate);
>> > }
>> >
>> > + if (OPT(lower_d2x)) {
>> > + OPT(opt_copy_propagation);
>> > + OPT(dead_code_eliminate);
>> > + OPT(lower_simd_width);
>> > + }
>> > +
>> > lower_uniform_pull_constant_loads();
>> >
>> > validate();
>> > diff --git a/src/intel/compiler/brw_fs_lower_d2x.cpp
>> > b/src/intel/compiler/brw_fs_lower_d2x.cpp
>> > index a2db1154615..fa25d313dcd 100644
>> > --- a/src/intel/compiler/brw_fs_lower_d2x.cpp
>> > +++ b/src/intel/compiler/brw_fs_lower_d2x.cpp
>> > @@ -27,47 +27,92 @@
>> >
>> > using namespace brw;
>> >
>> > +static bool
>> > +supports_type_conversion(fs_inst *inst) {
>>
>> Pointer should be marked const.
>>
>> > + switch(inst->opcode) {
>> > + case BRW_OPCODE_MOV:
>> > + case SHADER_OPCODE_MOV_INDIRECT:
>> > + return true;
>> > + case BRW_OPCODE_SEL:
>> > + return false;
>>
>> I suggest you return 'inst->dst.type == get_exec_type_size(inst)'
>> here
>> in order to simplify the logic below and restructure things in a way
>> that allows you to make opcode-specific decisions here based on the
>> actual execution and destination types.
>
> I guess you meant:
> 'type_sz(inst->dst.type) == get_exec_type_size(inst)'
>
Uhm, not really, you really want to compare the destination type with
the execution type of the instruction.
> This would make it simpler but we then allow f2d conversion which is
> not allowed for SEL.
>
>>
>> > + default:
>> > + /* FIXME: We assume the opcodes don't explicitly mentioned
>> > + * before just work fine with arbitrary conversions.
>> > + */
>> > + return true;
>> > + }
>> > +}
>> > +
>> > bool
>> > fs_visitor::lower_d2x()
>> > {
>> > bool progress = false;
>> >
>> > foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>> > - if (inst->opcode != BRW_OPCODE_MOV)
>> > - continue;
>> > + bool inst_support_conversion =
>> > supports_type_conversion(inst);
>> > + bool supported_conversion =
>> > + inst_support_conversion &&
>> > + (get_exec_type_size(inst) != 8 ||
>> > + type_sz(inst->dst.type) > 4 ||
>> > + type_sz(inst->dst.type) >= get_exec_type_size(inst));
>> >
>> > - if (inst->dst.type != BRW_REGISTER_TYPE_F &&
>> > - inst->dst.type != BRW_REGISTER_TYPE_D &&
>> > - inst->dst.type != BRW_REGISTER_TYPE_UD)
>> > + /* If the conversion is supported or there is no conversion
>> > then
>> > + * do nothing.
>> > + */
>> > + if (supported_conversion ||
>> > + (!inst_support_conversion && inst->dst.type == inst-
>> > >src[0].type) ||
>>
>> You should be using the execution type here (and below where you
>> allocate the temp0 and temp1 vgrfs) instead of assuming it matches
>> the
>> type of the first source. Also I'm not 100% certain that the
>> combination of this conditional continue, the one below, the if/else
>> branches below and the supported_conversion definitions above catch
>> the
>> exact set of cases where you need to apply lowering. This would be a
>> lot easier to follow (most of the above goes away) if you did
>> something
>> along the lines of:
>>
>
> OK, thanks.
>
>> > if (supports_type_conversion(inst)) {
>> > if (get_exec_type_size(inst) == 8 && type_sz(inst->dst.type) <
>> > 8) {
>> > // Apply strided move workaround...
>> > progress = true;
>> > }
>> > } else {
>> > // Apply general conversion workaround...
>> > progress = true;
>> > }
>> > + inst->dst.file == BAD_FILE || inst->src[0].file ==
>> > BAD_FILE)
>>
>> I don't think the 'inst->dst.file == BAD_FILE || inst->src[0].file ==
>> BAD_FILE' terms of this expression are correct or useful, the
>> 'inst->src[0].file' term because there might be a destination region
>> conversion regardless of what the file of the first source is, the
>> 'inst->dst.file == BAD_FILE' term because you could just rely on DCE
>> instead...
>>
>
> OK
>
>> > continue;
>> >
>> > - if (inst->src[0].type != BRW_REGISTER_TYPE_DF &&
>> > - inst->src[0].type != BRW_REGISTER_TYPE_UQ &&
>> > - inst->src[0].type != BRW_REGISTER_TYPE_Q)
>> > - continue;
>> > + /* This pass only supports conversion to narrower or equal
>> > size types. */
>> > + if (get_exec_type_size(inst) < type_sz(inst->dst.type))
>> > + continue;
>> >
>>
>> I think you need to apply this condition to the if branch below that
>> deals with DF to F conversions *only*, the else branch you probably
>> need
>> to apply regardless of whether the destination type is larger or
>> smaller
>> than the execution type.
>>
>
> OK
>
>> > - assert(inst->dst.file == VGRF);
>> > assert(inst->saturate == false);
>>
>> I don't see why it's safe to rely on saturate being false here, in
>> any
>> case it would be straightforward to handle it by setting the flag on
>> the
>> emitted MOV instruction and clearing it on the original instruction.
>>
>
> OK
>
>> > - fs_reg dst = inst->dst;
>> >
>> > const fs_builder ibld(this, block, inst);
>> > + fs_reg dst = inst->dst;
>> >
>> > - /* From the Broadwell PRM, 3D Media GPGPU, "Double Precision
>> > Float to
>> > - * Single Precision Float":
>> > - *
>> > - * The upper Dword of every Qword will be written with
>> > undefined
>> > - * value when converting DF to F.
>> > - *
>> > - * So we need to allocate a temporary that's two registers,
>> > and then do
>> > - * a strided MOV to get the lower DWord of every Qword that
>> > has the
>> > - * result.
>> > - */
>> > - fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>> > - fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>> > - ibld.MOV(strided_temp, inst->src[0]);
>> > - ibld.MOV(dst, strided_temp);
>> > + if (inst_support_conversion && !supported_conversion) {
>> > + /* From the Broadwell PRM, 3D Media GPGPU, "Double
>> > Precision Float to
>> > + * Single Precision Float":
>> > + *
>> > + * The upper Dword of every Qword will be written with
>> > undefined
>> > + * value when converting DF to F.
>> > + *
>> > + * So we need to allocate a temporary that's two
>> > registers, and then do
>> > + * a strided MOV to get the lower DWord of every Qword
>> > that has the
>> > + * result.
>> > + */
>> > + fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>>
>> The '1' argument is redundant in all fs_builder::vgrf() calls here
>> and
>> below.
>>
>
> OK
>
>> > + fs_reg strided_temp = subscript(temp, dst.type, 0);
>> > +
>> > + /* We clone the original instruction as we are going to
>> > modify it
>> > + * and emit another one after it.
>> > + */
>>
>> Comment seems stale.
>>
>
> Right.
>
>> > + inst->dst = strided_temp;
>> > + /* As it is an strided destination, we write n-times more
>> > being n the
>> > + * size difference between source and destination types.
>> > Update
>>
>> I guess by difference you meant ratio here and in the copy-pasted
>> comment below (the comments also have a few minor spelling mistakes
>> but
>> that's unlikely to cause as much confusion).
>>
>
> Right.
>
>> > + * size_written with the new destination.
>> > + */
>> > + inst->size_written = inst->dst.component_size(inst-
>> > >exec_size);
>>
>> I think I'd add an assert(inst->size_written ==
>> inst->dst.component_size(inst->exec_size)) right before you assign
>> 'inst->dst' to point at the temporary in order to make sure you
>> aren't
>> accidentally miscompiling instructions that write multiple
>> components.
>>
>
> OK
>
>> > + ibld.at(block, inst->next).MOV(dst, strided_temp);
>> > + } else {
>> > + fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1);
>> > + fs_reg temp1 = ibld.vgrf(inst->src[0].type, 1);
>> > + fs_reg strided_temp1 = subscript(temp1, dst.type, 0);
>> > +
>> > + inst->dst = temp0;
>> > + /* As it is an strided destination, we write n-times more
>> > being n the
>> > + * size difference between source and destination types.
>> > Update
>> > + * size_written with the new destination.
>> > + */
>> > + inst->size_written = inst->dst.component_size(inst-
>> > >exec_size);
>> >
>> > - inst->remove(block);
>> > + /* Now, do the conversion to original destination's type.
>> > */
>> > + fs_inst *mov = ibld.at(block, inst-
>> > >next).MOV(strided_temp1, temp0);
>> > + ibld.at(block, mov->next).MOV(dst, strided_temp1);
>>
>> Isn't this MOV and the strided_temp1 handling redundant? I'd expect
>> it
>> to be handled automatically during the next loop iteration of this
>> pass
>> (though you may have to switch back to foreach_block_and_inst()
>> instead
>> of foreach_block_and_inst_safe()), which would also result in better
>> code because the extra move would only be emitted if this is an
>> actual
>> DF->F conversion.
>>
>
> Good catch!
>
> Thanks for all the suggestions, I am going to do them.
>
> Sam
>
>> > + }
>> > progress = true;
>> > }
>> >
>> > --
>> > 2.11.0
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170323/5b61da0f/attachment-0001.sig>
More information about the mesa-dev
mailing list