[Mesa-dev] [PATCH v3 07/24] i965/fs: generalize the legalization d2x pass

Mon Mar 6 12:03:13 UTC 2017

On 04/03/17 00:19, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
>> Add support to SEL instruction and add an assert to detect unsupported
>> instructions than do d2x conversions.
>>
>> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>> ---
>>
>> Curro, this patch legalizes SEL instruction too. If other optimizations
>> modify later any SEL's (or any other instruction's) destination type
>> (hence, producing a non-lowered d2x conversion), we can call it again
>> around the end of fs_visitor::optimize(). Possibly together with
>> lower_simd_width() just in case it was added later.
>>
> 
> This sounds rather scary...  How do you make sure that this doesn't lead
> to an infinite legalization-optimization loop in which copy propagation
> reverses the effect of lower_d2x making double conversions illegal
> again?  If you do already, why do you need to run lower_d2x multiple
> times?  Wouldn't it be sufficient to run it once near the end of
> optimize(), and then re-run copy propagation and possibly DCE?
> 

Right. I am going to change where we call it.

>> For that reason there is the inst->dst.stride > 1 condition in the
>> test. This detects if either we emitted a strided destination in
>> purpose or it was as a result of a previous lower_d2x run, we don't
>> want to lowered it.
>>
> The problem with this is that if you ended up with dst.stride > 1 due to
> different fields of the same scalar quantity being defined by two
> separate instructions (e.g. by using subscript(dst, ..., i)), you *need*
> to apply the lowering pass regardless, because otherwise the second
> instruction will corrupt the data written by the first instruction.
> 
>> However, as I have not hit that case yet, I prefer to wait for your
>> opinion. What do you think?
>>
>>
>>  src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp | 57 ++++++++++++++++++--------
>>  1 file changed, 41 insertions(+), 16 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>> index a2db1154615..330f2552929 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>> @@ -33,17 +33,9 @@ fs_visitor::lower_d2x()
>>     bool progress = false;
>>  
>>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>> -      if (inst->opcode != BRW_OPCODE_MOV)
>> -         continue;
>> -
>> -      if (inst->dst.type != BRW_REGISTER_TYPE_F &&
>> -          inst->dst.type != BRW_REGISTER_TYPE_D &&
>> -          inst->dst.type != BRW_REGISTER_TYPE_UD)
>> -         continue;
>> -
>> -      if (inst->src[0].type != BRW_REGISTER_TYPE_DF &&
>> -          inst->src[0].type != BRW_REGISTER_TYPE_UQ &&
>> -          inst->src[0].type != BRW_REGISTER_TYPE_Q)
>> +      if (get_exec_type_size(inst) != 8 ||
>> +          type_sz(inst->dst.type) >= get_exec_type_size(inst) ||
> 
> Note that some type conversion restrictions apply even if the execution
> type is single-precision, and even if the destination type size is not
> less than the execution type, e.g. according to the hardware docs SEL
> doesn't support F->UD or F->DF conversions which the condition above
> would consider okay.
> 

Right.

>> +          inst->dst.stride > 1)
>>           continue;
>>  
>>        assert(inst->dst.file == VGRF);
>> @@ -61,13 +53,46 @@ fs_visitor::lower_d2x()
>>         * So we need to allocate a temporary that's two registers, and then do
>>         * a strided MOV to get the lower DWord of every Qword that has the
>>         * result.
>> +       *
>> +       * This pass legalizes all the DF conversions to narrower types.
>>         */
>> -      fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>> -      fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>> -      ibld.MOV(strided_temp, inst->src[0]);
>> -      ibld.MOV(dst, strided_temp);
>> +      switch (inst->opcode) {
> 
> I suggest you refactor this into a helper function 'bool
> supports_type_conversion(inst, dst_type, exec_type)' that returns false
> for SEL and likely other things.  It might be a useful thing to have in
> other places, e.g. for late optimization passes like copy propagation
> where we need to make sure that no additional illegal conversions are
> introduced.  If the value returned is false you'd do what you have below
> for the SEL instruction, if it's true you'd do nothing unless the
> instruction is double-precision and the destination type is smaller than
> the execution type, in which case you'd do what you have below for
> MOV/MOV_INDIRECT.
> 

OK. I'm going to add it as static in the file, so we can promote it to a
header file when it is used within other optimizations/lowerings.

>> +      case SHADER_OPCODE_MOV_INDIRECT:
>> +      case BRW_OPCODE_MOV: {
>> +         fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>> +         fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>> +         /* We clone the original instruction as we are going to modify it
>> +          * and emit another one after it.
>> +          */
>> +         fs_inst *strided_inst = new(ibld.shader->mem_ctx) fs_inst(*inst);
> 
> Why don't you just modify the original instruction instead of cloning
> it, modifying the clone, and then removing the original?
> 

Right but I did it because I need to emit the strided MOV after the
instructions... which means adding support for this in fs_builder.

As you prefer to reuse the original instruction, then I will send a
patch implementing emit_after() functions in fs_builder and use them here.

>> +         strided_inst->dst = strided_temp;
>> +         /* As it is an strided destination, we write n-times more
>> +          * being n the size difference between source and destination types.
>> +          */
>> +         strided_inst->size_written *= (type_sz(inst->src[0].type) / type_sz(inst->dst.type));
>> +         ibld.emit(strided_inst);
>> +         ibld.MOV(dst, strided_temp);
>> +         /* Remove original instruction, now that is superseded. */
>> +         inst->remove(block);
>> +         break;
>> +      }
>> +      case BRW_OPCODE_SEL: {
>> +         fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1);
>> +         fs_reg strided_temp0 = subscript(temp0, inst->dst.type, 0);
>> +         fs_reg temp1 = ibld.vgrf(inst->src[1].type, 1);
>> +         fs_reg strided_temp1 = subscript(temp1, inst->dst.type, 0);
>>  
>> -      inst->remove(block);
>> +         /* Let's convert the operands to the destination type first */
>> +         ibld.MOV(strided_temp0, inst->src[0]);
>> +         ibld.MOV(strided_temp1, inst->src[1]);
>> +         inst->src[0] = strided_temp0;
>> +         inst->src[1] = strided_temp1;
> 
> Why don't you change the destination type to match the execution type so
> you only need to emit one additional converting move instead of two when
> there is a type mismatch?
> 

This was to avoid defining emit_after() in fs_builder. But as you prefer
that, then I will do it.

Thanks for the review! :)

Sam

>> +         break;
>> +      }
>> +      default:
>> +         /* FIXME: If this is not a supported instruction, then we need to support it. */
>> +         unreachable("Not supported instruction");
>> +      }
>>        progress = true;
>>     }
>>  
>> -- 
>> 2.11.0

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 862 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170306/fb38d201/attachment.sig>