[Mesa-dev] [PATCH v3 07/24] i965/fs: generalize the legalization d2x pass

Tue Mar 14 10:17:38 UTC 2017

On 04/03/17 00:19, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
>> Add support to SEL instruction and add an assert to detect unsupported
>> instructions than do d2x conversions.
>>
>> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>> ---
>>
>> Curro, this patch legalizes SEL instruction too. If other optimizations
>> modify later any SEL's (or any other instruction's) destination type
>> (hence, producing a non-lowered d2x conversion), we can call it again
>> around the end of fs_visitor::optimize(). Possibly together with
>> lower_simd_width() just in case it was added later.
>>
> 
> This sounds rather scary...  How do you make sure that this doesn't lead
> to an infinite legalization-optimization loop in which copy propagation
> reverses the effect of lower_d2x making double conversions illegal
> again?  If you do already, why do you need to run lower_d2x multiple
> times?  Wouldn't it be sufficient to run it once near the end of
> optimize(), and then re-run copy propagation and possibly DCE?
> 
>> For that reason there is the inst->dst.stride > 1 condition in the
>> test. This detects if either we emitted a strided destination in
>> purpose or it was as a result of a previous lower_d2x run, we don't
>> want to lowered it.
>>
> The problem with this is that if you ended up with dst.stride > 1 due to
> different fields of the same scalar quantity being defined by two
> separate instructions (e.g. by using subscript(dst, ..., i)), you *need*
> to apply the lowering pass regardless, because otherwise the second
> instruction will corrupt the data written by the first instruction.
> 
>> However, as I have not hit that case yet, I prefer to wait for your
>> opinion. What do you think?
>>
>>
>>  src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp | 57 ++++++++++++++++++--------
>>  1 file changed, 41 insertions(+), 16 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>> index a2db1154615..330f2552929 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>> @@ -33,17 +33,9 @@ fs_visitor::lower_d2x()
>>     bool progress = false;
>>  
>>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>> -      if (inst->opcode != BRW_OPCODE_MOV)
>> -         continue;
>> -
>> -      if (inst->dst.type != BRW_REGISTER_TYPE_F &&
>> -          inst->dst.type != BRW_REGISTER_TYPE_D &&
>> -          inst->dst.type != BRW_REGISTER_TYPE_UD)
>> -         continue;
>> -
>> -      if (inst->src[0].type != BRW_REGISTER_TYPE_DF &&
>> -          inst->src[0].type != BRW_REGISTER_TYPE_UQ &&
>> -          inst->src[0].type != BRW_REGISTER_TYPE_Q)
>> +      if (get_exec_type_size(inst) != 8 ||
>> +          type_sz(inst->dst.type) >= get_exec_type_size(inst) ||
> 
> Note that some type conversion restrictions apply even if the execution
> type is single-precision, and even if the destination type size is not
> less than the execution type, e.g. according to the hardware docs SEL
> doesn't support F->UD or F->DF conversions which the condition above
> would consider okay.
> 
>> +          inst->dst.stride > 1)
>>           continue;
>>  
>>        assert(inst->dst.file == VGRF);
>> @@ -61,13 +53,46 @@ fs_visitor::lower_d2x()
>>         * So we need to allocate a temporary that's two registers, and then do
>>         * a strided MOV to get the lower DWord of every Qword that has the
>>         * result.
>> +       *
>> +       * This pass legalizes all the DF conversions to narrower types.
>>         */
>> -      fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>> -      fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>> -      ibld.MOV(strided_temp, inst->src[0]);
>> -      ibld.MOV(dst, strided_temp);
>> +      switch (inst->opcode) {
> 
> I suggest you refactor this into a helper function 'bool
> supports_type_conversion(inst, dst_type, exec_type)' that returns false
> for SEL and likely other things.  It might be a useful thing to have in
> other places, e.g. for late optimization passes like copy propagation
> where we need to make sure that no additional illegal conversions are
> introduced.  If the value returned is false you'd do what you have below
> for the SEL instruction, if it's true you'd do nothing unless the
> instruction is double-precision and the destination type is smaller than
> the execution type, in which case you'd do what you have below for
> MOV/MOV_INDIRECT.
> 

I have implemented the generalization of this pass following this
suggestion, however I needed to disallow it for some specific cases:

* Message related opcodes (samplers, urb writes, etc): they are not
actually conversions between source registers and destination, so we
cannot legalize them.
* Opcodes like ASR, SHR, SHL where we reinterpret the result of the
opcode, so no need to legalize it either.
* I also restricted this legalization to opcodes where the arguments
have all of them the same type. In some cases, we have mixed source
arguments than will be either fixed in the generator (i.e. BFE/BFI2) or
we don't know which of them use for the destination's data type in the
legalization. I can add a TODO comment if you want.
* This pass only legalizes conversions to narrower types. I have a patch
to rename this pass to 'lower_narrow_conversions' (suggestions for
better names are welcome).

This change is a little bit scary because if we legalize the wrong case
(for example a sampler message) we end up having regressions or, even
worse, GPU hangs. Also, depending of the generation, some tests start to
regress because some instructions are emitted slightly different.

I think I fixed all the GPU hangs in all the generations I have access
to and I don't see any regressions on Intel CI, but I am worried if this
change could bite us again in the future (games, applications, new
tests/opcodes, etc).

If you ask me, I prefer to keep this lowering pass as it is, i.e.
legalize only d2x conversions, to avoid blocking this series because of
this. Once this patch series lands in master, I can send an RFC patch
with this change and we could discuss how it can be done.

What do you think?

Sam

>> +      case SHADER_OPCODE_MOV_INDIRECT:
>> +      case BRW_OPCODE_MOV: {
>> +         fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>> +         fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>> +         /* We clone the original instruction as we are going to modify it
>> +          * and emit another one after it.
>> +          */
>> +         fs_inst *strided_inst = new(ibld.shader->mem_ctx) fs_inst(*inst);
> 
> Why don't you just modify the original instruction instead of cloning
> it, modifying the clone, and then removing the original?
> 
>> +         strided_inst->dst = strided_temp;
>> +         /* As it is an strided destination, we write n-times more
>> +          * being n the size difference between source and destination types.
>> +          */
>> +         strided_inst->size_written *= (type_sz(inst->src[0].type) / type_sz(inst->dst.type));
>> +         ibld.emit(strided_inst);
>> +         ibld.MOV(dst, strided_temp);
>> +         /* Remove original instruction, now that is superseded. */
>> +         inst->remove(block);
>> +         break;
>> +      }
>> +      case BRW_OPCODE_SEL: {
>> +         fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1);
>> +         fs_reg strided_temp0 = subscript(temp0, inst->dst.type, 0);
>> +         fs_reg temp1 = ibld.vgrf(inst->src[1].type, 1);
>> +         fs_reg strided_temp1 = subscript(temp1, inst->dst.type, 0);
>>  
>> -      inst->remove(block);
>> +         /* Let's convert the operands to the destination type first */
>> +         ibld.MOV(strided_temp0, inst->src[0]);
>> +         ibld.MOV(strided_temp1, inst->src[1]);
>> +         inst->src[0] = strided_temp0;
>> +         inst->src[1] = strided_temp1;
> 
> Why don't you change the destination type to match the execution type so
> you only need to emit one additional converting move instead of two when
> there is a type mismatch?
> 
>> +         break;
>> +      }
>> +      default:
>> +         /* FIXME: If this is not a supported instruction, then we need to support it. */
>> +         unreachable("Not supported instruction");
>> +      }
>>        progress = true;
>>     }
>>  
>> -- 
>> 2.11.0

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 862 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170314/9dbea2aa/attachment.sig>