[Mesa-dev] [PATCH v3 07/24] i965/fs: generalize the legalization d2x pass

Thu Mar 16 06:20:13 UTC 2017

On 16/03/17 04:23, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
>> On 04/03/17 00:19, Francisco Jerez wrote:
>>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>>>
>>>> Add support to SEL instruction and add an assert to detect unsupported
>>>> instructions than do d2x conversions.
>>>>
>>>> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>>>> ---
>>>>
>>>> Curro, this patch legalizes SEL instruction too. If other optimizations
>>>> modify later any SEL's (or any other instruction's) destination type
>>>> (hence, producing a non-lowered d2x conversion), we can call it again
>>>> around the end of fs_visitor::optimize(). Possibly together with
>>>> lower_simd_width() just in case it was added later.
>>>>
>>>
>>> This sounds rather scary...  How do you make sure that this doesn't lead
>>> to an infinite legalization-optimization loop in which copy propagation
>>> reverses the effect of lower_d2x making double conversions illegal
>>> again?  If you do already, why do you need to run lower_d2x multiple
>>> times?  Wouldn't it be sufficient to run it once near the end of
>>> optimize(), and then re-run copy propagation and possibly DCE?
>>>
>>>> For that reason there is the inst->dst.stride > 1 condition in the
>>>> test. This detects if either we emitted a strided destination in
>>>> purpose or it was as a result of a previous lower_d2x run, we don't
>>>> want to lowered it.
>>>>
>>> The problem with this is that if you ended up with dst.stride > 1 due to
>>> different fields of the same scalar quantity being defined by two
>>> separate instructions (e.g. by using subscript(dst, ..., i)), you *need*
>>> to apply the lowering pass regardless, because otherwise the second
>>> instruction will corrupt the data written by the first instruction.
>>>
>>>> However, as I have not hit that case yet, I prefer to wait for your
>>>> opinion. What do you think?
>>>>
>>>>
>>>>  src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp | 57 ++++++++++++++++++--------
>>>>  1 file changed, 41 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>>>> index a2db1154615..330f2552929 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>>>> @@ -33,17 +33,9 @@ fs_visitor::lower_d2x()
>>>>     bool progress = false;
>>>>  
>>>>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>>>> -      if (inst->opcode != BRW_OPCODE_MOV)
>>>> -         continue;
>>>> -
>>>> -      if (inst->dst.type != BRW_REGISTER_TYPE_F &&
>>>> -          inst->dst.type != BRW_REGISTER_TYPE_D &&
>>>> -          inst->dst.type != BRW_REGISTER_TYPE_UD)
>>>> -         continue;
>>>> -
>>>> -      if (inst->src[0].type != BRW_REGISTER_TYPE_DF &&
>>>> -          inst->src[0].type != BRW_REGISTER_TYPE_UQ &&
>>>> -          inst->src[0].type != BRW_REGISTER_TYPE_Q)
>>>> +      if (get_exec_type_size(inst) != 8 ||
>>>> +          type_sz(inst->dst.type) >= get_exec_type_size(inst) ||
>>>
>>> Note that some type conversion restrictions apply even if the execution
>>> type is single-precision, and even if the destination type size is not
>>> less than the execution type, e.g. according to the hardware docs SEL
>>> doesn't support F->UD or F->DF conversions which the condition above
>>> would consider okay.
>>>
>>>> +          inst->dst.stride > 1)
>>>>           continue;
>>>>  
>>>>        assert(inst->dst.file == VGRF);
>>>> @@ -61,13 +53,46 @@ fs_visitor::lower_d2x()
>>>>         * So we need to allocate a temporary that's two registers, and then do
>>>>         * a strided MOV to get the lower DWord of every Qword that has the
>>>>         * result.
>>>> +       *
>>>> +       * This pass legalizes all the DF conversions to narrower types.
>>>>         */
>>>> -      fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>>>> -      fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>>>> -      ibld.MOV(strided_temp, inst->src[0]);
>>>> -      ibld.MOV(dst, strided_temp);
>>>> +      switch (inst->opcode) {
>>>
>>> I suggest you refactor this into a helper function 'bool
>>> supports_type_conversion(inst, dst_type, exec_type)' that returns false
>>> for SEL and likely other things.  It might be a useful thing to have in
>>> other places, e.g. for late optimization passes like copy propagation
>>> where we need to make sure that no additional illegal conversions are
>>> introduced.  If the value returned is false you'd do what you have below
>>> for the SEL instruction, if it's true you'd do nothing unless the
>>> instruction is double-precision and the destination type is smaller than
>>> the execution type, in which case you'd do what you have below for
>>> MOV/MOV_INDIRECT.
>>>
>>
>> I have implemented the generalization of this pass following this
>> suggestion, however I needed to disallow it for some specific cases:
>>
>> * Message related opcodes (samplers, urb writes, etc): they are not
>> actually conversions between source registers and destination, so we
>> cannot legalize them.
>> * Opcodes like ASR, SHR, SHL where we reinterpret the result of the
>> opcode, so no need to legalize it either.
>> * I also restricted this legalization to opcodes where the arguments
>> have all of them the same type. In some cases, we have mixed source
>> arguments than will be either fixed in the generator (i.e. BFE/BFI2) or
>> we don't know which of them use for the destination's data type in the
>> legalization. I can add a TODO comment if you want.
>> * This pass only legalizes conversions to narrower types. I have a patch
>> to rename this pass to 'lower_narrow_conversions' (suggestions for
>> better names are welcome).
>>
> 
> This sounds kind of strange to me...  Because the type conversion
> restrictions are opcode-specific, why do you need to care about things
> you cannot even handle?
> 

I see two possibilities:

1) Legalize only the opcodes that we know are doing wrong things. This
is the approach of this v3, where in the switch we have a case for MOV*
and another for SEL opcodes. If more instructions behave wrongly, we add
a new 'case' to the switch.

2) Legalize all opcodes except the ones we know the legalization makes
no sense or it is wrong. This is the approach I mentioned in my previous
email, explicitly listing the exceptions I found to avoid GPU hangs and
regressions.

I thought you prefered 2) in a previous email, because we can legalize
undetected cases. If you prefer 1), I am OK (I have the patch too).

Once you tell me which option you prefer, I will send the v4.

Sam

>> This change is a little bit scary because if we legalize the wrong case
>> (for example a sampler message) we end up having regressions or, even
>> worse, GPU hangs. Also, depending of the generation, some tests start to
>> regress because some instructions are emitted slightly different.
>>
>> I think I fixed all the GPU hangs in all the generations I have access
>> to and I don't see any regressions on Intel CI, but I am worried if this
>> change could bite us again in the future (games, applications, new
>> tests/opcodes, etc).
>>
>> If you ask me, I prefer to keep this lowering pass as it is, i.e.
>> legalize only d2x conversions, to avoid blocking this series because of
>> this. Once this patch series lands in master, I can send an RFC patch
>> with this change and we could discuss how it can be done.
>>
>> What do you think?
>>
> 
> I don't think this patch is the only sketchy corner of the series,
> aren't you planning on sending a v4 anyway?  I suggest you just include
> what you have written.
> 
>> Sam
>>
>>>> +      case SHADER_OPCODE_MOV_INDIRECT:
>>>> +      case BRW_OPCODE_MOV: {
>>>> +         fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>>>> +         fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>>>> +         /* We clone the original instruction as we are going to modify it
>>>> +          * and emit another one after it.
>>>> +          */
>>>> +         fs_inst *strided_inst = new(ibld.shader->mem_ctx) fs_inst(*inst);
>>>
>>> Why don't you just modify the original instruction instead of cloning
>>> it, modifying the clone, and then removing the original?
>>>
>>>> +         strided_inst->dst = strided_temp;
>>>> +         /* As it is an strided destination, we write n-times more
>>>> +          * being n the size difference between source and destination types.
>>>> +          */
>>>> +         strided_inst->size_written *= (type_sz(inst->src[0].type) / type_sz(inst->dst.type));
>>>> +         ibld.emit(strided_inst);
>>>> +         ibld.MOV(dst, strided_temp);
>>>> +         /* Remove original instruction, now that is superseded. */
>>>> +         inst->remove(block);
>>>> +         break;
>>>> +      }
>>>> +      case BRW_OPCODE_SEL: {
>>>> +         fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1);
>>>> +         fs_reg strided_temp0 = subscript(temp0, inst->dst.type, 0);
>>>> +         fs_reg temp1 = ibld.vgrf(inst->src[1].type, 1);
>>>> +         fs_reg strided_temp1 = subscript(temp1, inst->dst.type, 0);
>>>>  
>>>> -      inst->remove(block);
>>>> +         /* Let's convert the operands to the destination type first */
>>>> +         ibld.MOV(strided_temp0, inst->src[0]);
>>>> +         ibld.MOV(strided_temp1, inst->src[1]);
>>>> +         inst->src[0] = strided_temp0;
>>>> +         inst->src[1] = strided_temp1;
>>>
>>> Why don't you change the destination type to match the execution type so
>>> you only need to emit one additional converting move instead of two when
>>> there is a type mismatch?
>>>
>>>> +         break;
>>>> +      }
>>>> +      default:
>>>> +         /* FIXME: If this is not a supported instruction, then we need to support it. */
>>>> +         unreachable("Not supported instruction");
>>>> +      }
>>>>        progress = true;
>>>>     }
>>>>  
>>>> -- 
>>>> 2.11.0

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 862 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170316/a643bbb4/attachment.sig>