[Mesa-dev] [PATCH v3 07/24] i965/fs: generalize the legalization d2x pass

Tue Mar 7 00:32:13 UTC 2017

Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> On 04/03/17 00:19, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> 
>>> Add support to SEL instruction and add an assert to detect unsupported
>>> instructions than do d2x conversions.
>>>
>>> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>>> ---
>>>
>>> Curro, this patch legalizes SEL instruction too. If other optimizations
>>> modify later any SEL's (or any other instruction's) destination type
>>> (hence, producing a non-lowered d2x conversion), we can call it again
>>> around the end of fs_visitor::optimize(). Possibly together with
>>> lower_simd_width() just in case it was added later.
>>>
>> 
>> This sounds rather scary...  How do you make sure that this doesn't lead
>> to an infinite legalization-optimization loop in which copy propagation
>> reverses the effect of lower_d2x making double conversions illegal
>> again?  If you do already, why do you need to run lower_d2x multiple
>> times?  Wouldn't it be sufficient to run it once near the end of
>> optimize(), and then re-run copy propagation and possibly DCE?
>> 
>
> Right. I am going to change where we call it.
>
>>> For that reason there is the inst->dst.stride > 1 condition in the
>>> test. This detects if either we emitted a strided destination in
>>> purpose or it was as a result of a previous lower_d2x run, we don't
>>> want to lowered it.
>>>
>> The problem with this is that if you ended up with dst.stride > 1 due to
>> different fields of the same scalar quantity being defined by two
>> separate instructions (e.g. by using subscript(dst, ..., i)), you *need*
>> to apply the lowering pass regardless, because otherwise the second
>> instruction will corrupt the data written by the first instruction.
>> 
>>> However, as I have not hit that case yet, I prefer to wait for your
>>> opinion. What do you think?
>>>
>>>
>>>  src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp | 57 ++++++++++++++++++--------
>>>  1 file changed, 41 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>>> index a2db1154615..330f2552929 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
>>> @@ -33,17 +33,9 @@ fs_visitor::lower_d2x()
>>>     bool progress = false;
>>>  
>>>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>>> -      if (inst->opcode != BRW_OPCODE_MOV)
>>> -         continue;
>>> -
>>> -      if (inst->dst.type != BRW_REGISTER_TYPE_F &&
>>> -          inst->dst.type != BRW_REGISTER_TYPE_D &&
>>> -          inst->dst.type != BRW_REGISTER_TYPE_UD)
>>> -         continue;
>>> -
>>> -      if (inst->src[0].type != BRW_REGISTER_TYPE_DF &&
>>> -          inst->src[0].type != BRW_REGISTER_TYPE_UQ &&
>>> -          inst->src[0].type != BRW_REGISTER_TYPE_Q)
>>> +      if (get_exec_type_size(inst) != 8 ||
>>> +          type_sz(inst->dst.type) >= get_exec_type_size(inst) ||
>> 
>> Note that some type conversion restrictions apply even if the execution
>> type is single-precision, and even if the destination type size is not
>> less than the execution type, e.g. according to the hardware docs SEL
>> doesn't support F->UD or F->DF conversions which the condition above
>> would consider okay.
>> 
>
> Right.
>
>>> +          inst->dst.stride > 1)
>>>           continue;
>>>  
>>>        assert(inst->dst.file == VGRF);
>>> @@ -61,13 +53,46 @@ fs_visitor::lower_d2x()
>>>         * So we need to allocate a temporary that's two registers, and then do
>>>         * a strided MOV to get the lower DWord of every Qword that has the
>>>         * result.
>>> +       *
>>> +       * This pass legalizes all the DF conversions to narrower types.
>>>         */
>>> -      fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>>> -      fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>>> -      ibld.MOV(strided_temp, inst->src[0]);
>>> -      ibld.MOV(dst, strided_temp);
>>> +      switch (inst->opcode) {
>> 
>> I suggest you refactor this into a helper function 'bool
>> supports_type_conversion(inst, dst_type, exec_type)' that returns false
>> for SEL and likely other things.  It might be a useful thing to have in
>> other places, e.g. for late optimization passes like copy propagation
>> where we need to make sure that no additional illegal conversions are
>> introduced.  If the value returned is false you'd do what you have below
>> for the SEL instruction, if it's true you'd do nothing unless the
>> instruction is double-precision and the destination type is smaller than
>> the execution type, in which case you'd do what you have below for
>> MOV/MOV_INDIRECT.
>> 
>
> OK. I'm going to add it as static in the file, so we can promote it to a
> header file when it is used within other optimizations/lowerings.
>
>>> +      case SHADER_OPCODE_MOV_INDIRECT:
>>> +      case BRW_OPCODE_MOV: {
>>> +         fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
>>> +         fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
>>> +         /* We clone the original instruction as we are going to modify it
>>> +          * and emit another one after it.
>>> +          */
>>> +         fs_inst *strided_inst = new(ibld.shader->mem_ctx) fs_inst(*inst);
>> 
>> Why don't you just modify the original instruction instead of cloning
>> it, modifying the clone, and then removing the original?
>> 
>
> Right but I did it because I need to emit the strided MOV after the
> instructions... which means adding support for this in fs_builder.
>
> As you prefer to reuse the original instruction, then I will send a
> patch implementing emit_after() functions in fs_builder and use them here.
>
>>> +         strided_inst->dst = strided_temp;
>>> +         /* As it is an strided destination, we write n-times more
>>> +          * being n the size difference between source and destination types.
>>> +          */
>>> +         strided_inst->size_written *= (type_sz(inst->src[0].type) / type_sz(inst->dst.type));
>>> +         ibld.emit(strided_inst);
>>> +         ibld.MOV(dst, strided_temp);
>>> +         /* Remove original instruction, now that is superseded. */
>>> +         inst->remove(block);
>>> +         break;
>>> +      }
>>> +      case BRW_OPCODE_SEL: {
>>> +         fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1);
>>> +         fs_reg strided_temp0 = subscript(temp0, inst->dst.type, 0);
>>> +         fs_reg temp1 = ibld.vgrf(inst->src[1].type, 1);
>>> +         fs_reg strided_temp1 = subscript(temp1, inst->dst.type, 0);
>>>  
>>> -      inst->remove(block);
>>> +         /* Let's convert the operands to the destination type first */
>>> +         ibld.MOV(strided_temp0, inst->src[0]);
>>> +         ibld.MOV(strided_temp1, inst->src[1]);
>>> +         inst->src[0] = strided_temp0;
>>> +         inst->src[1] = strided_temp1;
>> 
>> Why don't you change the destination type to match the execution type so
>> you only need to emit one additional converting move instead of two when
>> there is a type mismatch?
>> 
>
> This was to avoid defining emit_after() in fs_builder. But as you prefer
> that, then I will do it.
>

No need to add any new emit functions, there is a pile of overloads
already and I wouldn't like to end up with a combinatorial explosion of
_before and _after methods when you could just compose the current
emit() functions on the result of bld.at(inst->next, block) to get the
same semantics.

> Thanks for the review! :)
>
Thanks!

> Sam
>
>>> +         break;
>>> +      }
>>> +      default:
>>> +         /* FIXME: If this is not a supported instruction, then we need to support it. */
>>> +         unreachable("Not supported instruction");
>>> +      }
>>>        progress = true;
>>>     }
>>>  
>>> -- 
>>> 2.11.0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170306/5333e616/attachment-0001.sig>