[Mesa-dev] [PATCH] i965/vec4: select predicate based on writemask for sel emissions

Matt Turner mattst88 at gmail.com
Wed Nov 4 14:49:45 PST 2015


On Wed, Nov 4, 2015 at 1:01 PM, Alejandro Piñeiro <apinheiro at igalia.com> wrote:
> On 04/11/15 20:13, Matt Turner wrote:
>> On Fri, Oct 23, 2015 at 7:17 AM, Alejandro Piñeiro <apinheiro at igalia.com> wrote:
>>> Equivalent to commit 4eebeb but with sel operations. In this case
>> That commit sha doesn't exist. I suspect you meant commit 8ac3b525c.
>
> Yes, probably that sha was the one on my wip branches.
>
>>> we select the PredCtrl based on the writemask.
>>>
>>> This change allows cmod propagation to optimize out several
>>> instructions.
>> Okay, so this patch helps a case like:
>
> I don't think that the case you pointed is a good one to explain why
> this is helping. Because this seems to suggest that this case is without
> the patch but...
>
>>
>>  cmp.g.f0.0 vgrf32.0.x:F, vgrf31.xxxx:F, 0.000000F
>>  cmp.nz.f0.0 null.x:D, vgrf32.xxxx:D, 0D
> ... this patch is needed to arrive to this point (null destination
> writemask being updated).
>>  (+f0.0) sel vgrf2.0.x:F, |vgrf27.xxxx|:F, 0.000000F
>>
>> where our code thinks the SEL reads all four channels of the flag
>> register when it actually only reads .x because of the writemask.
>>
>> Adding something like this to the commit message would be good.
>
> So if we want to go into the details of why this change helps to reduce
> instructions, what about something like:
>
> "This patch helps on cases like this:
>
> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
> 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
> 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
>
> In this case, cmod propagation can't optimize instruction #2, because
> instructions #1 and #2 have different writemasks, and we can't update
> directly instruction #2 writemask because our code thinks that sel reads
> all four channels of the flag, when it actually only reads .x.
>
> So, with this patch, the previous case becames this:
>
> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
> 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
> 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
>
> Now only the x channel of the flag is used, allowing dead code eliminate
> to update the writemask at the second instruction:
> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
> 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
> 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
>
> So now cmod propagation can simplify out #2:
> 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
> 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
> "
>
> Although not sure if this explanation is too long.
>
>>
>>> Shader-db numbers:
>>> total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
>>> instructions in affected programs:     219850 -> 212023 (-3.56%)
>>> total loops in shared programs:        1979 -> 1979 (0.00%)
>>> helped:                                1192
>>> HURT:                                  0
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 +++++++++++++++++-
>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>> index 0f04f65..bc86be6 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>> @@ -1437,8 +1437,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>>>     case nir_op_bcsel:
>>>        emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ));
>>>        inst = emit(BRW_OPCODE_SEL, dst, op[1], op[2]);
>>> -      inst->predicate = BRW_PREDICATE_NORMAL;
>>> +      switch (dst.writemask) {
>>> +      case WRITEMASK_X:
>>> +         inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_X;
>>> +         break;
>>> +      case WRITEMASK_Y:
>>> +         inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Y;
>>> +         break;
>>> +      case WRITEMASK_Z:
>>> +         inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Z;
>>> +         break;
>>> +      case WRITEMASK_W:
>>> +         inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_W;
>>> +         break;
>>> +      default:
>>> +         inst->predicate = BRW_PREDICATE_NORMAL;
>>>        break;
>> This break, and the one two lines below it are indented incorrectly.
>
> Ok.
>
>>> +      }
>>> +   break;
>>>
>>>     case nir_op_fdot_replicated2:
>>>        inst = emit(BRW_OPCODE_DP2, dst, op[0], op[1]);
>>> --
>>> 2.1.4
>>>
>> With the commit message fixed/updated and the breaks properly indented,
>
> As I changed totally the use case and description, I would prefer to
> confirm that I explained myself properly before pushing.

Thanks. Your description is great -- it'll definitely help future
readers to understand exactly why this is needed.


More information about the mesa-dev mailing list