[Mesa-dev] [PATCH] i965/vec4: select predicate based on writemask for sel emissions

Alejandro Piñeiro apinheiro at igalia.com
Wed Nov 4 13:01:52 PST 2015


On 04/11/15 20:13, Matt Turner wrote:
> On Fri, Oct 23, 2015 at 7:17 AM, Alejandro Piñeiro <apinheiro at igalia.com> wrote:
>> Equivalent to commit 4eebeb but with sel operations. In this case
> That commit sha doesn't exist. I suspect you meant commit 8ac3b525c.

Yes, probably that sha was the one on my wip branches.

>> we select the PredCtrl based on the writemask.
>>
>> This change allows cmod propagation to optimize out several
>> instructions.
> Okay, so this patch helps a case like:

I don't think that the case you pointed is a good one to explain why
this is helping. Because this seems to suggest that this case is without
the patch but...

>
>  cmp.g.f0.0 vgrf32.0.x:F, vgrf31.xxxx:F, 0.000000F
>  cmp.nz.f0.0 null.x:D, vgrf32.xxxx:D, 0D
... this patch is needed to arrive to this point (null destination
writemask being updated).
>  (+f0.0) sel vgrf2.0.x:F, |vgrf27.xxxx|:F, 0.000000F
>
> where our code thinks the SEL reads all four channels of the flag
> register when it actually only reads .x because of the writemask.
>
> Adding something like this to the commit message would be good.

So if we want to go into the details of why this change helps to reduce
instructions, what about something like:

"This patch helps on cases like this:

1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't update
directly instruction #2 writemask because our code thinks that sel reads
all four channels of the flag, when it actually only reads .x.

So, with this patch, the previous case becames this:

1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Now only the x channel of the flag is used, allowing dead code eliminate
to update the writemask at the second instruction:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

So now cmod propagation can simplify out #2:
1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
"

Although not sure if this explanation is too long.

>
>> Shader-db numbers:
>> total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
>> instructions in affected programs:     219850 -> 212023 (-3.56%)
>> total loops in shared programs:        1979 -> 1979 (0.00%)
>> helped:                                1192
>> HURT:                                  0
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 +++++++++++++++++-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> index 0f04f65..bc86be6 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> @@ -1437,8 +1437,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>>     case nir_op_bcsel:
>>        emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ));
>>        inst = emit(BRW_OPCODE_SEL, dst, op[1], op[2]);
>> -      inst->predicate = BRW_PREDICATE_NORMAL;
>> +      switch (dst.writemask) {
>> +      case WRITEMASK_X:
>> +         inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_X;
>> +         break;
>> +      case WRITEMASK_Y:
>> +         inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Y;
>> +         break;
>> +      case WRITEMASK_Z:
>> +         inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Z;
>> +         break;
>> +      case WRITEMASK_W:
>> +         inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_W;
>> +         break;
>> +      default:
>> +         inst->predicate = BRW_PREDICATE_NORMAL;
>>        break;
> This break, and the one two lines below it are indented incorrectly.

Ok.

>> +      }
>> +   break;
>>
>>     case nir_op_fdot_replicated2:
>>        inst = emit(BRW_OPCODE_DP2, dst, op[0], op[1]);
>> --
>> 2.1.4
>>
> With the commit message fixed/updated and the breaks properly indented,

As I changed totally the use case and description, I would prefer to
confirm that I explained myself properly before pushing.

>
> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>
> Thanks!
>

Thanks for reviewing.

BR

-- 
Alejandro Piñeiro (apinheiro at igalia.com)



More information about the mesa-dev mailing list