[Mesa-dev] [PATCH] i965/vec4: select predicate based on writemask for sel emissions
Alejandro Piñeiro
apinheiro at igalia.com
Thu Nov 5 00:02:48 PST 2015
On 04/11/15 23:49, Matt Turner wrote:
> On Wed, Nov 4, 2015 at 1:01 PM, Alejandro Piñeiro <apinheiro at igalia.com> wrote:
>> On 04/11/15 20:13, Matt Turner wrote:
>>> On Fri, Oct 23, 2015 at 7:17 AM, Alejandro Piñeiro <apinheiro at igalia.com> wrote:
>>>> Equivalent to commit 4eebeb but with sel operations. In this case
>>> That commit sha doesn't exist. I suspect you meant commit 8ac3b525c.
>> Yes, probably that sha was the one on my wip branches.
>>
>>>> we select the PredCtrl based on the writemask.
>>>>
>>>> This change allows cmod propagation to optimize out several
>>>> instructions.
>>> Okay, so this patch helps a case like:
>> I don't think that the case you pointed is a good one to explain why
>> this is helping. Because this seems to suggest that this case is without
>> the patch but...
>>
>>> cmp.g.f0.0 vgrf32.0.x:F, vgrf31.xxxx:F, 0.000000F
>>> cmp.nz.f0.0 null.x:D, vgrf32.xxxx:D, 0D
>> ... this patch is needed to arrive to this point (null destination
>> writemask being updated).
>>> (+f0.0) sel vgrf2.0.x:F, |vgrf27.xxxx|:F, 0.000000F
>>>
>>> where our code thinks the SEL reads all four channels of the flag
>>> register when it actually only reads .x because of the writemask.
>>>
>>> Adding something like this to the commit message would be good.
>> So if we want to go into the details of why this change helps to reduce
>> instructions, what about something like:
>>
>> "This patch helps on cases like this:
>>
>> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
>> 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
>> 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
>>
>> In this case, cmod propagation can't optimize instruction #2, because
>> instructions #1 and #2 have different writemasks, and we can't update
>> directly instruction #2 writemask because our code thinks that sel reads
>> all four channels of the flag, when it actually only reads .x.
>>
>> So, with this patch, the previous case becames this:
>>
>> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
>> 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
>> 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
>>
>> Now only the x channel of the flag is used, allowing dead code eliminate
>> to update the writemask at the second instruction:
>> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
>> 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
>> 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
>>
>> So now cmod propagation can simplify out #2:
>> 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
>> 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
>> "
>>
>> Although not sure if this explanation is too long.
>>
>>>> Shader-db numbers:
>>>> total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
>>>> instructions in affected programs: 219850 -> 212023 (-3.56%)
>>>> total loops in shared programs: 1979 -> 1979 (0.00%)
>>>> helped: 1192
>>>> HURT: 0
>>>> ---
>>>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 +++++++++++++++++-
>>>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>>> index 0f04f65..bc86be6 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>>> @@ -1437,8 +1437,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>>>> case nir_op_bcsel:
>>>> emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ));
>>>> inst = emit(BRW_OPCODE_SEL, dst, op[1], op[2]);
>>>> - inst->predicate = BRW_PREDICATE_NORMAL;
>>>> + switch (dst.writemask) {
>>>> + case WRITEMASK_X:
>>>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_X;
>>>> + break;
>>>> + case WRITEMASK_Y:
>>>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Y;
>>>> + break;
>>>> + case WRITEMASK_Z:
>>>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Z;
>>>> + break;
>>>> + case WRITEMASK_W:
>>>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_W;
>>>> + break;
>>>> + default:
>>>> + inst->predicate = BRW_PREDICATE_NORMAL;
>>>> break;
>>> This break, and the one two lines below it are indented incorrectly.
>> Ok.
>>
>>>> + }
>>>> + break;
>>>>
>>>> case nir_op_fdot_replicated2:
>>>> inst = emit(BRW_OPCODE_DP2, dst, op[0], op[1]);
>>>> --
>>>> 2.1.4
>>>>
>>> With the commit message fixed/updated and the breaks properly indented,
>> As I changed totally the use case and description, I would prefer to
>> confirm that I explained myself properly before pushing.
> Thanks. Your description is great -- it'll definitely help future
> readers to understand exactly why this is needed.
Patch pushed with that description (with minor tweaks). Thanks for the
patience.
BR
--
Alejandro Piñeiro (apinheiro at igalia.com)
More information about the mesa-dev
mailing list