[Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

Francisco Jerez currojerez at riseup.net
Wed Oct 14 01:15:47 PDT 2015


Alejandro Piñeiro <apinheiro at igalia.com> writes:

> On 13/10/15 23:36, Matt Turner wrote:
>> On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro <apinheiro at igalia.com> wrote:
>>> On 13/10/15 03:10, Matt Turner wrote:
>>>> Looks like this is causing an intermittent failure on HSW in our
>>>> Jenkins system (but I'm not able to reproduce locally) and a
>>>> consistent failure on G45. I'll investigate that.
>>> Ok, will hold on, and meanwhile I will try to reproduce the problem on
>>> HSW. Unfortunately I don't have any G45 available, so I will not be able
>>> to help on that. Thanks for taking a look there.
>> Okay, I think I see what's going on:
>>
>> --- good        2015-10-13 13:34:40.753357253 -0700
>> +++ bad 2015-10-13 13:36:06.114290094 -0700
>> -and(8)          g6<1>.xD        g6<4,4,1>.xD    1D              { align16 };
>> -cmp.nz.f0(8)    null            -g6<4,4,1>.xD   0D              { align16 };
>> +and.nz.f0(8)    g6<1>.xD        g6<4,4,1>.xD    1D              { align16 };
>>
>> We're propagating a .nz conditional mod from a CMP with a null dest
>> (that has a writemask of .xyzw) to an AND that has a writemask of only
>> .x, so only the .x channel of the flag is now being updated.
>
> That is really common. And that is the idea behind the first patch of
> the series that changes how if's are emitted, and allowing to optimize
> if scan_inst->writemask is WRITEMASK_X. I will use the piglit test
> glsl-1.30/execution/switch/vs-pervertex.shader_test to explain this.
>
> If we use current master, we get the following assembly:
> 1: cmp.z.f0(8)     g20<1>.xD       g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: mov.nz.f0(8)    null            g20<4,4,1>.xD                   {
> align16 1Q };
> 3: (+f0) if(8)     JIP: 6          UIP: 6                          {
> align16 1Q };
>
> If we just simplify out the mov.nz at #2, the test would fail, because
> as in the example you mentioned, null has a XYZW writemask, but g19 a
> writemask X. So this optimized version, f0 only have x updated, and #3
> is doing a normal if, that uses all the channels.
>
> But right now the if just needs to check against one component, the x.
> So that is the reason (among others) that my patch 1 changes how if's
> are emitted. So, using both my patches 1 and 2, the assembly outcome
> would be the following:
> 1: cmp.z.f0(8)     g20<1>.xD       g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: (+f0.x) if(8)   JIP: 6          UIP: 6                          {
> align16 1Q };
>
> It is true that now #1 only updates the x channel. But the if at #2 only
> use that channel now, so the test still passes.
>
>>
>> I think for now the thing to do is add
>>
>>    (inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
>
> This would bail out basically all if's.
>
I agree with Matt that this check is necessary, otherwise
cmod_propagation will replace the pair of instruction with another
instruction which doesn't have equivalent semantics.  If you still want
cases like the one you pasted below involving an if conditional that
only reads the x components to be simplified, I suggest you have a look
at dead code elimination and fix it so that writemask bits of an
instruction that writes a flag register (e.g. the "mov.nz.f0") are
turned off for components which are determined to be dead, kind of like
is done already for the vec4 components of GRFs.

With that change the assembly from your example would look like this
after DCE:

| 1: cmp.z.f0(8)     g20<1>.xD       g12<4,4,1>.xD   g19<4,4,1>.xD     { align16 1Q compacted };
| 2: mov.nz.f0(8)    null.x          g20<4,4,1>.xD                     { align16 1Q };
| 3: (+f0.x) if(8)   JIP: 6          UIP: 6                            { align16 1Q };

Which cmod propagation would be allowed to simplify further.

> So guessing about what is happening here, as I mentioned, we are
> allowing to pass that condition if scan_inst->dst.writemask equal to
> WRITEMASK_X assuming that they come from an if, so we don't need to
> update all f0 channels. But as Jason mentioned in our off-list chat,
> that would not be the case if we are dealing with a sel. So something
> like this:
> 1: cmp.z.f0(8)     g20<1>.xD       g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: mov.nz.f0(8)    null            g20<4,4,1>.xD                   {
> align16 1Q };
> 3: (+f0.x) if(8)     JIP: 6          UIP: 6                          {
> align16 1Q };
>
> #2 can be optimized out. But if we replace that if for an sel at #3, we
> can't.
>
> If that is the case, in addition to check scan_inst and inst, we would
> need to check in which kind of instruction the flag register is used later.
>
> But as I said, Im just guessing until I see the full failing test.
>
>> to the list of conditions under which we bail out. That is, if the
>> instruction we want to propagate the cmod onto writes fewer channels,
>> we can't do the optimization.
>>
>> With that, the block should look like:
>>
>>      if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) ||
>>          scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>>          (scan_inst->dst.writemask != WRITEMASK_X &&
>>           scan_inst->dst.writemask != WRITEMASK_XYZW) ||
>>          (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>>           inst->src[0].swizzle != BRW_SWIZZLE_XYZW) ||
>>          (inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
>>         break;
>>
>> The good news is that, unless I've done something wrong, this doesn't
>> affect any shaders in shader-db on ILK or HSW (I only tested those
>> two, but I expect the results are the same everywhere). Apparently
>> this is a pretty rare case.
>
> Are you sure? I have made a run adding your condition, and now comparing
> master vs having the optimization I get this:
> total instructions in shared programs: 6240631 -> 6240471 (-0.00%)
> instructions in affected programs:     18965 -> 18805 (-0.84%)
> helped:                                160
> HURT:                                  0
>
> That is a really small gain. Or put in other way, if we compare the
> conditions I have on the original patches vs adding the condition you
> are proposing, I get this:
>
> total instructions in shared programs: 6223900 -> 6240471 (0.27%)
> instructions in affected programs:     477537 -> 494108 (3.47%)
> helped:                                0
> HURT:                                  3047
>
>>
>> With that change, my R-b still stands, though we should have a unit
>> test for this case as well in 3/5.
>
> I would like to discuss this a little more before pushing.
>
>>
>> Thanks!
>> Matt
>>
>
> BR
>
> -- 
> Alejandro Piñeiro (apinheiro at igalia.com)
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20151014/99d6e929/attachment.sig>


More information about the mesa-dev mailing list