[Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

Tapani Pälli tapani.palli at intel.com
Mon Aug 14 08:57:28 UTC 2017


Hello;

On 07/25/2017 05:17 PM, Marathe, Yogesh wrote:
> Hi Matt, Sorry for late reply, please see below.
> 
>> -----Original Message-----
>> From: mesa-dev [mailto:mesa-dev-bounces at lists.freedesktop.org] On Behalf
>> Of Matt Turner
>> Sent: Saturday, July 22, 2017 12:12 AM
>> To: Muthukumar, Aravindan <aravindan.muthukumar at intel.com>
>> Cc: mesa-dev at lists.freedesktop.org; Marathe, Yogesh
>> <yogesh.marathe at intel.com>
>> Subject: Re: [Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks
>>
>> On 07/21, aravindan.muthukumar at intel.com wrote:
>>> From: Aravindan Muthukumar <aravindan.muthukumar at intel.com>
>>>
>>> This patch improves CPI Rate(Cycles per Instruction) and branch miss
>>> predict for i965. The function check_state() was showing CPI retired rate.
>>>
>>> Performance stats with android:
>>> - CPI retired lowered by 28% (lower is better)
>>> - Branch missprediction lowered by 13% (lower is better)
>>> - 3DMark improved by 2%
>>>
>>> The dissassembly doesn't show difference, although above results were
>>> observed with patch.
>>
> 
> Yes this is true for V3 where we removed the function based on a review comment.
> 
>> If there's no difference in the assembly then whatever measurements you have
>> taken must be noise.
>>
> 
> No that's not guaranteed either. Lot of things still depend on how instructions are
> aligned, sometimes even changing linking order gives different results where
> disassemblies of individual functions remain same.
>   
>> I applied the patch and inspected brw_state_upload.o. There are assembly
>> differences. I can produce the same assembly as this patch by just pulling the if
>> statement out of check_and_emit_atom() and into the caller. The replacement
>> of check_state() with a macro is completely unnecessary.
>>
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
>> b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> index acaa97ee7d..b163e1490d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
>> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> @@ -439,14 +439,12 @@ merge_ctx_state(struct brw_context *brw,  }
>>
>>   static inline void
>> -check_and_emit_atom(struct brw_context *brw,
>> -                    struct brw_state_flags *state,
>> -                    const struct brw_tracked_state *atom)
>> +emit_atom(struct brw_context *brw,
>> +          struct brw_state_flags *state,
>> +          const struct brw_tracked_state *atom)
>>   {
>> -   if (check_state(state, &atom->dirty)) {
>> -      atom->emit(brw);
>> -      merge_ctx_state(brw, state);
>> -   }
>> +   atom->emit(brw);
>> +   merge_ctx_state(brw, state);
>>   }
>>
>>   static inline void
>> @@ -541,7 +539,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
>>   	 const struct brw_tracked_state *atom = &atoms[i];
>>   	 struct brw_state_flags generated;
>>
>> -         check_and_emit_atom(brw, &state, atom);
>> +         if (check_state(&state, &atom->dirty)) {
>> +            emit_atom(brw, &state, atom);
>> +         }
>>
>>   	 accumulate_state(&examined, &atom->dirty);
>>
>> @@ -558,7 +558,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
>>         for (i = 0; i < num_atoms; i++) {
>>   	 const struct brw_tracked_state *atom = &atoms[i];
>>
>> -         check_and_emit_atom(brw, &state, atom);
>> +         if (check_state(&state, &atom->dirty)) {
>> +            emit_atom(brw, &state, atom);
>> +         }
>>         }
>>      }
>>
>>
>> With that said, the assembly differences are not understandable to me.
>> Why should extracting an if statement from a static inline function into the caller
>> of that function cause any difference whatsoever?
> 
> Agreed, it shouldn't in case of static inline.
> 
>>
>> I'm viewing the assembly differences with:
>>
>> wdiff -n \
>>      -w $'\033[30;41m' -x $'\033[0m' \
>>      -y $'\033[30;42m' -z $'\033[0m' \
>>      <(objdump -d brw_state_upload.o.before | sed -e 's/^.*\t//') \
>>      <(objdump -d brw_state_upload.o.wtf    | sed -e 's/^.*\t//') | less -R
>>
>> and the only real difference is the movement of some call and jmp instructions.
>>
>> I cannot take the patch without some reasonable explanation for the change.
> 
> Ok I think this has been discussed already and we agree that there is no big visible difference
> in disassembly which can be pointed out for improvement. Although, as you said this movement
> of instructions can cause this. If with this patch instructions get cache aligned that too can show
> improvement.  This is a busy function with bad CPI. Hence chosen for optimization. Branch miss
> predict is another metric.  Do we want to consider all these or just disassembly?

I don't have a reasonable explanation to give but I made some benchmark 
runs today (3DMark "Ice Storm Unlimited" test ) and I'm getting 
consistently 1-2% better results with the patch. I also tried the 
mentioned modification "pulling the if statement out of 
check_and_emit_atom() and into the caller" and it has same performance 
benefits. What I can tell from assembly dump is that 
brw_upload_render_state function becomes slightly shorter, there's 
movement with call and jmp and dump with the patch has less mov and nopl 
calls in total.


> Let me make one more attempt, we clearly see icache misses for brw_upload_pipeline_state also
> reduced with patch against without patch. Do you think that too is noise with *.o.wtf?
> I would be happy to learn if that's the case to understand all this better.
> 
> I believe there are two reasons why this should go in
> 1. It consistently benefits android (we are ok to rename it), helps reduce overhead
> 2. It doesn't harm other platforms
> 
> We are ok to drop it if above two claims can be negated with someone else other than us testing
> this. Otherwise can you please reconsider?
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 


More information about the mesa-dev mailing list