[Mesa-dev] [PATCH] i965 : Performance Improvement
Eero Tamminen
eero.t.tamminen at intel.com
Fri Jul 14 08:49:47 UTC 2017
Hi,
On 14.07.2017 09:38, Marathe, Yogesh wrote:
[...]
>>>> The only reason I could see this helping is if check_state() wasn't
>>>> inlined, but a release build with -O2 definitely inlines both
>>>> check_and_emit_atom() and check_state().
>>>>
>>>> Are you using GCC? What are your CFLAGS? -O2? I hope you're not
>>>> trying to optimize a debug build...
>>>
>>> Yes, we are using -O2, it's Clang on Android, and it's not a debug build.
>>
>> Okay. I just built with Clang 4.0.1 and -O2 and both check_state and
>> check_and_emit_atom() are inlined into the atom loop in
>> brw_upload_pipeline_state().
>>
>> So I'm still not sure how this would improve anything.
>
> Yes, the improvement is not huge per se, but we essentially see CPI and
> CPU utilization coming down with this. We also see slightly improved scores
> on graphics benchmarks, particularly 3DMark, with the patch. If this had been
> optimized out by the compiler, we shouldn't have seen a difference on the same
> build with and without the patch. We'll confirm the Clang version.
>
> I think this at least removes branch instructions, and being in a busy path
> this will have an impact, provided the compiler doesn't already do it, as you
> rightly mentioned.
Did you disassemble the generated code to verify that the patch changed
it the way you expected?

The reason I ask is that changes to unrelated parts of the code can
sometimes improve performance simply because they change the code size,
and therefore how things end up being mapped to memory and cached.
(In some cases I've seen performance rise or drop by several percent
just from LD_PRELOADing a random, unused library into a process.)
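
For reference when reading the disassembly, the pattern under discussion
can be sketched roughly as below. This is a simplified stand-in and not
the actual i965 code: the struct layouts, the flag field widths, and the
names demo_atom, upload_atoms and count_emit are assumptions made up for
illustration. Only check_state and check_and_emit_atom are names taken
from the thread, and their bodies here are a guess at the general shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair of dirty-bit sets (field names and widths assumed). */
struct state_flags {
   uint32_t mesa;
   uint64_t brw;
};

/* Hypothetical state atom: the bits it depends on plus an emit callback. */
struct demo_atom {
   struct state_flags dirty;
   void (*emit)(void *ctx);
};

/* True if any bit the atom cares about is set in the current state. */
static inline bool
check_state(const struct state_flags *a, const struct state_flags *b)
{
   return ((a->mesa & b->mesa) | (a->brw & b->brw)) != 0;
}

/* Emit the atom only when its state is dirty; report whether it ran. */
static inline bool
check_and_emit_atom(void *ctx, const struct state_flags *state,
                    const struct demo_atom *atom)
{
   if (check_state(state, &atom->dirty)) {
      atom->emit(ctx);
      return true;
   }
   return false;
}

/* The hot loop: one dirty check per atom, returns how many were emitted. */
static size_t
upload_atoms(void *ctx, const struct state_flags *state,
             const struct demo_atom *atoms, size_t count)
{
   size_t emitted = 0;

   assert(atoms != NULL || count == 0);
   for (size_t i = 0; i < count; i++)
      emitted += check_and_emit_atom(ctx, state, &atoms[i]);
   return emitted;
}

/* Example emit callback (hypothetical): counts calls through ctx. */
static void
count_emit(void *ctx)
{
   (*(int *)ctx)++;
}
```

If both helpers are inlined at -O2, as reported earlier in the thread,
the disassembly of the loop (for example from objdump -d) should contain
no calls to them, only the indirect call through the emit pointer.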
- Eero