[Mesa-dev] [PATCH] i965 : Performance Improvement
Ian Romanick
idr at freedesktop.org
Fri Jul 14 05:20:55 UTC 2017
On 07/13/2017 09:35 PM, Kenneth Graunke wrote:
> On Thursday, July 13, 2017 9:09:09 PM PDT aravindan.muthukumar at intel.com wrote:
>> From: Aravindan M <aravindan.muthukumar at intel.com>
>
> The commit title should be something like, "i965: Optimize atom state
> flag checks" rather than a generic "Performance Improvement"
>
>> This patch improves CPI Rate(Cycles per Instruction)
>> and CPU time utilization for i965. The functions
>> check_state and brw_pipeline_state_finished was found
>> poor CPU utilization from performance analysis.
>
> Need actual data here, or show assembly quality improvements.
I'll second this. There need to be some actual measurements and data.
These seem like the most micro of micro optimizations, so some
supporting data is needed. Just saying "improves CPI rate" is not
sufficient. That should include the platform on which it was measured.
>> Change-Id: I17c7e719a16e222764217a0e67b4482748537b67
>> Signed-off-by: Aravindan M <aravindan.muthukumar at intel.com>
>> Reviewed-by: Yogesh M <yogesh.marathe at intel.com>
>> Tested-by: Asish <asish at intel.com>
>> ---
>> src/mesa/drivers/dri/i965/brw_defines.h | 3 +++
>> src/mesa/drivers/dri/i965/brw_state_upload.c | 14 +++++++++++---
>> 2 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
>> index a4794c6..60f88ca 100644
>> --- a/src/mesa/drivers/dri/i965/brw_defines.h
>> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
>> @@ -1681,3 +1681,6 @@ enum brw_pixel_shader_coverage_mask_mode {
>> # define GEN8_L3CNTLREG_ALL_ALLOC_MASK INTEL_MASK(31, 25)
>>
>> #endif
>> +
>> +/* Checking the state of mesa and brw before emitting atoms */
>> +#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw))
>> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> index 5e82c1b..434decf 100644
>> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
>> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> @@ -515,7 +515,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
>> const struct brw_tracked_state *atom = &atoms[i];
>> struct brw_state_flags generated;
>>
>> - check_and_emit_atom(brw, &state, atom);
>> + /* Checking the state and emitting the atoms */
>> + if (CHECK_BRW_STATE(state, atom->dirty)) {
>> + check_and_emit_atom(brw, &state, atom);
>> + }
>>
>> accumulate_state(&examined, &atom->dirty);
>>
>> @@ -532,7 +535,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
>> for (i = 0; i < num_atoms; i++) {
>> const struct brw_tracked_state *atom = &atoms[i];
>>
>> - check_and_emit_atom(brw, &state, atom);
>> + /* Checking the state and emitting the atoms */
>> + if (CHECK_BRW_STATE(state, atom->dirty)) {
>> + check_and_emit_atom(brw, &state, atom);
>> + }
>
> This doesn't make any sense...the very first thing check_and_emit_atom()
> does is call check_state(), which does the exact thing your CHECK_BRW_STATE
> macro does. So you're just needlessly checking the same thing twice.
>
> The only reason I could see this helping is if check_state() wasn't inlined,
> but a release build with -O2 definitely inlines both check_and_emit_atom()
> and check_state().
>
> Are you using GCC? What are your CFLAGS? -O2? I hope you're not trying
> to optimize a debug build...
>
>> }
>> }
>>
>> @@ -567,7 +573,9 @@ brw_pipeline_state_finished(struct brw_context *brw,
>> brw->state.pipelines[i].mesa |= brw->NewGLState;
>> brw->state.pipelines[i].brw |= brw->ctx.NewDriverState;
>> } else {
>> - memset(&brw->state.pipelines[i], 0, sizeof(struct brw_state_flags));
>> + /* Avoiding the memset with initialization */
>> + brw->state.pipelines[i].mesa = 0;
>> + brw->state.pipelines[i].brw = 0ull;
>> }
>> }
>
> This is a separate change.
>
> I'm also not seeing why it's useful:
>
> Assembly before (GCC 7.1.1, x86_64, -O2 -fno-omit-frame-pointer):
>
> 00000000003e0380 <brw_render_state_finished>:
> 3e0380: 66 0f ef c0 pxor %xmm0,%xmm0
> 3e0384: 55 push %rbp
> 3e0385: 8b 87 20 52 02 00 mov 0x25220(%rdi),%eax
> 3e038b: c7 87 20 52 02 00 00 movl $0x0,0x25220(%rdi)
> 3e0392: 00 00 00
> 3e0395: 48 89 e5 mov %rsp,%rbp
> 3e0398: 09 87 38 52 02 00 or %eax,0x25238(%rdi)
> 3e039e: 48 8b 87 38 4d 02 00 mov 0x24d38(%rdi),%rax
> 3e03a5: 5d pop %rbp
> 3e03a6: 48 09 87 40 52 02 00 or %rax,0x25240(%rdi)
> 3e03ad: 48 c7 87 38 4d 02 00 movq $0x0,0x24d38(%rdi)
> 3e03b4: 00 00 00 00
> 3e03b8: 0f 11 87 28 52 02 00 movups %xmm0,0x25228(%rdi)
> 3e03bf: c3 retq
>
> Assembly after:
>
> 00000000003e0380 <brw_render_state_finished>:
> 3e0380: 55 push %rbp
> 3e0381: 8b 87 20 52 02 00 mov 0x25220(%rdi),%eax
> 3e0387: c7 87 28 52 02 00 00 movl $0x0,0x25228(%rdi)
> 3e038e: 00 00 00
> 3e0391: 09 87 38 52 02 00 or %eax,0x25238(%rdi)
> 3e0397: 48 89 e5 mov %rsp,%rbp
> 3e039a: 48 8b 87 38 4d 02 00 mov 0x24d38(%rdi),%rax
> 3e03a1: 48 c7 87 30 52 02 00 movq $0x0,0x25230(%rdi)
> 3e03a8: 00 00 00 00
> 3e03ac: 48 09 87 40 52 02 00 or %rax,0x25240(%rdi)
> 3e03b3: c7 87 20 52 02 00 00 movl $0x0,0x25220(%rdi)
> 3e03ba: 00 00 00
> 3e03bd: 48 c7 87 38 4d 02 00 movq $0x0,0x24d38(%rdi)
> 3e03c4: 00 00 00 00
> 3e03c8: 5d pop %rbp
> 3e03c9: c3 retq
> 3e03ca: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
>
> That looks like more instructions to me...
>
> Thanks for looking at ways to optimize our CPU overhead. I'm sure there
> are some improvements we can make...
>
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170713/5c9b746d/attachment.sig>
More information about the mesa-dev
mailing list