[Mesa-dev] [PATCH] i965 : Performance Improvement

Kenneth Graunke kenneth at whitecape.org
Fri Jul 14 05:49:52 UTC 2017


On Thursday, July 13, 2017 9:49:40 PM PDT Marathe, Yogesh wrote:
> Kenneth,
> 
> > -----Original Message-----
> > From: mesa-dev [mailto:mesa-dev-bounces at lists.freedesktop.org] On Behalf
> > Of Kenneth Graunke
> > Sent: Friday, July 14, 2017 10:05 AM
> > To: mesa-dev at lists.freedesktop.org
> > Cc: Muthukumar, Aravindan <aravindan.muthukumar at intel.com>
> > Subject: Re: [Mesa-dev] [PATCH] i965 : Performance Improvement
> > 
> > On Thursday, July 13, 2017 9:09:09 PM PDT aravindan.muthukumar at intel.com
> > wrote:
> > > From: Aravindan M <aravindan.muthukumar at intel.com>
> > 
> > The commit title should be something like, "i965: Optimize atom state flag
> > checks" rather than a generic "Performance Improvement"
> > 
> > > This patch improves CPI Rate(Cycles per Instruction) and CPU time
> > > utilization for i965. The functions check_state and
> > > brw_pipeline_state_finished was found poor CPU utilization from
> > > performance analysis.
> > 
> > Need actual data here, or show assembly quality improvements.
> > 
> > > Change-Id: I17c7e719a16e222764217a0e67b4482748537b67
> > > Signed-off-by: Aravindan M <aravindan.muthukumar at intel.com>
> > > Reviewed-by: Yogesh M <yogesh.marathe at intel.com>
> > > Tested-by: Asish <asish at intel.com>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_defines.h      |  3 +++
> > >  src/mesa/drivers/dri/i965/brw_state_upload.c | 14 +++++++++++---
> > >  2 files changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> > > b/src/mesa/drivers/dri/i965/brw_defines.h
> > > index a4794c6..60f88ca 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > > @@ -1681,3 +1681,6 @@ enum brw_pixel_shader_coverage_mask_mode {
> > >  # define GEN8_L3CNTLREG_ALL_ALLOC_MASK     INTEL_MASK(31, 25)
> > >
> > >  #endif
> > > +
> > > +/* Checking the state of mesa and brw before emitting atoms */
> > > +#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw))
> > > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> > > b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > > index 5e82c1b..434decf 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > > @@ -515,7 +515,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
> > >  	 const struct brw_tracked_state *atom = &atoms[i];
> > >  	 struct brw_state_flags generated;
> > >
> > > -         check_and_emit_atom(brw, &state, atom);
> > > +         /* Checking the state and emitting the atoms */
> > > +         if (CHECK_BRW_STATE(state, atom->dirty)) {
> > > +            check_and_emit_atom(brw, &state, atom);
> > > +         }
> > >
> > >  	 accumulate_state(&examined, &atom->dirty);
> > >
> > > @@ -532,7 +535,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
> > >        for (i = 0; i < num_atoms; i++) {
> > >  	 const struct brw_tracked_state *atom = &atoms[i];
> > >
> > > -         check_and_emit_atom(brw, &state, atom);
> > > +         /* Checking the state and emitting the atoms */
> > > +         if (CHECK_BRW_STATE(state, atom->dirty)) {
> > > +            check_and_emit_atom(brw, &state, atom);
> > > +         }
> > 
> > This doesn't make any sense...the very first thing check_and_emit_atom() does
> > is call check_state(), which does the exact thing your CHECK_BRW_STATE macro
> > does.  So you're just needlessly checking the same thing twice.
> > 
> 
> Sorry Kenneth, This is incomplete patch. The original patch that I reviewed also had 
> if check removed as below
> 
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -417,10 +417,8 @@ check_and_emit_atom(struct brw_context *brw,
>                      struct brw_state_flags *state,
>                      const struct brw_tracked_state *atom)
>  {
> -   if (check_state(state, &atom->dirty)) {
>     atom->emit(brw);
>     merge_ctx_state(brw, state);
> -   }
>  }
> 
> Aravindan will push another set. 
> 
> > The only reason I could see this helping is if check_state() wasn't inlined, but a
> > release build with -O2 definitely inlines both check_and_emit_atom() and
> > check_state().
> > 
> > Are you using GCC?  What are your CFLAGS?  -O2?  I hope you're not trying to
> > optimize a debug build...
> > 
> 
> Yes we are using O2 and its clang on android and it's not debug.

Okay.  I just built with Clang 4.0.1 and -O2 and both check_state and
check_and_emit_atom() are inlined into the atom loop in
brw_upload_pipeline_state().

So I'm still not sure how this would improve anything.

> > >        }
> > >     }
> > >
> > > @@ -567,7 +573,9 @@ brw_pipeline_state_finished(struct brw_context *brw,
> > >           brw->state.pipelines[i].mesa |= brw->NewGLState;
> > >           brw->state.pipelines[i].brw |= brw->ctx.NewDriverState;
> > >        } else {
> > > -         memset(&brw->state.pipelines[i], 0, sizeof(struct brw_state_flags));
> > > +         /* Avoiding the memset with initialization */
> > > +         brw->state.pipelines[i].mesa = 0;
> > > +         brw->state.pipelines[i].brw = 0ull;
> > >        }
> > >     }
> > 

Clang 4.0.1 optimizes away the memset as well.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170713/e93738a3/attachment-0001.sig>


More information about the mesa-dev mailing list