[Mesa-dev] [PATCH V2] i965 : Optimize atom state flag checks
Marathe, Yogesh
yogesh.marathe at intel.com
Thu Jul 20 19:13:14 UTC 2017
Francisco,
> -----Original Message-----
> From: Francisco Jerez [mailto:currojerez at riseup.net]
> Sent: Friday, July 21, 2017 12:21 AM
> To: Marathe, Yogesh <yogesh.marathe at intel.com>; Muthukumar, Aravindan
> <aravindan.muthukumar at intel.com>; mesa-dev at lists.freedesktop.org
> Cc: Muthukumar, Aravindan <aravindan.muthukumar at intel.com>
> Subject: RE: [Mesa-dev] [PATCH V2] i965 : Optimize atom state flag checks
>
> "Marathe, Yogesh" <yogesh.marathe at intel.com> writes:
>
> > Francisco,
> >
> >> -----Original Message-----
> >> From: mesa-dev [mailto:mesa-dev-bounces at lists.freedesktop.org] On
> >> Behalf Of Francisco Jerez
> >> Sent: Thursday, July 20, 2017 10:51 PM
> >> To: Muthukumar, Aravindan <aravindan.muthukumar at intel.com>; mesa-
> >> dev at lists.freedesktop.org
> >> Cc: Muthukumar, Aravindan <aravindan.muthukumar at intel.com>
> >> Subject: Re: [Mesa-dev] [PATCH V2] i965 : Optimize atom state flag
> >> checks
> >>
> >> aravindan.muthukumar at intel.com writes:
> >>
> >> > From: Aravindan Muthukumar <aravindan.muthukumar at intel.com>
> >> >
> >> > This patch improves CPI Rate(Cycles per Instruction) and branch
> >> > mispredict for i965. The function check_state() was showing CPI
> >> > retired rate.
> >> >
> >> > Performance stats with android:
> >> > CPI retired lowered by 28% (lower is better) Branch missprediction
> >> > lowered by 13% (lower is better) 3DMark improved by 2%
> >> >
> >> > The dissassembly doesn't show difference, although above results
> >> > were observed with patch.
> >> >
> >>
> >> How did you determine that your results are not just statistical noise?
> >
> > No its not statistical noise. As commit msg mentions, we used metrics
> > CPI retired rate, utilization, branch miss predict as metrics, these can be
> measured using SEP on IA.
> > It essentially enables event based sampling and we can measure these through
> counters.
> >
>
> How much variance do these metrics have? (particularly the overall score of the
> benchmark) How many times did you run the benchmark?
>
2% to be exact, other stats are also present in commit message, the benchmark
was run at least 5 times before concluding and more than that during experimenting.
> > When we did the analysis of tests we were running, we found
> > brw_upload_pipeline_state->check_state functions having bad CPI rates
> > and hence we made changed there. The intention was always to reduce
> > driver overhead, although this is miniscule effort.
> >
> >> Did you do some sort of significance testing? Which test,
> >> significance level and sample size did you use?
> >
> > Sorry this is not something we have done, we tested on android
> > functionality and perf only. Performance benchmark 3dmark and overall
> > stability of the android system were used as tests. Kindly let us know
> > if you have specific tests to be run and we would be happy to run that.
> >
> > What CPU did you get the numbers on?
> >
> > Broxton.
> >
> >>
> >> Thank you.
> >>
> >> > Signed-off-by: Aravindan Muthukumar
> >> > <aravindan.muthukumar at intel.com>
> >> > Signedd-off-by: Yogesh Marathe <yogesh.marathe at intel.com>
> >> > Tested-by: Asish <asish at intel.com>
> >> > ---
> >> >
> >> > Changes since V1:
> >> > - Removed memset() change
> >> > - Changed commit message as per review comments
> >> >
> >> > src/mesa/drivers/dri/i965/brw_defines.h | 4 ++++
> >> > src/mesa/drivers/dri/i965/brw_state_upload.c | 12 ++++++++----
> >> > 2 files changed, 12 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> >> > b/src/mesa/drivers/dri/i965/brw_defines.h
> >> > index 2a8dbf8..8c9a510 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> >> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> >> > @@ -1687,3 +1687,7 @@ enum brw_pixel_shader_coverage_mask_mode
> { #
> >> > define CSDBG2_CONSTANT_BUFFER_ADDRESS_OFFSET_DISABLE (1 << 4)
> >> >
> >> > #endif
> >> > +
> >> > +/* Checking the state of mesa and brw before emitting atoms */
> >> > +#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw))
> >> > +
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> > b/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> > index acaa97e..1c8b969 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> > @@ -443,10 +443,8 @@ check_and_emit_atom(struct brw_context *brw,
> >> > struct brw_state_flags *state,
> >> > const struct brw_tracked_state *atom) {
> >> > - if (check_state(state, &atom->dirty)) {
> >> > atom->emit(brw);
> >> > merge_ctx_state(brw, state);
> >> > - }
> >> > }
> >> >
> >> > static inline void
> >> > @@ -541,7 +539,10 @@ brw_upload_pipeline_state(struct brw_context
> *brw,
> >> > const struct brw_tracked_state *atom = &atoms[i];
> >> > struct brw_state_flags generated;
> >> >
> >> > - check_and_emit_atom(brw, &state, atom);
> >> > + /* Checking the state and emitting atoms */
> >> > + if (CHECK_BRW_STATE(state, atom->dirty)) {
> >> > + check_and_emit_atom(brw, &state, atom);
> >> > + }
> >> >
> >> > accumulate_state(&examined, &atom->dirty);
> >> >
> >> > @@ -558,7 +559,10 @@ brw_upload_pipeline_state(struct brw_context
> *brw,
> >> > for (i = 0; i < num_atoms; i++) {
> >> > const struct brw_tracked_state *atom = &atoms[i];
> >> >
> >> > - check_and_emit_atom(brw, &state, atom);
> >> > + /* Checking the state and emitting atoms */
> >> > + if (CHECK_BRW_STATE(state, atom->dirty)) {
> >> > + check_and_emit_atom(brw, &state, atom);
> >> > + }
> >> > }
> >> > }
> >> >
> >> > --
> >> > 2.7.4
> >> >
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev at lists.freedesktop.org
> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list