[Mesa-dev] [PATCH 3/4] i965/state: Create separate dirty state bits for each pipeline

Tue Mar 10 16:32:19 PDT 2015

On 2015-03-10 16:11:08, Kristian Høgsberg wrote:
> On Tue, Mar 10, 2015 at 10:49 AM, Jordan Justen
> <jordan.l.justen at intel.com> wrote:
> > When uploading state for a pipeline, we will save changed state for
> > the other pipelines.
> >
> > Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h      |  1 +
> >  src/mesa/drivers/dri/i965/brw_state_upload.c | 41 ++++++++++++++++++++++------
> >  2 files changed, 34 insertions(+), 8 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> > index fd43b07..902de18 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1101,6 +1101,7 @@ struct brw_context
> >     GLuint NewGLState;
> >     struct {
> >        struct brw_state_flags dirty;
> > +      struct brw_state_flags pipelines[BRW_NUM_PIPELINES];
> >     } state;
> >
> >     struct brw_cache cache;
> > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > index dbfdc92..6966f06 100644
> > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > @@ -580,15 +580,16 @@ brw_upload_programs(struct brw_context *brw)
> >     brw_upload_wm_prog(brw);
> >  }
> >
> > -/***********************************************************************
> > - * Emit all state:
> > - */
> > -void brw_upload_render_state(struct brw_context *brw)
> > +static inline void
> > +brw_upload_pipeline_state(struct brw_context *brw,
> > +                          enum brw_pipeline pipeline)
> >  {
> >     struct gl_context *ctx = &brw->ctx;
> >     struct brw_state_flags *state = &brw->state.dirty;
> >     int i;
> >     static int dirty_count = 0;
> > +   struct brw_state_flags *pipeline_state =
> > +      &brw->state.pipelines[pipeline];
> >
> >     state->mesa |= brw->NewGLState;
> >     brw->NewGLState = 0;
> > @@ -627,6 +628,12 @@ void brw_upload_render_state(struct brw_context *brw)
> >        brw->state.dirty.brw |= BRW_NEW_NUM_SAMPLES;
> >     }
> >
> > +   if ((pipeline_state->mesa | pipeline_state->brw) != 0) {
> > +      state->mesa |= pipeline_state->mesa;
> > +      state->brw |= pipeline_state->brw;
> > +      memset(pipeline_state, 0, sizeof(struct brw_state_flags));
> > +   }
> > +
> >     if ((state->mesa | state->brw) == 0)
> >        return;
> >
> > @@ -636,6 +643,9 @@ void brw_upload_render_state(struct brw_context *brw)
> >
> >     brw_upload_programs(brw);
> >
> > +   const struct brw_tracked_state *atoms =
> > +      brw_pipeline_first_atom(brw, pipeline);
> 
> Instead of looping over num_atoms[] to determine the start of the
> compute pipeline atoms, I think it'd be simpler to just have the
> compute atoms be a separate array in struct brw_context.

I showed Ken an earlier version that had this done differently. In
that case, I stored the start/end positions in an array. Ken's
feedback was that it was a little overly complicated.

Regarding cpu time, since this function, and brw_pipeline_first_atom
are inline functions, I think the compiler can figure out that the
atoms for the render pipeline start at brw->atoms[0] without accessing
brw->num_atoms. For the compute version, I think it will only need to
access brw->num_atoms[0], and it ought to do both of these without
looping.

I should confirm in the generated code. :)

> >     if (unlikely(INTEL_DEBUG)) {
> >        /* Debug version which enforces various sanity checks on the
> >         * state flags which are generated and checked to help ensure
> > @@ -645,8 +655,8 @@ void brw_upload_render_state(struct brw_context *brw)
> >        memset(&examined, 0, sizeof(examined));
> >        prev = *state;
> >
> > -      for (i = 0; i < brw->num_atoms[BRW_RENDER_PIPELINE]; i++) {
> > -        const struct brw_tracked_state *atom = &brw->atoms[i];
> > +      for (i = 0; i < brw->num_atoms[pipeline]; i++) {
> 
> This makes the loop condition a more complex expression.  I don't
> think gcc computes and caches brw->num_atoms[pipeline] outside the
> loop so this adds a bit of code to this very hot loop.  I'd try
> something like

Since this function becomes an inline function, I think this change
should still be a single read from the brw->num_atoms[] array. (So,
the same as when supporting a single pipeline with brw->num_atoms
being an int.)

Once again, I'll confirm.

-Jordan