[Mesa-dev] [PATCH 3/4] i965/state: Create separate dirty state bits for each pipeline

Tue Mar 10 17:09:14 PDT 2015

On Tue, Mar 10, 2015 at 4:32 PM, Jordan Justen
<jordan.l.justen at intel.com> wrote:
> On 2015-03-10 16:11:08, Kristian Høgsberg wrote:
>> On Tue, Mar 10, 2015 at 10:49 AM, Jordan Justen
>> <jordan.l.justen at intel.com> wrote:
>> > When uploading state for a pipeline, we will save changed state for
>> > the other pipelines.
>> >
>> > Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_context.h      |  1 +
>> >  src/mesa/drivers/dri/i965/brw_state_upload.c | 41 ++++++++++++++++++++++------
>> >  2 files changed, 34 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
>> > index fd43b07..902de18 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_context.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
>> > @@ -1101,6 +1101,7 @@ struct brw_context
>> >     GLuint NewGLState;
>> >     struct {
>> >        struct brw_state_flags dirty;
>> > +      struct brw_state_flags pipelines[BRW_NUM_PIPELINES];
>> >     } state;
>> >
>> >     struct brw_cache cache;
>> > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> > index dbfdc92..6966f06 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
>> > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> > @@ -580,15 +580,16 @@ brw_upload_programs(struct brw_context *brw)
>> >     brw_upload_wm_prog(brw);
>> >  }
>> >
>> > -/***********************************************************************
>> > - * Emit all state:
>> > - */
>> > -void brw_upload_render_state(struct brw_context *brw)
>> > +static inline void
>> > +brw_upload_pipeline_state(struct brw_context *brw,
>> > +                          enum brw_pipeline pipeline)
>> >  {
>> >     struct gl_context *ctx = &brw->ctx;
>> >     struct brw_state_flags *state = &brw->state.dirty;
>> >     int i;
>> >     static int dirty_count = 0;
>> > +   struct brw_state_flags *pipeline_state =
>> > +      &brw->state.pipelines[pipeline];
>> >
>> >     state->mesa |= brw->NewGLState;
>> >     brw->NewGLState = 0;
>> > @@ -627,6 +628,12 @@ void brw_upload_render_state(struct brw_context *brw)
>> >        brw->state.dirty.brw |= BRW_NEW_NUM_SAMPLES;
>> >     }
>> >
>> > +   if ((pipeline_state->mesa | pipeline_state->brw) != 0) {
>> > +      state->mesa |= pipeline_state->mesa;
>> > +      state->brw |= pipeline_state->brw;
>> > +      memset(pipeline_state, 0, sizeof(struct brw_state_flags));
>> > +   }
>> > +
>> >     if ((state->mesa | state->brw) == 0)
>> >        return;
>> >
>> > @@ -636,6 +643,9 @@ void brw_upload_render_state(struct brw_context *brw)
>> >
>> >     brw_upload_programs(brw);
>> >
>> > +   const struct brw_tracked_state *atoms =
>> > +      brw_pipeline_first_atom(brw, pipeline);
>>
>> Instead of looping over num_atoms[] to determine the start of the
>> compute pipeline atoms, I think it'd be simpler to just have the
>> compute atoms be a separate array in struct brw_context.
>
> I showed Ken an earlier version that had this done differently. In
> that case, I stored the start/end positions in an array. Ken's
> feedback was that it was a little overly complicated.
>
> Regarding cpu time, since this function, and brw_pipeline_first_atom
> are inline functions, I think the compiler can figure out that the
> atoms for the render pipeline start at brw->atoms[0] without accessing
> brw->num_atoms. For the compute version, I think it will only need to
> access brw->num_atoms[0], and it ought to do both of these without
> looping.
>
> I should confirm in the generated code. :)

I wasn't concerned about CPU time all that much here, since it's just
once per call.  It's more that it seems awkward to add it up every
time when it never changes.  Something like

  struct {
    int num_atoms;
    const struct brw_tracked_state *atoms;
  } pipelines[2];
  const struct brw_tracked_state render_atoms[57];
  const struct brw_tracked_state atoms[10];

and then initializing pipelines[0]->atoms = &brw->render_atoms[0] and
similar for pipelines[1] seems pretty straightforward.  But I'm fine
with whatever you choose.

>> >     if (unlikely(INTEL_DEBUG)) {
>> >        /* Debug version which enforces various sanity checks on the
>> >         * state flags which are generated and checked to help ensure
>> > @@ -645,8 +655,8 @@ void brw_upload_render_state(struct brw_context *brw)
>> >        memset(&examined, 0, sizeof(examined));
>> >        prev = *state;
>> >
>> > -      for (i = 0; i < brw->num_atoms[BRW_RENDER_PIPELINE]; i++) {
>> > -        const struct brw_tracked_state *atom = &brw->atoms[i];
>> > +      for (i = 0; i < brw->num_atoms[pipeline]; i++) {
>>
>> This makes the loop condition a more complex expression.  I don't
>> think gcc computes and caches brw->num_atoms[pipeline] outside the
>> loop so this adds a bit of code to this very hot loop.  I'd try
>> something like
>
> Since this function becomes an inline function, I think this change
> should still be a single read from the brw->num_atoms[] array. (So,
> the same as when supporting a single pipeline with brw->num_atoms
> being an int.)

I'm wouldn't expect  a big function like  brw_upload_render_state() to
get inlined, unless you use something like the force inline attribute
or whatever that thing is. Anyway, looking at the generated code is
the way to go.

Either way, series

Reviewed-by: Kristian Høgsberg <krh at bitplanet.net>

Kristian