[Mesa-dev] [PATCH v2 3/4] i965/state: Create separate dirty state bits for each pipeline

Kristian Høgsberg krh at bitplanet.net
Tue Mar 17 12:36:57 PDT 2015


On Tue, Mar 17, 2015 at 10:46 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Tuesday, March 17, 2015 08:20:23 AM Kristian Høgsberg wrote:
>> On Tue, Mar 17, 2015 at 2:22 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
>> > On Monday, March 16, 2015 10:21:31 PM Kristian Høgsberg wrote:
>> >> On Wed, Mar 11, 2015 at 11:53 AM, Jordan Justen
>> >> <jordan.l.justen at intel.com> wrote:
> [snip]
>> >> > @@ -691,8 +702,23 @@ void brw_upload_render_state(struct brw_context *brw)
>> >> >          fprintf(stderr, "\n");
>> >> >        }
>> >> >     }
>> >> > +
>> >> > +   /* Save all dirty state into the other pipelines */
>> >> > +   for (int i = 0; i < BRW_NUM_PIPELINES; i++) {
>> >> > +      if (i != pipeline) {
>> >> > +         brw->state.pipelines[i].mesa |= state->mesa;
>> >> > +         brw->state.pipelines[i].brw |= state->brw;
>> >> > +      }
>> >> > +   }
>> >>
>> >> When we merge these bits in at this point, state->mesa/brw includes
>> >> dirty bits from the active pipeline's saved flags. For example,
>> >> suppose we do a bunch of render pipeline stuff, which queues up a lot
>> >> of dirty flags for the compute pipeline. Then, when we go to emit
>> >> state for the compute pipeline, we first merge all these dirty flags
>> >> into brw->state.dirty, then merge that into the render pipeline
>> >> flags. We only need to merge the new dirty bits, that is, the initial
>> >> value of brw->state.dirty, into the render pipeline flags. If we
>> >> don't modify brw->state.dirty in this function and instead merge the
>> >> flags in brw_clear_dirty_bits() as described above, this problem is
>> >> easy to fix.
>> >>
>> >> Kristian
>
> OK, I see what you mean now - very good catch.
>
> 1. We do a render operation.  Any new dirty bits get saved in
>    brw->state.pipelines[COMPUTE], so they'll be handled when we
>    do the next compute workload.
>
>    This completes successfully (ignore retries).
>
> 2. We do a compute operation.
>
>    brw->state.pipelines[COMPUTE] gets merged into brw->state.dirty.
>    This includes bits that happened during #1 (and were missed on
>    the compute pipe).
>
>    This all works fine.  At the end, we save brw->state.dirty into
>    brw->state.pipelines[RENDER].
>
> 3. We do another render operation.
>
>    brw->state.pipelines[RENDER] gets merged into brw->state.dirty.
>    We now have bits from #1 again, which are not necessary.
>
> So yeah.  That's bad.  Sorry, I thought this was to handle the
> hypothetical case of pipe-switch-during-retries.  But it happens
> just from switching pipelines at all!

I still think both points are valid. Just because we don't hit the
retry problem doesn't mean we shouldn't fix it.  Yes, there's
precedent for moving the flag bits around in brw_upload_state() even
when it fails, but that just propagates the bits from where higher
level Mesa sets them to where we emit state from them; conceptually
the state remains the same after brw_upload_state() fails. There is
also precedent for *not* changing state destructively in
brw_upload_state() - that's the reason brw_clear_dirty_bits() is there
in the first place. When we clear brw->state.pipelines[pipeline] and
merge the flags into brw->state.dirty, we change the state in a way
that requires us to call again with the same pipeline to get correct
behavior. Given that we already have brw_clear_dirty_bits() in place
and that this is all new code (as opposed to changing the behavior of
a piece of battle-tested, stable logic), I really don't see why we
would make it work this way.
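
To make the stale-bit propagation concrete, here's a tiny standalone
simulation of the pattern in the patch above - merge the saved bits
into the local dirty set, then save the merged set into the other
pipelines. The names and bit values are made up for illustration;
this isn't driver code:

   #include <stdio.h>
   #include <stdint.h>

   enum { PIPE_RENDER, PIPE_COMPUTE, NUM_PIPES };

   static uint64_t pending[NUM_PIPES]; /* per-pipeline saved dirty bits */

   static uint64_t upload(int pipe, uint64_t new_bits)
   {
      uint64_t dirty = new_bits | pending[pipe];
      pending[pipe] = 0;

      for (int i = 0; i < NUM_PIPES; i++)
         if (i != pipe)
            pending[i] |= dirty; /* bug: re-propagates handled bits */

      return dirty;
   }

   int main(void)
   {
      upload(PIPE_RENDER, 0x1);            /* #1: render op, bit 0x1 */
      upload(PIPE_COMPUTE, 0x2);           /* #2: compute op, bit 0x2 */
      uint64_t d = upload(PIPE_RENDER, 0); /* #3: render op, no new bits */

      /* Prints 0x3: bit 0x1 from #1 comes back even though the render
       * pipeline already handled it in #1. */
      printf("#3 dirty = 0x%llx\n", (unsigned long long) d);
      return 0;
   }

If the loop saved only new_bits instead of dirty, #3 would see 0x2,
which is the behavior the suggestion below gets us.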

> If I understand your suggestion correctly, you mean that
> a) We should eliminate brw->state.dirty.
> b) brw_upload_pipeline_state() should do:
>
>    struct brw_state_flags *pipeline_state =
>       &brw->state.pipelines[pipeline];
>
>    /* Create a local copy of the dirty state */
>    struct brw_state_flags dirty;
>    dirty.mesa = brw->NewGLState | pipeline_state->mesa;
>    dirty.brw = ctx->NewDriverState | pipeline_state->brw;
>
>    Use this for uploads.
>
> c) The existing brw_clear_dirty_bits() should do:
>
>    /* Save the dirty flags into the other pipelines */
>    for (int i = 0; i < BRW_NUM_PIPELINES; i++) {
>       if (i != pipeline) {
>          brw->state.pipelines[i].mesa |= brw->NewGLState;
>          brw->state.pipelines[i].brw |= ctx->NewDriverState;
>       }
>    }
>
>    /* We've handled any bits passed to us by other pipeline uploads */
>    brw->state.pipelines[pipeline].mesa = 0;
>    brw->state.pipelines[pipeline].brw = 0;
>
>    /* We've handled the flags passed to us by Mesa */
>    brw->NewGLState = 0;
>    ctx->NewDriverState = 0;
>
> Does that sound like a correct interpretation?  This does seem to solve
> the problem, and I think it's a bit cleaner too.

Yes, pretty much. I wasn't considering eliminating brw->state.dirty,
but that would be nice. We still have atoms that set dirty bits there
as they're emitted, but we can use ctx->NewDriverState for that and OR
it into dirty.brw after each atom emit call.  That's a bigger change -
I see around 90 cases of setting bits in brw->state.dirty.brw - but
getting rid of brw->state.dirty may be worth the churn.
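
For reference, here's a rough sketch of what the upload loop could
look like with brw->state.dirty gone. check_state() and the atom list
match brw_state_upload.c, but the rest is hypothetical pseudocode, not
the actual patch:

   /* Local dirty set replaces brw->state.dirty entirely. */
   struct brw_state_flags dirty;
   dirty.mesa = brw->NewGLState | pipeline_state->mesa;
   dirty.brw = ctx->NewDriverState | pipeline_state->brw;

   for (int i = 0; i < num_atoms; i++) {
      const struct brw_tracked_state *atom = &atoms[i];

      if (check_state(&dirty, &atom->dirty)) {
         atom->emit(brw);

         /* OR in anything the atom just flagged so later atoms see
          * it; ctx->NewDriverState itself stays set until
          * brw_clear_dirty_bits() propagates it to the other
          * pipelines and clears it. */
         dirty.brw |= ctx->NewDriverState;
      }
   }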

> Re-evaluating my example to verify that this approach works...
>
> 1. We do a render operation.
>
>    The local dirty flags are <bits from Mesa> | <saved bits>.
>    Nothing's saved yet, so just <bits from Mesa>
>
>    When done, we save <bits from Mesa> to pipelines[COMPUTE].
>
> 2. We do a compute operation.
>
>    The local dirty flags are <new bits from Mesa> | <saved bits
>    from op #1>.  We use those for upload.
>
>    When done, brw_clear_dirty_bits() saves only <new bits from
>    Mesa> to pipelines[RENDER].  Notably, the bits from op #1
>    are /not/ saved.
>
> 3. We do a render operation.
>
>    We get any new bits from Mesa, plus the bits set during #2.
>    No bits from #1.
>
> So...you're right - there's definitely a problem, and your suggestion
> totally fixes it.  I just had to work through an example to see what
> you meant.  Thanks again!
>
> Jordan, does this make sense to you?

