[Mesa-dev] [PATCH 01/37] i965/gs: Use single dispatch mode as fallback to dual object mode when possible.

Jordan Justen jljusten at gmail.com
Thu Sep 4 00:46:24 PDT 2014


On Thu, Sep 4, 2014 at 12:36 AM, Iago Toral Quiroga <itoral at igalia.com> wrote:
> On jue, 2014-09-04 at 00:07 -0700, Jordan Justen wrote:
>> On Wed, Sep 3, 2014 at 11:28 PM, Iago Toral Quiroga <itoral at igalia.com> wrote:
>> > On mié, 2014-09-03 at 17:49 -0700, Jordan Justen wrote:
>> >> On Thu, Aug 14, 2014 at 4:11 AM, Iago Toral Quiroga <itoral at igalia.com> wrote:
>> >> > Currently, when a geometry shader can't use dual object mode we fall back to
>> >> > dual instance mode, however, when invocations == 1, single dispatch mode is
>> >> > more performant and equally efficient in terms of register pressure.
>> >> >
>> >> > Single dispatch mode requires that the driver can handle interleaving of
>> >> > registers, but this is already supported (dual instance mode has the same
>> >> > requirement).
>>
>> Maybe this paragraph could use a minor clarification too? (Of what is
>> required vs. optional. And the fact that we're not doing the optional
>> one.)
>
> Sure. How about this?:
>
> Single dispatch mode requires that the driver can handle interleaving of
> input registers, but this is already supported (dual instance mode has
> the same requirement). However, to take full advantage of single
> dispatch mode to reduce register pressure we would also need to do
> interleaved outputs, but currently, the vec4 visitor and generator
> classes do not support this, so at the moment register pressure in
> single and dual instance modes is the same.
>
>> >> > ---
>> >> >  src/mesa/drivers/dri/i965/brw_context.h           |  8 ++++---
>> >> >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 26 +++++++++++------------
>> >> >  src/mesa/drivers/dri/i965/gen7_gs_state.c         |  4 +---
>> >> >  src/mesa/drivers/dri/i965/gen8_gs_state.c         |  4 +---
>> >> >  4 files changed, 20 insertions(+), 22 deletions(-)
>> >> >
>> >> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
>> >> > index 1bbcf46..7439da1 100644
>> >> > --- a/src/mesa/drivers/dri/i965/brw_context.h
>> >> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
>> >> > @@ -587,10 +587,12 @@ struct brw_gs_prog_data
>> >> >     int invocations;
>> >> >
>> >> >     /**
>> >> > -    * True if the thread should be dispatched in DUAL_INSTANCE mode, false if
>> >> > -    * it should be dispatched in DUAL_OBJECT mode.
>> >> > +    * Dispatch mode, can be any of:
>> >> > +    * GEN7_GS_DISPATCH_MODE_DUAL_OBJECT
>> >> > +    * GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE
>> >> > +    * GEN7_GS_DISPATCH_MODE_SINGLE
>> >> >      */
>> >> > -   bool dual_instanced_dispatch;
>> >> > +   int dispatch_mode;
>> >> >  };
>> >> >
>> >> >  /** Number of texture sampler units */
>> >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>> >> > index b7995ad..c2a4892 100644
>> >> > --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>> >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
>> >> > @@ -101,10 +101,11 @@ vec4_gs_visitor::setup_payload()
>> >> >  {
>> >> >     int attribute_map[BRW_VARYING_SLOT_COUNT * MAX_GS_INPUT_VERTICES];
>> >> >
>> >> > -   /* If we are in dual instanced mode, then attributes are going to be
>> >> > -    * interleaved, so one register contains two attribute slots.
>> >> > +   /* If we are in dual instanced or single mode, then attributes are going
>> >> > +    * to be interleaved, so one register contains two attribute slots.
>> >> >      */
>> >> > -   int attributes_per_reg = c->prog_data.dual_instanced_dispatch ? 2 : 1;
>> >> > +   int attributes_per_reg =
>> >> > +      c->prog_data.dispatch_mode == GEN7_GS_DISPATCH_MODE_DUAL_OBJECT ? 1 : 2;
>> >> >
>> >> >     /* If a geometry shader tries to read from an input that wasn't written by
>> >> >      * the vertex shader, that produces undefined results, but it shouldn't
>> >> > @@ -129,8 +130,7 @@ vec4_gs_visitor::setup_payload()
>> >> >
>> >> >     reg = setup_varying_inputs(reg, attribute_map, attributes_per_reg);
>> >> >
>> >> > -   lower_attributes_to_hw_regs(attribute_map,
>> >> > -                               c->prog_data.dual_instanced_dispatch);
>> >> > +   lower_attributes_to_hw_regs(attribute_map, attributes_per_reg > 1);
>> >> >
>> >> >     this->first_non_payload_grf = reg;
>> >> >  }
>> >> > @@ -640,7 +640,7 @@ brw_gs_emit(struct brw_context *brw,
>> >> >      */
>> >> >     if (c->prog_data.invocations <= 1 &&
>> >> >         likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
>> >> > -      c->prog_data.dual_instanced_dispatch = false;
>> >> > +      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT;
>> >> >
>> >> >        vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
>> >> >        if (v.run()) {
>> >> > @@ -652,15 +652,15 @@ brw_gs_emit(struct brw_context *brw,
>> >> >
>> >> >     /* Either we failed to compile in DUAL_OBJECT mode (probably because it
>> >> >      * would have required spilling) or DUAL_OBJECT mode is disabled.  So fall
>> >> > -    * back to DUAL_INSTANCED mode, which consumes fewer registers.
>> >> > +    * back to DUAL_INSTANCED or SINGLE mode, which consumes fewer registers.
>> >> >      *
>> >> > -    * FIXME: In an ideal world we'd fall back to SINGLE mode, which would
>> >> > -    * allow us to interleave general purpose registers (resulting in even less
>> >> > -    * likelihood of spilling).  But at the moment, the vec4 generator and
>> >> > -    * visitor classes don't have the infrastructure to interleave general
>> >> > -    * purpose registers, so DUAL_INSTANCED is the best we can do.
>> >>
>> >> I think Paul's note is still somewhat valid. Updates to the code gen
>> >> could lead to improved register usage with single mode, right?
>> >>
>> >> So, we'll use single mode, but with the same register usage as dual instance.
>> >
>> > Oh, I got confused with this because I thought he was talking about the
>> > inputs for some reason, which are interleaved in both cases. I don't
>> > know why I got confused, since we clearly mentions the generator
>> > classes. But I see now that he is talking about the output.
>> >
>> > I'll keep the FIXME part of the comment.
>>
>> I do think it needs a tweak, such as:
>> FIXME: Although we use SINGLE mode, we use the same registers as
>> DUAL_INSTANCED mode. To improve SINGLE mode the vec4 generator and
>> visitor classes would need to be updated to take advantage of the
>> interleaving possible with general purpose registers in SINGLE mode.
>>
>> Or, maybe you have a better suggestion?
>
> I think this is fine, but if you like my suggestion for the commit
> message maybe we can simply use the same description here.

Yep, that sounds good.

-Jordan

>> >> > +    * SINGLE mode is more performant when invocations == 1 and DUAL_INSTANCE
>> >> > +    * mode is more performant when invocations > 1.
>> >>
>> >> Maybe site the PRM?
>> >>
>> >> From the Ivy Bridge PRM, Vol2 Part1 7.2.1.1 "3DSTATE_GS"
>> >> "When InstanceCount=1 (one instance per object) software can decide
>> >> which dispatch mode to use. DUAL_OBJECT mode would likely be the best
>> >> choice for performance, followed by SINGLE mode."
>> >
>> > Sure, I all add the reference.
>> >
>> > So with these two changes, should I assume that the patch is reviewed by
>> > you?
>>
>> Yep.
>>
>> -Jordan
>>
>> >>
>> >> >      */
>> >> > -   c->prog_data.dual_instanced_dispatch = true;
>> >> > +   if (c->prog_data.invocations <= 1)
>> >> > +      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_SINGLE;
>> >> > +   else
>> >> > +      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE;
>> >> >
>> >> >     vec4_gs_visitor v(brw, c, prog, mem_ctx, false /* no_spills */);
>> >> >     if (!v.run()) {
>> >> > diff --git a/src/mesa/drivers/dri/i965/gen7_gs_state.c b/src/mesa/drivers/dri/i965/gen7_gs_state.c
>> >> > index 93f48f6..b3b4ee6 100644
>> >> > --- a/src/mesa/drivers/dri/i965/gen7_gs_state.c
>> >> > +++ b/src/mesa/drivers/dri/i965/gen7_gs_state.c
>> >> > @@ -145,9 +145,7 @@ upload_gs_state(struct brw_context *brw)
>> >> >            GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT) |
>> >> >           ((brw->gs.prog_data->invocations - 1) <<
>> >> >            GEN7_GS_INSTANCE_CONTROL_SHIFT) |
>> >> > -         (brw->gs.prog_data->dual_instanced_dispatch ?
>> >> > -          GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE :
>> >> > -          GEN7_GS_DISPATCH_MODE_DUAL_OBJECT) |
>> >> > +         brw->gs.prog_data->dispatch_mode |
>> >> >           GEN6_GS_STATISTICS_ENABLE |
>> >> >           (brw->gs.prog_data->include_primitive_id ?
>> >> >            GEN7_GS_INCLUDE_PRIMITIVE_ID : 0) |
>> >> > diff --git a/src/mesa/drivers/dri/i965/gen8_gs_state.c b/src/mesa/drivers/dri/i965/gen8_gs_state.c
>> >> > index 446edec..d8debbd 100644
>> >> > --- a/src/mesa/drivers/dri/i965/gen8_gs_state.c
>> >> > +++ b/src/mesa/drivers/dri/i965/gen8_gs_state.c
>> >> > @@ -83,9 +83,7 @@ gen8_upload_gs_state(struct brw_context *brw)
>> >> >        OUT_BATCH(((brw->max_gs_threads / 2 - 1) << HSW_GS_MAX_THREADS_SHIFT) |
>> >> >                  (brw->gs.prog_data->control_data_header_size_hwords <<
>> >> >                   GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT) |
>> >> > -                (brw->gs.prog_data->dual_instanced_dispatch ?
>> >> > -                 GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE :
>> >> > -                 GEN7_GS_DISPATCH_MODE_DUAL_OBJECT) |
>> >> > +                brw->gs.prog_data->dispatch_mode |
>> >> >                  GEN6_GS_STATISTICS_ENABLE |
>> >> >                  (brw->gs.prog_data->include_primitive_id ?
>> >> >                   GEN7_GS_INCLUDE_PRIMITIVE_ID : 0) |
>> >> > --
>> >> > 1.9.1
>> >> >
>> >> > _______________________________________________
>> >> > mesa-dev mailing list
>> >> > mesa-dev at lists.freedesktop.org
>> >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> >>
>> >
>> >
>>
>
>


More information about the mesa-dev mailing list