[Mesa-dev] [PATCH 01/37] i965/gs: Use single dispatch mode as fallback to dual object mode when possible.
Iago Toral Quiroga
itoral at igalia.com
Thu Sep 4 00:36:34 PDT 2014
On jue, 2014-09-04 at 00:07 -0700, Jordan Justen wrote:
> On Wed, Sep 3, 2014 at 11:28 PM, Iago Toral Quiroga <itoral at igalia.com> wrote:
> > On mié, 2014-09-03 at 17:49 -0700, Jordan Justen wrote:
> >> On Thu, Aug 14, 2014 at 4:11 AM, Iago Toral Quiroga <itoral at igalia.com> wrote:
> >> > Currently, when a geometry shader can't use dual object mode we fall back to
> >> > dual instance mode, however, when invocations == 1, single dispatch mode is
> >> > more performant and equally efficient in terms of register pressure.
> >> >
> >> > Single dispatch mode requires that the driver can handle interleaving of
> >> > registers, but this is already supported (dual instance mode has the same
> >> > requirement).
>
> Maybe this paragraph could use a minor clarification too? (Of what is
> required vs. optional. And the fact that we're not doing the optional
> one.)
Sure. How about this?:
Single dispatch mode requires that the driver can handle interleaving of
input registers, but this is already supported (dual instance mode has
the same requirement). However, to take full advantage of single
dispatch mode to reduce register pressure we would also need to do
interleaved outputs, but currently, the vec4 visitor and generator
classes do not support this, so at the moment register pressure in
single and dual instance modes is the same.
> >> > ---
> >> > src/mesa/drivers/dri/i965/brw_context.h | 8 ++++---
> >> > src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 26 +++++++++++------------
> >> > src/mesa/drivers/dri/i965/gen7_gs_state.c | 4 +---
> >> > src/mesa/drivers/dri/i965/gen8_gs_state.c | 4 +---
> >> > 4 files changed, 20 insertions(+), 22 deletions(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> >> > index 1bbcf46..7439da1 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> >> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> >> > @@ -587,10 +587,12 @@ struct brw_gs_prog_data
> >> > int invocations;
> >> >
> >> > /**
> >> > - * True if the thread should be dispatched in DUAL_INSTANCE mode, false if
> >> > - * it should be dispatched in DUAL_OBJECT mode.
> >> > + * Dispatch mode, can be any of:
> >> > + * GEN7_GS_DISPATCH_MODE_DUAL_OBJECT
> >> > + * GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE
> >> > + * GEN7_GS_DISPATCH_MODE_SINGLE
> >> > */
> >> > - bool dual_instanced_dispatch;
> >> > + int dispatch_mode;
> >> > };
> >> >
> >> > /** Number of texture sampler units */
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> >> > index b7995ad..c2a4892 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> >> > @@ -101,10 +101,11 @@ vec4_gs_visitor::setup_payload()
> >> > {
> >> > int attribute_map[BRW_VARYING_SLOT_COUNT * MAX_GS_INPUT_VERTICES];
> >> >
> >> > - /* If we are in dual instanced mode, then attributes are going to be
> >> > - * interleaved, so one register contains two attribute slots.
> >> > + /* If we are in dual instanced or single mode, then attributes are going
> >> > + * to be interleaved, so one register contains two attribute slots.
> >> > */
> >> > - int attributes_per_reg = c->prog_data.dual_instanced_dispatch ? 2 : 1;
> >> > + int attributes_per_reg =
> >> > + c->prog_data.dispatch_mode == GEN7_GS_DISPATCH_MODE_DUAL_OBJECT ? 1 : 2;
> >> >
> >> > /* If a geometry shader tries to read from an input that wasn't written by
> >> > * the vertex shader, that produces undefined results, but it shouldn't
> >> > @@ -129,8 +130,7 @@ vec4_gs_visitor::setup_payload()
> >> >
> >> > reg = setup_varying_inputs(reg, attribute_map, attributes_per_reg);
> >> >
> >> > - lower_attributes_to_hw_regs(attribute_map,
> >> > - c->prog_data.dual_instanced_dispatch);
> >> > + lower_attributes_to_hw_regs(attribute_map, attributes_per_reg > 1);
> >> >
> >> > this->first_non_payload_grf = reg;
> >> > }
> >> > @@ -640,7 +640,7 @@ brw_gs_emit(struct brw_context *brw,
> >> > */
> >> > if (c->prog_data.invocations <= 1 &&
> >> > likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
> >> > - c->prog_data.dual_instanced_dispatch = false;
> >> > + c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT;
> >> >
> >> > vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
> >> > if (v.run()) {
> >> > @@ -652,15 +652,15 @@ brw_gs_emit(struct brw_context *brw,
> >> >
> >> > /* Either we failed to compile in DUAL_OBJECT mode (probably because it
> >> > * would have required spilling) or DUAL_OBJECT mode is disabled. So fall
> >> > - * back to DUAL_INSTANCED mode, which consumes fewer registers.
> >> > + * back to DUAL_INSTANCED or SINGLE mode, which consumes fewer registers.
> >> > *
> >> > - * FIXME: In an ideal world we'd fall back to SINGLE mode, which would
> >> > - * allow us to interleave general purpose registers (resulting in even less
> >> > - * likelihood of spilling). But at the moment, the vec4 generator and
> >> > - * visitor classes don't have the infrastructure to interleave general
> >> > - * purpose registers, so DUAL_INSTANCED is the best we can do.
> >>
> >> I think Paul's note is still somewhat valid. Updates to the code gen
> >> could lead to improved register usage with single mode, right?
> >>
> >> So, we'll use single mode, but with the same register usage as dual instance.
> >
> > Oh, I got confused with this because I thought he was talking about the
> > inputs for some reason, which are interleaved in both cases. I don't
> > know why I got confused, since we clearly mentions the generator
> > classes. But I see now that he is talking about the output.
> >
> > I'll keep the FIXME part of the comment.
>
> I do think it needs a tweak, such as:
> FIXME: Although we use SINGLE mode, we use the same registers as
> DUAL_INSTANCED mode. To improve SINGLE mode the vec4 generator and
> visitor classes would need to be updated to take advantage of the
> interleaving possible with general purpose registers in SINGLE mode.
>
> Or, maybe you have a better suggestion?
I think this is fine, but if you like my suggestion for the commit
message maybe we can simply use the same description here.
Iago
> >> > + * SINGLE mode is more performant when invocations == 1 and DUAL_INSTANCE
> >> > + * mode is more performant when invocations > 1.
> >>
> >> Maybe site the PRM?
> >>
> >> From the Ivy Bridge PRM, Vol2 Part1 7.2.1.1 "3DSTATE_GS"
> >> "When InstanceCount=1 (one instance per object) software can decide
> >> which dispatch mode to use. DUAL_OBJECT mode would likely be the best
> >> choice for performance, followed by SINGLE mode."
> >
> > Sure, I all add the reference.
> >
> > So with these two changes, should I assume that the patch is reviewed by
> > you?
>
> Yep.
>
> -Jordan
>
> >>
> >> > */
> >> > - c->prog_data.dual_instanced_dispatch = true;
> >> > + if (c->prog_data.invocations <= 1)
> >> > + c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_SINGLE;
> >> > + else
> >> > + c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE;
> >> >
> >> > vec4_gs_visitor v(brw, c, prog, mem_ctx, false /* no_spills */);
> >> > if (!v.run()) {
> >> > diff --git a/src/mesa/drivers/dri/i965/gen7_gs_state.c b/src/mesa/drivers/dri/i965/gen7_gs_state.c
> >> > index 93f48f6..b3b4ee6 100644
> >> > --- a/src/mesa/drivers/dri/i965/gen7_gs_state.c
> >> > +++ b/src/mesa/drivers/dri/i965/gen7_gs_state.c
> >> > @@ -145,9 +145,7 @@ upload_gs_state(struct brw_context *brw)
> >> > GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT) |
> >> > ((brw->gs.prog_data->invocations - 1) <<
> >> > GEN7_GS_INSTANCE_CONTROL_SHIFT) |
> >> > - (brw->gs.prog_data->dual_instanced_dispatch ?
> >> > - GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE :
> >> > - GEN7_GS_DISPATCH_MODE_DUAL_OBJECT) |
> >> > + brw->gs.prog_data->dispatch_mode |
> >> > GEN6_GS_STATISTICS_ENABLE |
> >> > (brw->gs.prog_data->include_primitive_id ?
> >> > GEN7_GS_INCLUDE_PRIMITIVE_ID : 0) |
> >> > diff --git a/src/mesa/drivers/dri/i965/gen8_gs_state.c b/src/mesa/drivers/dri/i965/gen8_gs_state.c
> >> > index 446edec..d8debbd 100644
> >> > --- a/src/mesa/drivers/dri/i965/gen8_gs_state.c
> >> > +++ b/src/mesa/drivers/dri/i965/gen8_gs_state.c
> >> > @@ -83,9 +83,7 @@ gen8_upload_gs_state(struct brw_context *brw)
> >> > OUT_BATCH(((brw->max_gs_threads / 2 - 1) << HSW_GS_MAX_THREADS_SHIFT) |
> >> > (brw->gs.prog_data->control_data_header_size_hwords <<
> >> > GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT) |
> >> > - (brw->gs.prog_data->dual_instanced_dispatch ?
> >> > - GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE :
> >> > - GEN7_GS_DISPATCH_MODE_DUAL_OBJECT) |
> >> > + brw->gs.prog_data->dispatch_mode |
> >> > GEN6_GS_STATISTICS_ENABLE |
> >> > (brw->gs.prog_data->include_primitive_id ?
> >> > GEN7_GS_INCLUDE_PRIMITIVE_ID : 0) |
> >> > --
> >> > 1.9.1
> >> >
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev at lists.freedesktop.org
> >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
> >
> >
>
More information about the mesa-dev
mailing list