[Mesa-dev] [PATCH 2/2] i965/gs: Optimize away the EOT write on Gen8+ with static vertex count.

Matt Turner mattst88 at gmail.com
Fri Sep 25 11:06:32 PDT 2015


On Fri, Sep 25, 2015 at 10:47 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> With static vertex counts, the final EOT write doesn't actually write
> any data - it's just there to end the thread.  Typically, the last
> thing before ending the thread will be an EmitVertex() call, resulting
> in a URB write.  We can just set EOT on that.
>
> Note that this isn't always possible - there might be an intervening
> SSBO write/image store, or the URB write may have been in a loop.
>
> shader-db statistics for geometry shaders only:
>
> total instructions in shared programs: 3173 -> 3149 (-0.76%)
> instructions in affected programs:     176 -> 152 (-13.64%)
> total loops in shared programs:        1 -> 1 (0.00%)

I've been stripping out lines like this, or "HURT: 0" from my stats
when they don't really indicate anything that wasn't expected.

I can patch report.py to simply not print them in those cases if we want.

> helped:                                8
>
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> This requires my previous series.  We could extend this to Gen7 as well...
> we'd just need to move the GS_OPCODE_SET_VERTEX_COUNT above the earlier
> URB write so the message header bits get set.  I just haven't pulled out
> the older laptop.  This didn't seem to help performance in Gl32GSCloth.
> Maybe a tiny bit, but not easily measured.
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> index acf0501..d2edc57 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> @@ -236,6 +236,21 @@ vec4_gs_visitor::emit_thread_end()
>
>     bool static_vertex_count = c->prog_data.static_vertex_count != -1;
>
> +   /* If the previous instruction was a URB write, we don't need to issue
> +    * a second one - we can just set the EOT bit on the previous write.
> +    *
> +    * Skip this on Gen8+ unless there's a static vertex count, as we also
> +    * need to write the vertex count out, and combining the two may not be
> +    * possible (or at least not straightforward).
> +    */
> +   vec4_instruction *last = (vec4_instruction *) instructions.get_tail();
> +   if (last && last->opcode == GS_OPCODE_URB_WRITE &&
> +       !(INTEL_DEBUG & DEBUG_SHADER_TIME) &&
> +       devinfo->gen >= 8 && static_vertex_count) {
> +      last->urb_write_flags = BRW_URB_WRITE_EOT | last->urb_write_flags;

The operand ordering on the RHS made me think something odd was going
on. Change this to...?

    last->urb_write_flags |= BRW_URB_WRITE_EOT;
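
One caveat on the shorter form, as a minimal sketch rather than the actual
driver code: if the flags field has an enum type, C++ provides no built-in
|= for it, so the compound form only compiles when an operator|= overload
exists. The enum, its values, and the helper names below are hypothetical
stand-ins, not Mesa's definitions. Given such overloads, both spellings
produce the same result:

```cpp
#include <cassert>

// Hypothetical stand-in for the driver's flag enum; the names and values
// are illustrative assumptions and are not taken from Mesa.
enum urb_flags {
   URB_FLAGS_NONE  = 0,
   URB_FLAG_EOT    = 1 << 0,
   URB_FLAG_OFFSET = 1 << 1,
};

// Bitwise OR that stays in the enum type instead of decaying to int.
inline urb_flags operator|(urb_flags a, urb_flags b)
{
   return static_cast<urb_flags>(static_cast<unsigned>(a) |
                                 static_cast<unsigned>(b));
}

// Compound form; without this overload, "f |= URB_FLAG_EOT" would not
// compile for an enum-typed f.
inline urb_flags &operator|=(urb_flags &a, urb_flags b)
{
   a = a | b;
   return a;
}

// Spelling used in the patch: flag on the left, old value on the right.
inline urb_flags set_eot_long(urb_flags f)
{
   f = URB_FLAG_EOT | f;
   return f;
}

// Spelling suggested in the review.
inline urb_flags set_eot_short(urb_flags f)
{
   f |= URB_FLAG_EOT;
   return f;
}
```

If the field turns out to be a plain integer type, none of the overloads
are needed and the two forms are trivially interchangeable.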

With that,

Reviewed-by: Matt Turner <mattst88 at gmail.com>

