[Mesa-dev] [PATCH] swr: fix transform feedback logic

Sun Jul 23 18:51:30 UTC 2017

On Sun, Jul 23, 2017 at 12:27 PM, Kyriazis, George
<george.kyriazis at intel.com> wrote:
>
> On Jul 23, 2017, at 11:21 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>
> On Sun, Jul 23, 2017 at 12:08 PM, George Kyriazis
> <george.kyriazis at intel.com> wrote:
>
> The shader that is used to copy vertex data out of the vs/gs shaders to
> the user-specified buffer (streamout os SO shader) was not using the
> correct offsets.
>
> Adjust the offsets that are used just for the SO shader:
> - Make sure that position is handled in the same special way
>  as in the vs/gs shaders
> - Use the correct offset to be passed in the core
> - consolidate register slot mapping logic into one function, since it's
>  been calculated in 2 different places (one for calcuating the slot mask,
>  and one for the register offsets themselves
>
> Also make room for all attibutes in the backend vertex area.
>
> Fixes:
> - all vtk GL2PS tests
> - 18 piglit tests (16 ext_transform_feedback tests,
>  arb-quads-follow-provoking-vertex and primitive-type gl_points
> ---
> src/gallium/drivers/swr/swr_draw.cpp  | 11 ++++++++---
> src/gallium/drivers/swr/swr_state.cpp | 31 +++++++++++++++++++++++++++++--
> src/gallium/drivers/swr/swr_state.h   |  3 +++
> 3 files changed, 40 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/swr/swr_draw.cpp
> b/src/gallium/drivers/swr/swr_draw.cpp
> index 62ad3f7..218de0f 100644
> --- a/src/gallium/drivers/swr/swr_draw.cpp
> +++ b/src/gallium/drivers/swr/swr_draw.cpp
> @@ -26,6 +26,7 @@
> #include "swr_resource.h"
> #include "swr_fence.h"
> #include "swr_query.h"
> +#include "swr_state.h"
> #include "jit_api.h"
>
> #include "util/u_draw.h"
> @@ -81,8 +82,11 @@ swr_draw_vbo(struct pipe_context *pipe, const struct
> pipe_draw_info *info)
>                offsets[output_buffer] = so->output[i].dst_offset;
>             }
>
> +            unsigned attrib_slot = so->output[i].register_index;
> +            attrib_slot = swr_so_adjust_attrib(attrib_slot, ctx->vs);
> +
>             state.stream.decl[num].bufferIndex = output_buffer;
> -            state.stream.decl[num].attribSlot =
> so->output[i].register_index - 1;
> +            state.stream.decl[num].attribSlot = attrib_slot;
>             state.stream.decl[num].componentMask =
>                ((1 << so->output[i].num_components) - 1)
>                << so->output[i].start_component;
> @@ -130,9 +134,10 @@ swr_draw_vbo(struct pipe_context *pipe, const struct
> pipe_draw_info *info)
>    SWR_FRONTEND_STATE feState = {0};
>
>    feState.vsVertexSize =
> -      VERTEX_ATTRIB_START_SLOT +
> +      VERTEX_ATTRIB_START_SLOT
>       + ctx->vs->info.base.num_outputs
> -      - (ctx->vs->info.base.writes_position ? 1 : 0);
> +      - (ctx->vs->info.base.writes_position ? 1 : 0)
> +      + ctx->fs->info.base.num_outputs;

Why do you care about the number of fs outputs in the vertex size?

>
>    if (ctx->rasterizer->flatshade_first) {
>       feState.provokingVertex = {1, 0, 0};
> diff --git a/src/gallium/drivers/swr/swr_state.cpp
> b/src/gallium/drivers/swr/swr_state.cpp
> index 501fdea..3e07929 100644
> --- a/src/gallium/drivers/swr/swr_state.cpp
> +++ b/src/gallium/drivers/swr/swr_state.cpp
> @@ -345,13 +345,15 @@ swr_create_vs_state(struct pipe_context *pipe,
>       // soState.streamToRasterizer not used
>
>       for (uint32_t i = 0; i < stream_output->num_outputs; i++) {
> +         unsigned attrib_slot = stream_output->output[i].register_index;
> +         attrib_slot = swr_so_adjust_attrib(attrib_slot, swr_vs);
>          swr_vs->soState.streamMasks[stream_output->output[i].stream] |=
> -            1 << (stream_output->output[i].register_index - 1);
> +            (1 << attrib_slot);
>       }
>       for (uint32_t i = 0; i < MAX_SO_STREAMS; i++) {
>         swr_vs->soState.streamNumEntries[i] =
>              _mm_popcnt_u32(swr_vs->soState.streamMasks[i]);
> -        swr_vs->soState.vertexAttribOffset[i] = VERTEX_ATTRIB_START_SLOT;
> // TODO: optimize
> +        swr_vs->soState.vertexAttribOffset[i] = 0;
>        }
>    }
>
> @@ -1777,6 +1779,31 @@ swr_update_derived(struct pipe_context *pipe,
>    ctx->dirty = post_update_dirty_flags;
> }
>
> +unsigned
> +swr_so_adjust_attrib(unsigned in_attrib,
> +                     swr_vertex_shader *swr_vs)
> +{
> +   ubyte semantic_name;
> +   unsigned attrib;
> +
> +   attrib = in_attrib + VERTEX_ATTRIB_START_SLOT;
> +
> +   if (swr_vs) {
> +      semantic_name = swr_vs->info.base.output_semantic_name[in_attrib];
> +      if (semantic_name == TGSI_SEMANTIC_POSITION) {
>
>
> But it's not just the position... there are plenty of other attributes
> that need similar treatment. e.g. pointsize, layer, viewport index.
>
>
> Yes, we want to deal with the other attributes, too, but lack the testing
> time, due to the 17.2 schedule.  Since position fixes the vtk tests (that we
> are mostly concerned about), that is what this patch covers.  It’s in our
> plan to deal with the rest of the attributes in the near future (before we
> get distracted).

OK, well your guys's call on how to tackle all this. I somewhat hate
partial solutions, but I realize you have more tangible goals than
'perfection'.

Either way ... doesn't this all break if there's a geometry shader?
You're only ever looking at the VS, not at the last pre-rast stage
(which might be GS, TES, or VS). I believe that TF currently works for
GS, and you're going to be adjusting GS output attrib positions based
on VS output indices. [Note that my change was written before GS was a
thing in swr, so it also doesn't take this into account.]

>
> I tried to fix this a while back but IIRC my solution was left wanting:
>
> https://github.com/imirkin/mesa/commits/swr
>
> (see the WIP commits at the top).
>
> I ended up getting distracted and never completed the work. Feel free
> to copy/ignore/whatever the above commits. I also tried to provide a
> ARB_tf2 impl as well (i.e. pause/resume).
>
> +         attrib = VERTEX_POSITION_SLOT;
> +      } else {
> +         for (int i = 0; i < PIPE_MAX_SHADER_OUTPUTS; i++) {
> +            if (swr_vs->info.base.output_semantic_name[i] ==
> TGSI_SEMANTIC_POSITION) {
> +               attrib--;
> +               break;
> +            }
> +         }
> +      }
> +   }
> +
> +   return attrib;
> +}
>
> static struct pipe_stream_output_target *
> swr_create_so_target(struct pipe_context *pipe,
> diff --git a/src/gallium/drivers/swr/swr_state.h
> b/src/gallium/drivers/swr/swr_state.h
> index 7940a96..8cbd463 100644
> --- a/src/gallium/drivers/swr/swr_state.h
> +++ b/src/gallium/drivers/swr/swr_state.h
> @@ -110,6 +110,9 @@ struct swr_derived_state {
> void swr_update_derived(struct pipe_context *,
>                         const struct pipe_draw_info * = nullptr);
>
> +unsigned swr_so_adjust_attrib(unsigned in_attrib,
> +                              swr_vertex_shader *swr_vs);
> +
> /*
>  * Conversion functions: Convert mesa state defines to SWR.
>  */
> --
> 2.7.4
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>