[Mesa-dev] [PATCH 02/23] tgsi: add Stream{X, Y, Z, W} fields to tgsi_declaration_semantic

Wed Nov 30 20:37:41 UTC 2016

Am 30.11.2016 um 20:19 schrieb Nicolai Hähnle:
> On 30.11.2016 19:06, Roland Scheidegger wrote:
>> Am 30.11.2016 um 14:35 schrieb Nicolai Hähnle:
>>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>
>>> This is for geometry shader outputs. Without it, drivers have no way of
>>> knowing which stream each output is intended for, and have to
>>> conservatively write all outputs to all streams.
>>>
>>> Separate stream numbers for each component are required due to output
>>> packing.
>> Are you sure this is true?
>> This is an area I don't know much about, but
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.opengl.org_wiki_Layout-5FQualifier-5F-28GLSL-29&d=DgIDaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=fVpTGTYN2KTEhU17RpFTxEULrsIfC3bdpEin0k8NIYE&s=uamnHj-9Xr12ctr0gHDfCMIMHq8DyUBtKIwHQQpjDLs&e= 
>> tells me "Stream
>> assignments for a geometry shader are required to be the same for all
>> members of a block, but offsets are not."
>>
>> Therefore I don't think output packing should ever happen across
>> multiple streams. I think it would be MUCH nicer if the semantic needed
>> just one stream member...
> 
> There are two variants of that question, I guess.
> 
> The answer to the first variant is: Yes, this is currently true.
> lower_packed_varyings will happily pack outputs from different vertex
> streams into the same vec4. This affects quite a lot of programs, e.g.
> you see it in piglit arb_gpu_shader5-xfb-streams.
> 
> The second question is: Do we want it to be true? I agree that it would
> be convenient to be able to use a single Stream member. Also, isolating
> the stream0 components from the rest would lead to slightly more
> efficient shaders for us in some cases.
> 
> I opted against it so far because I didn't want to think through the
> implications of changing lower_packed_varyings. The main question I have
> is: if you account for the size of the GS output in # of components,
> then it could happen that the number of output vec4s ends up being
> larger than (max # of output components) / 4. Will that be a problem
> somewhere?

I don't know if that would be a problem, but if it is I'd assume this
would be fixable (since the number of actual components ultimately
doesn't change).
Having outputs belonging to multiple streams in a single output just
seems weird...
That said, I wonder if it actually would be possible to do that with
d3d11 too.
With shader model 5 you'd have:
dcl_stream 0
dcl_output o0.xy
dcl_stream 1
dcl_output o0.zw // legal or not???

Though the shader model 4/5 rules are a bit weird for packing
inputs/outputs, I'm not even sure two dcl_output are legal for the same
reg without a dcl_stream in between them (but you can pack system values
together with ordinary inputs/outputs).

So maybe just allowing this is the right solution...

Roland

> 
> Nicolai
> 
>>
>> Roland
>>
>>
>>
>>> ---
>>>  src/compiler/glsl/ir_print_visitor.cpp     |  4 +--
>>>  src/gallium/auxiliary/tgsi/tgsi_build.c    | 18 +++++++++--
>>>  src/gallium/auxiliary/tgsi/tgsi_dump.c     | 13 ++++++++
>>>  src/gallium/auxiliary/tgsi/tgsi_text.c     | 48
>>> ++++++++++++++++++++++++++++++
>>>  src/gallium/include/pipe/p_shader_tokens.h |  5 +++-
>>>  5 files changed, 83 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/compiler/glsl/ir_print_visitor.cpp
>>> b/src/compiler/glsl/ir_print_visitor.cpp
>>> index 2b77c14..d401426 100644
>>> --- a/src/compiler/glsl/ir_print_visitor.cpp
>>> +++ b/src/compiler/glsl/ir_print_visitor.cpp
>>> @@ -173,26 +173,26 @@ void ir_print_visitor::visit(ir_variable *ir)
>>>     if (ir->data.location != -1)
>>>        snprintf(loc, sizeof(loc), "location=%i ", ir->data.location);
>>>
>>>     char component[32] = {0};
>>>     if (ir->data.explicit_component)
>>>        snprintf(component, sizeof(component), "component=%i ",
>>> ir->data.location_frac);
>>>
>>>     char stream[32] = {0};
>>>     if (ir->data.stream & (1u << 31)) {
>>>        if (ir->data.stream & ~(1u << 31)) {
>>> -         snprintf(stream, sizeof(stream), "stream(%u,%u,%u,%u)",
>>> +         snprintf(stream, sizeof(stream), "stream(%u,%u,%u,%u) ",
>>>                    ir->data.stream & 3, (ir->data.stream >> 2) & 3,
>>>                    (ir->data.stream >> 4) & 3, (ir->data.stream >> 6)
>>> & 3);
>>>        }
>>>     } else if (ir->data.stream) {
>>> -      snprintf(stream, sizeof(stream), "stream%u", ir->data.stream);
>>> +      snprintf(stream, sizeof(stream), "stream%u ", ir->data.stream);
>>>     }
>>>
>>>     const char *const cent = (ir->data.centroid) ? "centroid " : "";
>>>     const char *const samp = (ir->data.sample) ? "sample " : "";
>>>     const char *const patc = (ir->data.patch) ? "patch " : "";
>>>     const char *const inv = (ir->data.invariant) ? "invariant " : "";
>>>     const char *const prec = (ir->data.precise) ? "precise " : "";
>>>     const char *const mode[] = { "", "uniform ", "shader_storage ",
>>>                                  "shader_shared ", "shader_in ",
>>> "shader_out ",
>>>                                  "in ", "out ", "inout ",
>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c
>>> b/src/gallium/auxiliary/tgsi/tgsi_build.c
>>> index d525c8f..773f892 100644
>>> --- a/src/gallium/auxiliary/tgsi/tgsi_build.c
>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
>>> @@ -232,40 +232,50 @@ tgsi_build_declaration_interp(unsigned
>>> interpolate,
>>>     return di;
>>>  }
>>>
>>>  static struct tgsi_declaration_semantic
>>>  tgsi_default_declaration_semantic( void )
>>>  {
>>>     struct tgsi_declaration_semantic ds;
>>>
>>>     ds.Name = TGSI_SEMANTIC_POSITION;
>>>     ds.Index = 0;
>>> -   ds.Padding = 0;
>>> +   ds.StreamX = 0;
>>> +   ds.StreamY = 0;
>>> +   ds.StreamZ = 0;
>>> +   ds.StreamW = 0;
>>>
>>>     return ds;
>>>  }
>>>
>>>  static struct tgsi_declaration_semantic
>>>  tgsi_build_declaration_semantic(
>>>     unsigned semantic_name,
>>>     unsigned semantic_index,
>>> +   unsigned streamx,
>>> +   unsigned streamy,
>>> +   unsigned streamz,
>>> +   unsigned streamw,
>>>     struct tgsi_declaration *declaration,
>>>     struct tgsi_header *header )
>>>  {
>>>     struct tgsi_declaration_semantic ds;
>>>
>>>     assert( semantic_name <= TGSI_SEMANTIC_COUNT );
>>>     assert( semantic_index <= 0xFFFF );
>>>
>>>     ds.Name = semantic_name;
>>>     ds.Index = semantic_index;
>>> -   ds.Padding = 0;
>>> +   ds.StreamX = streamx;
>>> +   ds.StreamY = streamy;
>>> +   ds.StreamZ = streamz;
>>> +   ds.StreamW = streamw;
>>>
>>>     declaration_grow( declaration, header );
>>>
>>>     return ds;
>>>  }
>>>
>>>  static struct tgsi_declaration_image
>>>  tgsi_default_declaration_image(void)
>>>  {
>>>     struct tgsi_declaration_image di;
>>> @@ -454,20 +464,24 @@ tgsi_build_full_declaration(
>>>        struct tgsi_declaration_semantic *ds;
>>>
>>>        if( maxsize <= size )
>>>           return  0;
>>>        ds = (struct tgsi_declaration_semantic *) &tokens[size];
>>>        size++;
>>>
>>>        *ds = tgsi_build_declaration_semantic(
>>>           full_decl->Semantic.Name,
>>>           full_decl->Semantic.Index,
>>> +         full_decl->Semantic.StreamX,
>>> +         full_decl->Semantic.StreamY,
>>> +         full_decl->Semantic.StreamZ,
>>> +         full_decl->Semantic.StreamW,
>>>           declaration,
>>>           header );
>>>     }
>>>
>>>     if (full_decl->Declaration.File == TGSI_FILE_IMAGE) {
>>>        struct tgsi_declaration_image *di;
>>>
>>>        if (maxsize <= size) {
>>>           return  0;
>>>        }
>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c
>>> b/src/gallium/auxiliary/tgsi/tgsi_dump.c
>>> index 614bcb2..f74aad1 100644
>>> --- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
>>> @@ -353,20 +353,33 @@ iter_declaration(
>>>     if (decl->Declaration.Semantic) {
>>>        TXT( ", " );
>>>        ENM( decl->Semantic.Name, tgsi_semantic_names );
>>>        if (decl->Semantic.Index != 0 ||
>>>            decl->Semantic.Name == TGSI_SEMANTIC_TEXCOORD ||
>>>            decl->Semantic.Name == TGSI_SEMANTIC_GENERIC) {
>>>           CHR( '[' );
>>>           UID( decl->Semantic.Index );
>>>           CHR( ']' );
>>>        }
>>> +
>>> +      if (decl->Semantic.StreamX != 0 || decl->Semantic.StreamY != 0 ||
>>> +          decl->Semantic.StreamZ != 0 || decl->Semantic.StreamW != 0) {
>>> +         TXT(", STREAM(");
>>> +         UID(decl->Semantic.StreamX);
>>> +         TXT(", ");
>>> +         UID(decl->Semantic.StreamY);
>>> +         TXT(", ");
>>> +         UID(decl->Semantic.StreamZ);
>>> +         TXT(", ");
>>> +         UID(decl->Semantic.StreamW);
>>> +         CHR(')');
>>> +      }
>>>     }
>>>
>>>     if (decl->Declaration.File == TGSI_FILE_IMAGE) {
>>>        TXT(", ");
>>>        ENM(decl->Image.Resource, tgsi_texture_names);
>>>        TXT(", ");
>>>        TXT(util_format_name(decl->Image.Format));
>>>        if (decl->Image.Writable)
>>>           TXT(", WR");
>>>        if (decl->Image.Raw)
>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c
>>> b/src/gallium/auxiliary/tgsi/tgsi_text.c
>>> index be80842..1b4f594 100644
>>> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
>>> @@ -1539,20 +1539,68 @@ static boolean parse_declaration( struct
>>> translate_ctx *ctx )
>>>                    ctx->cur = cur;
>>>                    break;
>>>                 }
>>>              }
>>>           }
>>>        }
>>>     }
>>>
>>>     cur = ctx->cur;
>>>     eat_opt_white( &cur );
>>> +   if (*cur == ',' &&
>>> +       file == TGSI_FILE_OUTPUT && ctx->processor ==
>>> PIPE_SHADER_GEOMETRY) {
>>> +      cur++;
>>> +      eat_opt_white(&cur);
>>> +      if (str_match_nocase_whole(&cur, "STREAM")) {
>>> +         uint stream[4];
>>> +
>>> +         eat_opt_white(&cur);
>>> +         if (*cur != '(') {
>>> +            report_error(ctx, "Expected '('");
>>> +            return FALSE;
>>> +         }
>>> +         cur++;
>>> +
>>> +         for (int i = 0; i < 4; ++i) {
>>> +            eat_opt_white(&cur);
>>> +            if (!parse_uint(&cur, &stream[i])) {
>>> +               report_error(ctx, "Expected literal integer");
>>> +               return FALSE;
>>> +            }
>>> +
>>> +            eat_opt_white(&cur);
>>> +            if (i < 3) {
>>> +               if (*cur != ',') {
>>> +                  report_error(ctx, "Expected ','");
>>> +                  return FALSE;
>>> +               }
>>> +               cur++;
>>> +            }
>>> +         }
>>> +
>>> +         if (*cur != ')') {
>>> +            report_error(ctx, "Expected ')'");
>>> +            return FALSE;
>>> +         }
>>> +         cur++;
>>> +
>>> +         decl.Semantic.StreamX = stream[0];
>>> +         decl.Semantic.StreamY = stream[1];
>>> +         decl.Semantic.StreamZ = stream[2];
>>> +         decl.Semantic.StreamW = stream[3];
>>> +
>>> +         ctx->cur = cur;
>>> +      }
>>> +   }
>>> +
>>> +   cur = ctx->cur;
>>> +   eat_opt_white( &cur );
>>>     if (*cur == ',' && !is_vs_input) {
>>>        uint i;
>>>
>>>        cur++;
>>>        eat_opt_white( &cur );
>>>        for (i = 0; i < TGSI_INTERPOLATE_COUNT; i++) {
>>>           if (str_match_nocase_whole( &cur, tgsi_interpolate_names[i]
>>> )) {
>>>              decl.Declaration.Interpolate = 1;
>>>              decl.Interp.Interpolate = i;
>>>
>>> diff --git a/src/gallium/include/pipe/p_shader_tokens.h
>>> b/src/gallium/include/pipe/p_shader_tokens.h
>>> index 4a259db..ee59df0 100644
>>> --- a/src/gallium/include/pipe/p_shader_tokens.h
>>> +++ b/src/gallium/include/pipe/p_shader_tokens.h
>>> @@ -200,21 +200,24 @@ enum tgsi_semantic {
>>>     TGSI_SEMANTIC_BASEINSTANCE,
>>>     TGSI_SEMANTIC_DRAWID,
>>>     TGSI_SEMANTIC_WORK_DIM,    /**< opencl get_work_dim value */
>>>     TGSI_SEMANTIC_COUNT,       /**< number of semantic values */
>>>  };
>>>
>>>  struct tgsi_declaration_semantic
>>>  {
>>>     unsigned Name           : 8;  /**< one of TGSI_SEMANTIC_x */
>>>     unsigned Index          : 16; /**< UINT */
>>> -   unsigned Padding        : 8;
>>> +   unsigned StreamX        : 2; /**< vertex stream (for GS output) */
>>> +   unsigned StreamY        : 2;
>>> +   unsigned StreamZ        : 2;
>>> +   unsigned StreamW        : 2;
>>>  };
>>>
>>>  struct tgsi_declaration_image {
>>>     unsigned Resource    : 8; /**< one of TGSI_TEXTURE_ */
>>>     unsigned Raw         : 1;
>>>     unsigned Writable    : 1;
>>>     unsigned Format      : 10; /**< one of PIPE_FORMAT_ */
>>>     unsigned Padding     : 12;
>>>  };
>>>
>>>
>>