[Mesa-dev] [PATCH 0/8] EXT_transform_feedback & ARB_transform_feedback2 core support

Tue Dec 6 02:07:02 PST 2011

----- Original Message -----
> On 12/05/2011 08:02 PM, Jose Fonseca wrote:
> > 
> > 
> > ----- Original Message -----
> >> On 11/19/2011 09:44 PM, Marek Olšák wrote:
> >>> Hi everyone,
> >>>
> >>> this patch series implements all the core Mesa and Gallium
> >>> support
> >>> for EXT_transform_feedback and ARB_transform_feedback2. It's been
> >>> tested by me on Radeons and by Christoph Bumiller on Nouveau. I
> >>> have verified that all transform feedback piglit tests (except
> >>> the
> >>> one that requires GLSL 1.3) pass with this code.
> >>>
> >>> Since we were discussing this last time, the Gallium interface
> >>> has
> >>> been slightly modified to make it easier to implement by drivers.
> >>> The Gallium docs have been updated too.
> >>>
> >>> Some streamout-related softpipe and llvmpipe code for the old
> >>> interface has been disabled. I didn't look at what needs to be
> >>> done to support the new one.
> >>>
> >>> Please review.
> >>>
> >>
> >>
> >> No one cares ? Could we just push these now ?
> > 
> > I'm not the ideal person to review (I was not involved in the
> > original gallium implementation nor previous discussions about,
> > and I don't know much about transform feedback in general), but I
> > went through the changes FWIW.
> > 
> > 
> >>    gallium: interface changes necessary to implement transform
> >>    feedback (v4)
> > [...]
> > 
> >>    The most notable change is the stream-out info must be linked
> >>    with a vertex or geometry shader and cannot be set
> >>    independently.
> >>    This is due to limitations of existing hardware (special shader
> >>    instructions must be used to write into stream-out buffers),
> >>    and it's also how OpenGL works (stream outputs must be
> >>    specified
> >>    prior to linking shaders).
> > 
> > Is this true for all current gallium hardware?
> > 
> 
> To varying degree, I'd say.
> On nv50 and nvc0, stream output state can be set independently, but
> it
> would have to be linked ad-hoc, since if state wasn't created
> together
> with the shader you wouldn't know the hardware output slots that have
> been chosen.
> 
> In D3D10/11, stream output state is always specified through a
> geometry
> shader, but you can pass the code of a vertex shader on creation of
> the
> GS and then the implementation can either create a pass-through GS,
> or,
> if it can do stream output without GS active, just treat it as a
> stream
> output state object associated with a specific vertex shader.
> Either way, stream output is always coupled with a shader object,
> too.
> 
> For Radeons you have to add extra instructions (depending on the SO
> state) to the shader if it's supposed to stream output, so the OpenGL
> way seems nicer for them.

Thanks for the detailed explanation.

> I'd avoid having to rebuild/modify shader code on the fly wherever
> possible, and using a passthrough GS probably costs enough
> performance
> to avoid it as well.

Yes, makes sense.

> > 
> > @@ -164,9 +170,28 @@ struct pipe_clip_state
> >  };
> >  
> >  
> > +/**
> > + * Stream output for vertex transform feedback.
> > + */
> > +struct pipe_stream_output_info
> > +{
> > +   /** number of the output buffer to insert each element into */
> > +   unsigned output_buffer[PIPE_MAX_SHADER_OUTPUTS];
> > +   /** which register OUT[i] to grab each output from */
> > +   unsigned register_index[PIPE_MAX_SHADER_OUTPUTS];
> > +   /** TGSI_WRITEMASK signifying which components to output */
> > +   ubyte register_mask[PIPE_MAX_SHADER_OUTPUTS];
> > +   /** number of outputs */
> > +   unsigned num_outputs;
> > 
> > 
> > +   /** stride for an entire vertex, only used if all
> > output_buffers are 0 */
> > +   unsigned stride;
> > +};
> > +
> > 
> > This structure layout could be more space efficient. May be:
> > 
> >     unsigned stride;
> >     unsigned num_outputs;

num_outputs should be first actually, as I suppose stride is irrelevant when num_outputs == 0.

> >     struct foo {
> >         unsigned output_buffer:???;
> >         unsigned register_index:???;
> >         unsigned register_mask:4;
> >     } [PIPE_MAX_SHADER_OUTPUTS];
> > 
> > Especially if less than 30bit ints are sufficient for
> > register_index and output_bffer. This would allow to easily
> > copy/hash only a portion of the state.
> > 
> 
> 2 bits for buffer and 5 for register_index would satisfy
> PIPE_MAX_OUTPUTS (32) and PIPE_MAX_SO_BUFFERS (4), so we're a long
> way
> from exhausting all 32 bits.

This should be done then. 

> > +
> >  struct pipe_shader_state
> >  {
> >     const struct tgsi_token *tokens;
> > +   struct pipe_stream_output_info stream_output;
> >  };
> > 
> > Is this really necessary for all shader types? Would it be enough
> > to mandate geometry shader, and add a new parameter to
> > create_gs_state? Or maybe create a separate pipe_shader_state
> > structure for this.  Either wat looks cleaner and more
> > light-weight than taxing all current pipe_shader_state users with
> > this stage specific info.
> > 
> 
> Necessary for all but COMPUTE and FRAGMENT (so, for VERTEX, HULL,
> DOMAIN
> and GEOMETRY, though I haven't checked whether nvc0 can actually run
> with only a single one of HULL/DOMAIN).

OK. That's the majority.

In that case I'm not sure it's worth to do things differently for fragment/compute anymore. So let's leave it at this.

> > BTW, there was at one point the plan to remove "struct
> > pipe_shader_state", and have shader state to be simply "struct
> > tgsi_token *". Even if we drop that plan now, I think that were
> > several places that make this assumption and dropped the contents
> > of pipe_shader_state, so reference to pipe_shader_state needs to
> > be audited.

But we really need to make sure that all pipe_shader_state users fill/preserve the stream_output member, in particular, I noticed these are missing:

- state trackers in src/gallium/state_trackers/...  were not updated to zero stream_output member
- src/gallium/auxiliary/postprocess/pp_run.c ditto
- src/gallium/drivers/trace/ & src/gallium/auxiliary/util/u_dump_state.c should be updated to dump the new stream_output member

After these updates and the above compacting of pipe_stream_output_info I'm comfortable with the gallium changes.

And, independently of this change, the (half dead) cso_context code for caching shaders should be axed. It never quite worked, and much less now.

Jose