[Mesa-dev] [PATCH 0/8] EXT_transform_feedback & ARB_transform_feedback2 core support
Jose Fonseca
jfonseca at vmware.com
Tue Dec 6 02:07:02 PST 2011
----- Original Message -----
> On 12/05/2011 08:02 PM, Jose Fonseca wrote:
> >
> >
> > ----- Original Message -----
> >> On 11/19/2011 09:44 PM, Marek Olšák wrote:
> >>> Hi everyone,
> >>>
> >>> this patch series implements all the core Mesa and Gallium
> >>> support
> >>> for EXT_transform_feedback and ARB_transform_feedback2. It's been
> >>> tested by me on Radeons and by Christoph Bumiller on Nouveau. I
> >>> have verified that all transform feedback piglit tests (except
> >>> the
> >>> one that requires GLSL 1.3) pass with this code.
> >>>
> >>> Since we were discussing this last time, the Gallium interface
> >>> has
> >>> been slightly modified to make it easier to implement by drivers.
> >>> The Gallium docs have been updated too.
> >>>
> >>> Some streamout-related softpipe and llvmpipe code for the old
> >>> interface has been disabled. I didn't look at what needs to be
> >>> done to support the new one.
> >>>
> >>> Please review.
> >>>
> >>
> >>
> >> No one cares ? Could we just push these now ?
> >
> > I'm not the ideal person to review (I was not involved in the
> > original gallium implementation nor previous discussions about,
> > and I don't know much about transform feedback in general), but I
> > went through the changes FWIW.
> >
> >
> >> gallium: interface changes necessary to implement transform
> >> feedback (v4)
> > [...]
> >
> >> The most notable change is the stream-out info must be linked
> >> with a vertex or geometry shader and cannot be set
> >> independently.
> >> This is due to limitations of existing hardware (special shader
> >> instructions must be used to write into stream-out buffers),
> >> and it's also how OpenGL works (stream outputs must be
> >> specified
> >> prior to linking shaders).
> >
> > Is this true for all current gallium hardware?
> >
>
> To varying degree, I'd say.
> On nv50 and nvc0, stream output state can be set independently, but
> it
> would have to be linked ad-hoc, since if state wasn't created
> together
> with the shader you wouldn't know the hardware output slots that have
> been chosen.
>
> In D3D10/11, stream output state is always specified through a
> geometry
> shader, but you can pass the code of a vertex shader on creation of
> the
> GS and then the implementation can either create a pass-through GS,
> or,
> if it can do stream output without GS active, just treat it as a
> stream
> output state object associated with a specific vertex shader.
> Either way, stream output is always coupled with a shader object,
> too.
>
> For Radeons you have to add extra instructions (depending on the SO
> state) to the shader if it's supposed to stream output, so the OpenGL
> way seems nicer for them.
Thanks for the detailed explanation.
> I'd avoid having to rebuild/modify shader code on the fly wherever
> possible, and using a passthrough GS probably costs enough
> performance
> to avoid it as well.
Yes, makes sense.
> >
> > @@ -164,9 +170,28 @@ struct pipe_clip_state
> > };
> >
> >
> > +/**
> > + * Stream output for vertex transform feedback.
> > + */
> > +struct pipe_stream_output_info
> > +{
> > + /** number of the output buffer to insert each element into */
> > + unsigned output_buffer[PIPE_MAX_SHADER_OUTPUTS];
> > + /** which register OUT[i] to grab each output from */
> > + unsigned register_index[PIPE_MAX_SHADER_OUTPUTS];
> > + /** TGSI_WRITEMASK signifying which components to output */
> > + ubyte register_mask[PIPE_MAX_SHADER_OUTPUTS];
> > + /** number of outputs */
> > + unsigned num_outputs;
> >
> >
> > + /** stride for an entire vertex, only used if all
> > output_buffers are 0 */
> > + unsigned stride;
> > +};
> > +
> >
> > This structure layout could be more space efficient. May be:
> >
> > unsigned stride;
> > unsigned num_outputs;
num_outputs should be first actually, as I suppose stride is irrelevant when num_outputs == 0.
> > struct foo {
> > unsigned output_buffer:???;
> > unsigned register_index:???;
> > unsigned register_mask:4;
> > } [PIPE_MAX_SHADER_OUTPUTS];
> >
> > Especially if less than 30bit ints are sufficient for
> > register_index and output_bffer. This would allow to easily
> > copy/hash only a portion of the state.
> >
>
> 2 bits for buffer and 5 for register_index would satisfy
> PIPE_MAX_OUTPUTS (32) and PIPE_MAX_SO_BUFFERS (4), so we're a long
> way
> from exhausting all 32 bits.
This should be done then.
> > +
> > struct pipe_shader_state
> > {
> > const struct tgsi_token *tokens;
> > + struct pipe_stream_output_info stream_output;
> > };
> >
> > Is this really necessary for all shader types? Would it be enough
> > to mandate geometry shader, and add a new parameter to
> > create_gs_state? Or maybe create a separate pipe_shader_state
> > structure for this. Either wat looks cleaner and more
> > light-weight than taxing all current pipe_shader_state users with
> > this stage specific info.
> >
>
> Necessary for all but COMPUTE and FRAGMENT (so, for VERTEX, HULL,
> DOMAIN
> and GEOMETRY, though I haven't checked whether nvc0 can actually run
> with only a single one of HULL/DOMAIN).
OK. That's the majority.
In that case I'm not sure it's worth to do things differently for fragment/compute anymore. So let's leave it at this.
> > BTW, there was at one point the plan to remove "struct
> > pipe_shader_state", and have shader state to be simply "struct
> > tgsi_token *". Even if we drop that plan now, I think that were
> > several places that make this assumption and dropped the contents
> > of pipe_shader_state, so reference to pipe_shader_state needs to
> > be audited.
But we really need to make sure that all pipe_shader_state users fill/preserve the stream_output member, in particular, I noticed these are missing:
- state trackers in src/gallium/state_trackers/... were not updated to zero stream_output member
- src/gallium/auxiliary/postprocess/pp_run.c ditto
- src/gallium/drivers/trace/ & src/gallium/auxiliary/util/u_dump_state.c should be updated to dump the new stream_output member
After these updates and the above compacting of pipe_stream_output_info I'm comfortable with the gallium changes.
And, independently of this change, the (half dead) cso_context code for caching shaders should be axed. It never quite worked, and much less now.
Jose
More information about the mesa-dev
mailing list