[Mesa-dev] [PATCH 0/8] EXT_transform_feedback & ARB_transform_feedback2 core support

Mon Dec 5 11:34:13 PST 2011

On 12/05/2011 08:02 PM, Jose Fonseca wrote:
> 
> 
> ----- Original Message -----
>> On 11/19/2011 09:44 PM, Marek Olšák wrote:
>>> Hi everyone,
>>>
>>> this patch series implements all the core Mesa and Gallium support
>>> for EXT_transform_feedback and ARB_transform_feedback2. It's been
>>> tested by me on Radeons and by Christoph Bumiller on Nouveau. I
>>> have verified that all transform feedback piglit tests (except the
>>> one that requires GLSL 1.3) pass with this code.
>>>
>>> Since we were discussing this last time, the Gallium interface has
>>> been slightly modified to make it easier to implement by drivers.
>>> The Gallium docs have been updated too.
>>>
>>> Some streamout-related softpipe and llvmpipe code for the old
>>> interface has been disabled. I didn't look at what needs to be
>>> done to support the new one.
>>>
>>> Please review.
>>>
>>
>>
>> No one cares ? Could we just push these now ?
> 
> I'm not the ideal person to review (I was not involved in the original gallium implementation nor previous discussions about, and I don't know much about transform feedback in general), but I went through the changes FWIW.
> 
> 
>>    gallium: interface changes necessary to implement transform feedback (v4)
> [...]
> 
>>    The most notable change is the stream-out info must be linked
>>    with a vertex or geometry shader and cannot be set independently.
>>    This is due to limitations of existing hardware (special shader
>>    instructions must be used to write into stream-out buffers),
>>    and it's also how OpenGL works (stream outputs must be specified
>>    prior to linking shaders).
> 
> Is this true for all current gallium hardware?
> 

To varying degree, I'd say.
On nv50 and nvc0, stream output state can be set independently, but it
would have to be linked ad-hoc, since if state wasn't created together
with the shader you wouldn't know the hardware output slots that have
been chosen.

In D3D10/11, stream output state is always specified through a geometry
shader, but you can pass the code of a vertex shader on creation of the
GS and then the implementation can either create a pass-through GS, or,
if it can do stream output without GS active, just treat it as a stream
output state object associated with a specific vertex shader.
Either way, stream output is always coupled with a shader object, too.

For Radeons you have to add extra instructions (depending on the SO
state) to the shader if it's supposed to stream output, so the OpenGL
way seems nicer for them.
I'd avoid having to rebuild/modify shader code on the fly wherever
possible, and using a passthrough GS probably costs enough performance
to avoid it as well.

> 
> @@ -164,9 +170,28 @@ struct pipe_clip_state
>  };
>  
>  
> +/**
> + * Stream output for vertex transform feedback.
> + */
> +struct pipe_stream_output_info
> +{
> +   /** number of the output buffer to insert each element into */
> +   unsigned output_buffer[PIPE_MAX_SHADER_OUTPUTS];
> +   /** which register OUT[i] to grab each output from */
> +   unsigned register_index[PIPE_MAX_SHADER_OUTPUTS];
> +   /** TGSI_WRITEMASK signifying which components to output */
> +   ubyte register_mask[PIPE_MAX_SHADER_OUTPUTS];
> +   /** number of outputs */
> +   unsigned num_outputs;
> 
> 
> +   /** stride for an entire vertex, only used if all output_buffers are 0 */
> +   unsigned stride;
> +};
> +
> 
> This structure layout could be more space efficient. May be:
> 
>     unsigned stride;    
>     unsigned num_outputs; 
>     struct foo {
>         unsigned output_buffer:???;
>         unsigned register_index:???;
>         unsigned register_mask:4;
>     } [PIPE_MAX_SHADER_OUTPUTS];
> 
> Especially if less than 30bit ints are sufficient for register_index and output_bffer. This would allow to easily copy/hash only a portion of the state.
> 

2 bits for buffer and 5 for register_index would satisfy
PIPE_MAX_OUTPUTS (32) and PIPE_MAX_SO_BUFFERS (4), so we're a long way
from exhausting all 32 bits.

> +
>  struct pipe_shader_state
>  {
>     const struct tgsi_token *tokens;
> +   struct pipe_stream_output_info stream_output;
>  };
> 
> Is this really necessary for all shader types? Would it be enough to mandate geometry shader, and add a new parameter to create_gs_state? Or maybe create a separate pipe_shader_state structure for this.  Either wat looks cleaner and more light-weight than taxing all current pipe_shader_state users with this stage specific info.
> 

Necessary for all but COMPUTE and FRAGMENT (so, for VERTEX, HULL, DOMAIN
and GEOMETRY, though I haven't checked whether nvc0 can actually run
with only a single one of HULL/DOMAIN).

> BTW, there was at one point the plan to remove "struct pipe_shader_state", and have shader state to be simply "struct tgsi_token *". Even if we drop that plan now, I think that were several places that make this assumption and dropped the contents of pipe_shader_state, so reference to pipe_shader_state needs to be audited.
> 
> Jose
> 
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>