<div dir="ltr">On 9 September 2013 09:51, Ian Romanick <span dir="ltr"><<a href="mailto:idr@freedesktop.org" target="_blank">idr@freedesktop.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="im">On 09/03/2013 06:18 PM, Paul Berry wrote:<br>
> GL 3.2 requires us to support 128 varying components for geometry<br>
> shader outputs and fragment shader inputs, and 64 varying components<br>
> otherwise. But there's no hardware limitation that restricts us to 64<br>
> varying components, and core Mesa doesn't currently allow different<br>
> stages to have different maximum values, so I've gone ahead and<br>
> enabled 128 varying components for all stages. This has the advantage<br>
<br>
</div>I was just looking at this today while working on the standalone<br>
compiler. To use the standalone compiler for shader validation, we want<br>
to advertise the minimums required by the spec. To do that, we need to<br>
be able to track the input/output limits separately. The varying<br>
limit changed from 64 to 60, but the vertex shader output limit is still<br>
64 (where gl_Position is counted?), so this may be annoying to implement fully.<br>
<br>
For the standalone compiler work, I'll add some of this plumbing. That<br>
may impact some of your changes, depending on the order things land.<br>
Since my patches depend on Ken's built-in rework, yours will almost<br>
surely go first.<br>
<div><div class="h5"><br>
> of increased test coverage, since piglit already has a number of tests<br>
> to validate that the maximum advertised number of varying components<br>
> can be exchanged between VS and FS. I've also gone ahead and<br>
> increased the limit for gen6 as well as gen7, since it required very<br>
> little extra work.<br>
><br>
> Previously, on gen6+, we relied on the SF/SBE stage of the pipeline to<br>
> reorder the outputs from the GS (or VS) to match the input ordering<br>
> required by the FS. This allowed us to determine the order of FS<br>
> inputs solely based on the FS, so we avoided recompiles when separate<br>
> shader objects were in use. But there's a problem with that: the<br>
> SF/SBE stage can't arbitrarily reorder more than 16 VUE slots (1 slot<br>
> = 4 varying components). To avoid introducing additional recompiles<br>
> with previously-supported shaders, I've taken a hybrid approach to<br>
> choosing the FS input ordering: if the FS uses 16 or fewer input<br>
> varying slots, then it orders them solely based on its own<br>
> requirements. If it uses more than 16 input varying slots, then it<br>
> orders them according to the GS (or VS) output VUE map, so that the<br>
> SF/SBE stage doesn't have to do any reordering.<br>
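<br>
The hybrid rule quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the driver code: the function name, the use of plain strings for varyings, and the two list arguments are all hypothetical, and the sketch keeps only the relative order of the previous stage's outputs, ignoring any other VUE layout details.<br>
<pre>
```python
# From the quoted text: 1 VUE slot = 4 varying components.
COMPONENTS_PER_SLOT = 4
# From the quoted text: SF/SBE cannot arbitrarily reorder more than 16 slots.
MAX_SF_REORDER_SLOTS = 16


def choose_fs_input_order(fs_inputs, prev_stage_vue_order):
    """Pick an FS input order using the hybrid rule (hypothetical helper).

    fs_inputs: varying names in the order the FS would choose on its own.
    prev_stage_vue_order: varying names in GS (or VS) output VUE order.
    """
    if MAX_SF_REORDER_SLOTS >= len(fs_inputs):
        # Few enough slots for SF/SBE to reorder, so the FS alone decides.
        # No dependency on the previous stage, hence no extra recompiles.
        return list(fs_inputs)
    # Too many slots: follow the previous stage's output order so that
    # SF/SBE does not have to reorder anything.
    wanted = set(fs_inputs)
    return [name for name in prev_stage_vue_order if name in wanted]
```
</pre>
With 16 or fewer inputs the result is just the FS's own order; with, say, 20 inputs the result follows the previous stage's order, restricted to the varyings the FS actually reads.<br>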
><br>
> Patches 1-3 modify the FS so that it exposes the order of input<br>
> varyings it needs via prog_data.<br>
><br>
> Patches 4-6 modify the SF/SBE setup so that it consults the FS<br>
> prog_data when choosing how to re-order varyings (previously, it<br>
> implicitly assumed an order that happened to match the order the FS<br>
> was using).<br>
><br>
> Patch 7 is a minor optimization made possible by patches 1-6: now that<br>
> the SF/SBE setup no longer makes implicit assumptions about the order<br>
> of the FS inputs, the FS no longer has to have dummy input slots for<br>
> gl_FragCoord and gl_FrontFacing.<br>
<br>
</div></div>\o/<br>
<div class="im"><br>
> Patch 8 tweaks the VUE map slightly so that it is uniquely determined<br>
> by a single 64-bit bitfield. This will allow us to store the bitfield<br>
> in the FS program key rather than the entire VUE map.<br>
><br>
> Patch 9 is a minor optimization made possible by patch 8: now that the<br>
> VUE map is uniquely determined by a single 64-bit bitfield, we no<br>
> longer have to store the entire VUE map in the GS program key.<br>
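<br>
To illustrate why a single 64-bit bitfield can stand in for the whole map in a program key, here is a rough sketch. It assumes, purely for illustration, that the map is just the set bits listed lowest first; the function name is hypothetical and the real VUE map has additional layout rules that are not modelled here.<br>
<pre>
```python
def vue_map_from_bitfield(slots_valid):
    """List the set bit positions of a 64-bit mask, lowest bit first.

    Hypothetical stand-in for deriving a VUE map from a bitfield: because
    the result depends only on the integer, equal bitfields always give
    equal maps, so a key only needs to store the integer.
    """
    if 0 > slots_valid or slots_valid >= 2 ** 64:
        raise ValueError("slots_valid must fit in 64 bits")
    return [bit for bit in range(64) if (slots_valid >> bit) % 2 == 1]
```
</pre>
Two keys that hold the same integer necessarily describe the same map, which is the property patches 8 and 9 rely on.<br>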
><br>
> Patches 10-11 modify the FS to order its inputs according to the GS<br>
> (or VS) output VUE map when there are more than 16 input slots in use.<br>
><br>
> Patch 12 adjusts the VS and GS code so that it can output all 32<br>
> varyings to the VUE, even if it requires more than two URB writes to<br>
> do so.<br>
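<br>
Generating the URB writes in a loop amounts to splitting the slots into chunks. A minimal sketch, assuming a per-write slot limit that is passed in as a parameter (the function name and the limit of 12 used in the note below are made up for illustration and are not hardware values):<br>
<pre>
```python
def plan_urb_writes(num_slots, max_slots_per_write):
    """Split num_slots VUE slots into (offset, count) URB writes.

    Hypothetical helper: max_slots_per_write is an assumed per-message
    limit, not a value taken from the hardware documentation.
    """
    if 1 > max_slots_per_write:
        raise ValueError("max_slots_per_write must be at least 1")
    writes = []
    offset = 0
    while num_slots > offset:
        # Emit as many slots as one write allows, then advance the offset.
        count = min(max_slots_per_write, num_slots - offset)
        writes.append((offset, count))
        offset += count
    return writes
```
</pre>
With 32 slots and an assumed limit of 12 slots per write, the loop yields three writes, (0, 12), (12, 12) and (24, 8), which is the "more than two URB writes" case mentioned above.<br>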
><br>
> Patches 13-14 make some minor gen6-specific adjustments to allow for<br>
> the larger URB entries needed for 32 varyings: the gen6 transform<br>
> feedback code sometimes needs to do 2 URB writes instead of 1, and an<br>
> incorrect assertion in the gen6 URB setup needs to be fixed.<br>
><br>
> Patch 15 increases the value of MaxVarying from 16 to 32 for gen6+.<br>
><br>
> The series is available on branch "increase-max-varyings" of<br>
> <a href="https://github.com/stereotype441/mesa.git" target="_blank">https://github.com/stereotype441/mesa.git</a>. I've piglit tested it on<br>
> gen5, gen6, and gen7.<br>
<br>
</div>Do we have tests that use more than 16 varying vectors? Some of the<br>
generated varying packing tests, right?<br></blockquote><div><br></div><div>Yes, we have a number of varying packing tests that exercise this (though they aren't generated tests, IIRC). spec/EXT_transform_feedback/max-varyings and shaders/glsl-max-varyings cover it as well.<br>
</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> [PATCH 01/15] i965/fs: Expose "urb_setup" as part of brw_wm_prog_data.<br>
> [PATCH 02/15] i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.<br>
> [PATCH 03/15] i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state.<br>
> [PATCH 04/15] i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values.<br>
> [PATCH 05/15] i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.<br>
> [PATCH 06/15] i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state.<br>
> [PATCH 07/15] i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_Frontfacing.<br>
> [PATCH 08/15] i965/gen6+: Remove VUE map dependency on userclip_active.<br>
> [PATCH 09/15] i965/gs: Stop storing an input VUE map in the GS program key.<br>
> [PATCH 10/15] i965/fs: Simplify computation of key.input_slots_valid during precompile.<br>
> [PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage.<br>
> [PATCH 12/15] i965/vec4: Generate URB writes using a loop.<br>
> [PATCH 13/15] i965/gen6: Fix assertions on VS/GS URB size.<br>
> [PATCH 14/15] i965/ff_gs: Generate URB writes using a loop.<br>
> [PATCH 15/15] i965/gen6+: Support 128 varying components.<br>
</div>> _______________________________________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
><br>
<br>
</blockquote></div><br></div></div>