<div dir="ltr">On 9 September 2013 09:51, Ian Romanick <<a href="mailto:idr@freedesktop.org" target="_blank">idr@freedesktop.org</a>> wrote: <div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="im">On 09/03/2013 06:18 PM, Paul Berry wrote: > GL 3.2 requires us to support 128 varying components for geometry > shader outputs and fragment shader inputs, and 64 varying components > otherwise. But there's no hardware limitation that restricts us to 64 > varying components, and core Mesa doesn't currently allow different > stages to have different maximum values, so I've gone ahead and > enabled 128 varying components for all stages. This has the advantage </div>I was just looking at this today while working on the standalone compiler. To use the standalone compiler for shader validation, we want to advertise the minimums required by the spec. To do that, we need to be able to track the input/output limits separately. Since the varying limit changed from 64 to 60, but the vertex shader output limit is still 64 (where gl_Position is counted?), this may be annoying to implement fully. For the standalone compiler work, I'll add some of this plumbing. That may impact some of your changes, depending on the order things land. Since my patches depend on Ken's built-in rework, yours will almost surely go first. <div><div class="h5"> > of increased test coverage, since piglit already has a number of tests > to validate that the maximum advertised number of varying components > can be exchanged between VS and FS. I've also gone ahead and > increased the limit for gen6 as well as gen7, since it required very > little extra work. > > Previously, on gen6+, we relied on the SF/SBE stage of the pipeline to > reorder the outputs from the GS (or VS) to match the input ordering > required by the FS. 
This allowed us to determine the order of FS > inputs solely based on the FS, so we avoided recompiles when separate > shader objects were in use. But there's a problem with that: the > SF/SBE stage can't arbitrarily reorder more than 16 VUE slots (1 slot > = 4 varying components). To avoid introducing additional recompiles > with previously-supported shaders, I've taken a hybrid approach to > choosing the FS input ordering: if the FS uses 16 or fewer input > varying slots, then it orders them solely based on its own > requirements. If it uses more than 16 input varying slots, then it > orders them according to the GS (or VS) output VUE map, so that the > SF/SBE stage doesn't have to do any reordering. > > Patches 1-3 modify the FS so that it exposes the order of input > varyings it needs via prog_data. > > Patches 4-6 modify the SF/SBE setup so that it consults the FS > prog_data when choosing how to re-order varyings (previously, it > implicitly assumed an order that happened to match the order the FS > was using). > > Patch 7 is a minor optimization made possible by patches 1-6: now that > the SF/SBE setup no longer makes implicit assumptions about the order > of the FS inputs, the FS no longer has to have dummy input slots for > gl_FragCoord and gl_FrontFacing. </div></div>\o/ <div class="im"> > Patch 8 tweaks the VUE map slightly so that it is uniquely determined > by a single 64-bit bitfield. This will allow us to store the bitfield > in the FS program key rather than the entire VUE map. > > Patch 9 is a minor optimization made possible by patch 8: now that the > VUE map is uniquely determined by a single 64-bit bitfield, we no > longer have to store the entire VUE map in the GS program key. > > Patches 10-11 modify the FS to order its inputs according to the GS > (or VS) output VUE map when there are more than 16 input slots in use. 
> > Patch 12 adjusts the VS and GS code so that it can output all 32 > varyings to the VUE, even if it requires more than two URB writes to > do so. > > Patches 13-14 make some minor gen6-specific adjustments to allow for > the larger URB entries needed for 32 varyings: the Gen6 transform > feedback code sometimes needs to do 2 URB writes instead of 1, and an > incorrect assertion in the gen6 URB setup needs to be fixed. > > Patch 15 increases the value of MaxVarying from 16 to 32 for gen6+. > > The series is available on branch "increase-max-varyings" of > <a href="https://github.com/stereotype441/mesa.git" target="_blank">https://github.com/stereotype441/mesa.git</a>. I've piglit tested it on > gen5, gen6, and gen7. </div>Do we have tests that use more than 16 varying vectors? Some of the generated varying packing tests, right? </blockquote><div> </div><div>Yes, we have a number of varying packing tests that exercise this (though they aren't generated tests, IIRC). Also, spec/EXT_transform_feedback/max-varyings and shaders/glsl-max-varyings. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div class="im"> > [PATCH 01/15] i965/fs: Expose "urb_setup" as part of brw_wm_prog_data. > [PATCH 02/15] i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs. > [PATCH 03/15] i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state. > [PATCH 04/15] i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values. > [PATCH 05/15] i965/sf: Consolidate common code for setting up gen6-7 attribute overrides. > [PATCH 06/15] i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state. > [PATCH 07/15] i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_Frontfacing. > [PATCH 08/15] i965/gen6+: Remove VUE map dependency on userclip_active. > [PATCH 09/15] i965/gs: Stop storing an input VUE map in the GS program key. 
> [PATCH 10/15] i965/fs: Simplify computation of key.input_slots_valid during precompile. > [PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage. > [PATCH 12/15] i965/vec4: Generate URB writes using a loop. > [PATCH 13/15] i965/gen6: Fix assertions on VS/GS URB size. > [PATCH 14/15] i965/ff_gs: Generate URB writes using a loop. > [PATCH 15/15] i965/gen6+: Support 128 varying components. </div></blockquote></div> </div></div>