On 12 December 2012 12:53, Marek Olšák &lt;<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>&gt; wrote:<br> <div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div><div>On Wed, Dec 12, 2012 at 9:21 PM, Eric Anholt &lt;<a href="mailto:eric@anholt.net" target="_blank">eric@anholt.net</a>&gt; wrote:<br>
&gt; Marek Olšák &lt;<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>&gt; writes:<br>
&gt;<br>
&gt;&gt; On Wed, Dec 12, 2012 at 5:06 PM, Paul Berry &lt;<a href="mailto:stereotype441@gmail.com" target="_blank">stereotype441@gmail.com</a>&gt; wrote:<br>
&gt;&gt;&gt; On 11 December 2012 23:49, Aras Pranckevicius &lt;<a href="mailto:aras@unity3d.com" target="_blank">aras@unity3d.com</a>&gt; wrote:<br>
&gt;&gt;&gt;&gt; Not sure if relevant for Mesa, but e.g. on PowerVR SGX it's really bad to<br>
&gt;&gt;&gt;&gt; pack two vec2 texture coordinates into a single vec4. That's because var.xy<br>
&gt;&gt;&gt;&gt; texture read can be "prefetched", whereas var.zw texture read is not<br>
&gt;&gt;&gt;&gt; prefetched (essentially treated as a dependent texture read), and often<br>
&gt;&gt;&gt;&gt; causes stalls in the shader execution.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Interesting--I had not thought of that possibility. On i965 all texture<br>
&gt;&gt;&gt; reads have to be done explicitly by the fragment shader (there is no<br>
&gt;&gt;&gt; prefetching IIRC), so this penalty doesn't apply. Does anyone know if a<br>
&gt;&gt;&gt; penalty like this exists in any of Mesa's other back-ends? If so that might<br>
&gt;&gt;&gt; suggest some good experiments to try. I'm open to revising my opinion if<br>
&gt;&gt;&gt; someone measures a significant performance degradation, particularly with a<br>
&gt;&gt;&gt; real-world app.<br>
&gt;&gt;<br>
&gt;&gt; R300 and R400 support 4 texture indirections (as defined by<br>
&gt;&gt; ARB_fragment_program). Adding ALU instructions before the first TEX<br>
&gt;&gt; instruction increases the number of texture indirections by 1, which<br>
&gt;&gt; might make some shaders not be executable on the hardware at all.<br>
&gt;&gt;<br>
&gt;&gt; I think this optimization should be disabled on drivers where the<br>
&gt;&gt; texture indirection limit is too low.<br>
&gt;<br>
&gt; And are swizzles of texcoords required to be separate MOVs beforehand<br>
&gt; (like on i915)?<br>
<br>
</div></div>Yes, swizzles aren't supported by the TEX instruction and must be lowered. And the lowering sucks, because the only supported 3D source operand swizzles are .xxx, .yyy, .zzz, .www, .yzw, .zxy, .wzy, .111, .000, and 0.HHH (H=0.5), so the swizzle can occupy up to 3 MOV instructions. The 4th channel is handled by a separate scalar instruction, which is independent of the 3D instruction. (R300 can execute one 3D and one scalar instruction simultaneously.)<br>
<br>
Marek<br>
</blockquote></div> </div><div class="gmail_extra">Ok, unless I hear objections, I'll rework the patch series so that the driver can opt out of varying packing (e.g. by setting Const.DisableVaryingPacking or some such). I'll add an assertion to verify that drivers that opt out of varying packing don't support transform feedback (so that we don't have to do extra work to support transform feedback of both packed and unpacked varyings).<br>
<br>
I don't expect the rework to change too many things, so feel free to review the patch series as-is and I'll fold your review into v2 when I get to it (probably in the next day or two). </div>
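<div class="gmail_extra">To make the proposal concrete, here is a rough, self-contained sketch of how the opt-out and its assertion could fit together. It is not the actual patch: the struct, the SupportsTransformFeedback field and the function name are hypothetical stand-ins, and only the DisableVaryingPacking name comes from the message above.</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-driver constants; not Mesa's real struct. */
struct sketch_constants {
    /* Flag name proposed in the message above; set by drivers that opt out. */
    bool DisableVaryingPacking;
    /* Hypothetical field: whether the driver exposes transform feedback. */
    bool SupportsTransformFeedback;
};

/* Decide whether the linker should pack varyings for this driver. */
static bool sketch_should_pack_varyings(const struct sketch_constants *c)
{
    if (c->DisableVaryingPacking) {
        /* Opting out is only allowed without transform feedback, so that
         * feedback only ever has to deal with packed varyings. */
        assert(!c->SupportsTransformFeedback);
        return false;
    }
    return true;
}
```

<div class="gmail_extra">A driver with a low texture indirection limit would set the flag once at context creation, and the linker would consult the helper before running the packing pass.</div>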