<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 27, 2016 at 7:05 PM, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This fixes the few code quality regressions from the previous series<br> enabling SIMD32 CS codegen in the back-end -- AFAICT by the end of the<br> series we can finally enable GL 4.3 on all Gen8+ hardware.<br> <br> Patches 1-8 delay the SIMD lowering pass after the bulk of<br> optimization passes have been run, which should decrease the<br> compilation time of mainly SIMD32 shaders and improve the code quality<br> of SIMD32 shaders on all generations and shaders of any dispatch width<br> on older generations (up to and including IVB) that use SIMD lowering<br> more intensively to implement various workarounds.<br> <br> Patches 9-14 rework the SIMD lowering pass to avoid emitting the copy<br> instructions used to zip and unzip register regions where possible,<br> since the register coalesce and copy propagation passes seem to<br> perform rather poorly at getting rid of them in some cases. In the<br> long term we'll likely want to improve the register coalesce pass<br> irrespective of these changes.<br> <br> Patches 15-20 improve the compute-to-mrf pass used on Gen4-6 to handle<br> cases where the source of a VGRF-to-MRF copy is initialized by the<br> shader using multiple single-GRF writes, which becomes far more common<br> with the additional SIMD lowering going on after this series.<br> <br> Patches 21-24 are some other assorted changes improving code quality<br> on older gens.<br> <br> I wanted to provide more detailed (e.g. per commit) shader-db stats<br> with this series, but kind of ran out of time. 
Let me know if you<br> would like to see more evidence that any of the changes below is<br> improving code quality in case it's not clear from the commit alone.<br> <br> [PATCH 01/25] i965/fs: Let CSE handle logical sampler sends as expressions.<br> [PATCH 02/25] i965/fs: Allow constant propagation into logical send sources.<br> [PATCH 03/25] i965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to has_side_effects().<br> [PATCH 04/25] i965/fs: Run SIMD and logical send lowering after the optimization loop.<br> [PATCH 05/25] i965/fs: Take opt_redundant_discard_jumps out of the optimization loop.<br> [PATCH 06/25] i965/fs: Fix UB list sentinel dereference in opt_sampler_eot().<br> [PATCH 07/25] i965/fs: Implement opt_sampler_eot() in terms of logical sends. </blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> [PATCH 08/25] SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot().<br></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> [PATCH 09/25] i965/fs: Refactor offset() into a separate function taking the width as argument.<br> [PATCH 10/25] i965/fs: Generalize regions_overlap() from copy propagation to handle non-VGRF files.<br> [PATCH 11/25] i965/fs: Factor out region zipping and unzipping from the SIMD lowering pass.<br> [PATCH 12/25] i965/fs: Skip SIMD lowering source unzipping for regular scalar regions.<br> [PATCH 13/25] i965/fs: Skip SIMD lowering destination zipping if possible.<br> [PATCH 14/25] i965/fs: Reindent emit_zip().<br></blockquote><div><br></div><div>9-14 Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> [PATCH 15/25] i965/fs: Teach regions_overlap() about COMPR4 MRF regions.<br> [PATCH 16/25] i965/fs: Simplify and improve accuracy of compute_to_mrf() 
by using regions_overlap().<br> [PATCH 17/25] i965/fs: Fix compute-to-mrf VGRF region coverage condition.<br> [PATCH 18/25] i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops.<br> [PATCH 19/25] i965/fs: Teach compute_to_mrf about the COMPR4 address transformation.<br> [PATCH 20/25] i965/fs: Extend compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF writes.<br> [PATCH 21/25] i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.<br></blockquote><div><br></div><div>15-21 scare me. A lot. They even make me think that forking the compiler between SNB and IVB may be a good idea. :-/ MRFs are annoying, but COMPR4 is such a gross hack that I really want to teach as little of the compiler about it as possible.<br><br></div><div>So here's the million-dollar question: Do we need them? And, more importantly, do we need them now? I didn't see anything wrong in my brief skimming, but don't call that a review. <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> [PATCH 22/25] i965/fs: Fix constant combining for instructions that cannot accept source mods.<br> [PATCH 23/25] i965/fs: Allow scalar source regions on SNB math instructions.<br> [PATCH 24/25] i965/fs: Skip gen4 pre/post-send dependency workaronds for the first/last block.<br> [PATCH 25/25] i965: Expose GL 4.3 on Gen8+.<br></blockquote><div><br></div><div>22-25 are Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div> </div>
</div><br></div></div>