[Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass
jason at jlekstrand.net
Thu Jul 5 21:28:14 UTC 2018
On Thu, Jul 5, 2018 at 2:18 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Thu, Jul 5, 2018 at 11:03 AM, Francisco Jerez <currojerez at riseup.net>
> wrote:
>> Jason Ekstrand <jason at jlekstrand.net> writes:
>> > On Wed, Jul 4, 2018 at 1:20 PM, Francisco Jerez <currojerez at riseup.net>
>> > wrote:
>> >> Jason Ekstrand <jason at jlekstrand.net> writes:
>> >> > Many fragment shaders do a discard using relatively little information
>> >> > but still put the discard fairly far down in the shader for no good
>> >> > reason. If the discard is moved higher up, we can possibly avoid
>> >> > some or almost all of the work in the shader. When this lets us skip
>> >> > texturing operations, it's an especially high win.
>> >> >
>> >> > One of the biggest offenders here is DXVK. The D3D APIs have different
>> >> > rules for discards than OpenGL and Vulkan. One effective way (which is
>> >> > what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
>> >> > wait until the very end of the shader to discard. This ends up in the
>> >> > pessimal case where we always do all of the work before discarding.
>> >> > This pass helps some DXVK shaders significantly.
>> >> >
>> >> One thing to keep in mind is that this sort of transformation is trading
>> >> off run-time of fragment shader invocations that don't call discard (or
>> >> do so non-uniformly, which means that the code the discard jump is
>> >> protecting will be executed anyway, so doing this can actually increase
>> >> the critical path of the program) in favour of invocations that call
>> >> discard uniformly (so executing discard early will effectively terminate
>> >> the program early).
>> > It's not really a uniform vs. non-uniform thing. Even if a shader only
>> > discards some of the fragments, it still reduces the number of live
>> > channels, which reduces the cost of later non-uniform control-flow.
>> Which only helps if the shader's control flow is sufficiently
>> non-uniform that the additional cost from performing those computations
>> early pays off -- Or not at all if the discarded fragments need to be
>> executed (non-compliantly) anyway in order to provide
>> derivatives_safe_after_discard. However, if the discard condition is
>> uniform (across a warp), the thread can be terminated early by the
>> back-end most certainly, which gives you the maximum pay-off. Uniform
>> discard conditions are therefore the best-case scenario for this
>> optimization pass.
> Yes, that is correct. Fortunately, things that discard tend to discard
> fairly large chunks of the polygon at one time so this case is fairly
> common.
>> >> Optimizing for the latter case is an essentially
>> >> heuristic assumption that needs to be verified experimentally. Have you
>> >> tested the effect of this pass on non-DX workloads extensively?
>> > Yes, it is a trade-off. No, I have not done particularly extensive
>> > testing. We do, however, know of non-DXVK workloads that would benefit
>> > from this. I believe Manhattan is one such example though I have not
>> > benchmarked it.
>> You should grab some numbers then to make sure there are no regressions.
> I'm working on that. Unfortunately the perf system is giving me trouble
> so I don't have the numbers yet.
>> But keep in mind that the i965 scheduler is already
>> performing a similar optimization (locally, but with cycle-count
>> information). This will only help over the existing optimization if the
>> shaders that represent a bottleneck in Manhattan have sufficient control
>> flow for the basic block boundaries to represent a problem to the
>> (local) scheduler.
> I'm not sure about the Manhattan shader but the Skyrim shader does have
> control flow which the discard has to get moved above.
I have results from the perf system now and somehow this pass makes
Manhattan noticeably worse. I'll look into that.
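
For reference, the trade-off discussed above can be sketched as a toy cost
model. This is only an illustration, not Mesa code: the struct, the function
names, and the cycle counts are all hypothetical. It deliberately models only
the uniform case (the subgroup terminates early only when every lane
discards) and ignores any savings from having fewer live channels in later
non-uniform control flow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy cost model; every name and number here is a hypothetical assumption. */
struct shader_cost {
    unsigned cond_cost;     /* cycles to compute the discard condition */
    unsigned rest_cost;     /* cycles for everything else (texturing, etc.) */
    unsigned hoist_penalty; /* extra critical-path cycles from hoisting */
};

/* Discard at the very end of the shader: all of the work is always done. */
static unsigned cost_late_discard(const struct shader_cost *c)
{
    return c->cond_cost + c->rest_cost;
}

/* Discard hoisted to the top: the remaining work is skipped only if every
 * lane in the subgroup discards; otherwise the hoisting penalty is paid on
 * top of the full cost. */
static unsigned cost_early_discard(const struct shader_cost *c,
                                   const bool *lane_discards,
                                   size_t num_lanes)
{
    bool all_discard = true;
    for (size_t i = 0; i < num_lanes; i++)
        all_discard = all_discard && lane_discards[i];

    unsigned cost = c->cond_cost + c->hoist_penalty;
    if (!all_discard)
        cost += c->rest_cost;
    return cost;
}
```

With cond_cost = 10, rest_cost = 100 and hoist_penalty = 5, a subgroup in
which every lane discards costs 15 instead of 110, while a subgroup in which
even one lane survives costs 115 instead of 110. Whether the pass is a net
win in this model depends on how often the first case occurs, which is what
the benchmark numbers need to show.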