[Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

Thu Jul 5 21:28:14 UTC 2018

On Thu, Jul 5, 2018 at 2:18 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:

> On Thu, Jul 5, 2018 at 11:03 AM, Francisco Jerez <currojerez at riseup.net>
> wrote:
>
>> Jason Ekstrand <jason at jlekstrand.net> writes:
>>
>> > On Wed, Jul 4, 2018 at 1:20 PM, Francisco Jerez <currojerez at riseup.net>
>> > wrote:
>> >
>> >> Jason Ekstrand <jason at jlekstrand.net> writes:
>> >>
>> >> > Many fragment shaders do a discard using relatively little information
>> >> > but still put the discard fairly far down in the shader for no good
>> >> > reason.  If the discard is moved higher up, we can possibly avoid doing
>> >> > some or almost all of the work in the shader.  When this lets us skip
>> >> > texturing operations, it's an especially high win.
>> >> >
>> >> > One of the biggest offenders here is DXVK.  The D3D APIs have different
>> >> > rules for discards than OpenGL and Vulkan.  One effective way (which is
>> >> > what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
>> >> > wait until the very end of the shader to discard.  This ends up in the
>> >> > pessimal case where we always do all of the work before discarding.
>> >> > This pass helps some DXVK shaders significantly.
>> >> >
>> >>
>> >> One thing to keep in mind is that this sort of transformation is trading
>> >> off run-time of fragment shader invocations that don't call discard (or
>> >> do so non-uniformly, which means that the code the discard jump is
>> >> protecting will be executed anyway, so doing this can actually increase
>> >> the critical path of the program) in favour of invocations that call
>> >> discard uniformly (so executing discard early will effectively terminate
>> >> the program early).
>> >
>> >
>> > It's not really a uniform vs. non-uniform thing.  Even if a shader only
>> > discards some of the fragments, it still reduces the number of live
>> > channels, which reduces the cost of later non-uniform control-flow.
>> >
>>
>> Which only helps if the shader's control flow is sufficiently
>> non-uniform that the additional cost from performing those computations
>> early pays off -- or not at all if the discarded fragments need to be
>> executed (non-compliantly) anyway in order to provide
>> derivatives_safe_after_discard.  However, if the discard condition is
>> uniform (across a warp), the back-end can most certainly terminate the
>> thread early, which gives you the maximum pay-off.  Uniform discard
>> conditions are therefore the best-case scenario for this optimization
>> pass.
>>
>
> Yes, that is correct.  Fortunately, things that discard tend to discard
> fairly large chunks of the polygon at one time so this case is fairly
> common.
>
>
>> >
>> >> Optimizing for the latter case is an essentially
>> >> heuristic assumption that needs to be verified experimentally.  Have you
>> >> tested the effect of this pass on non-DX workloads extensively?
>> >>
>> >
>> > Yes, it is a trade-off.  No, I have not done particularly extensive
>> > testing.  We do, however, know of non-DXVK workloads that would benefit
>> > from this.  I believe Manhattan is one such example though I have not yet
>> > benchmarked it.
>> >
>>
>> You should grab some numbers then to make sure there are no
>> regressions...
>
>
> I'm working on that.  Unfortunately the perf system is giving me trouble
> so I don't have the numbers yet.
>
>
>> But keep in mind that the i965 scheduler is already
>> performing a similar optimization (locally, but with cycle-count
>> information).  This will only help over the existing optimization if the
>> shaders that represent a bottleneck in Manhattan have sufficient control
>> flow for the basic block boundaries to represent a problem to the
>> (local) scheduler.
>>
>
> I'm not sure about the Manhattan shader, but the Skyrim shader does have
> control flow which the discard has to get moved above.
>

I have results from the perf system now and somehow this pass makes
Manhattan noticeably worse.  I'll look into that.
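
The trade-off discussed in the quoted text above can be sketched with a toy
cost model.  This is only an illustration under stated assumptions: the
function name warp_cost, the cycle counts, and the hoist_penalty term
(standing in for whatever extra cost moving the discard condition earlier
incurs, e.g. lost latency hiding) are all hypothetical and not taken from the
pass or from any measurement.

```python
def warp_cost(discarded, work_cost, cond_cost, hoist_penalty, early):
    """Toy cycle count for one SIMD thread (warp).

    discarded:     list of bools, one per channel (True = channel discards).
    work_cost:     cycles for the work the discard would protect (assumed).
    cond_cost:     cycles to evaluate the discard condition (assumed).
    hoist_penalty: extra cycles paid for evaluating the condition early
                   (assumed; models the critical-path concern).
    early:         True if the discard has been moved to the top.
    """
    if not early:
        # Late discard: all of the work runs before the condition is checked.
        return work_cost + cond_cost

    # Early discard: the condition runs first.  The remaining work can only
    # be skipped if every channel in the thread was discarded.
    any_live = not all(discarded)
    return cond_cost + hoist_penalty + (work_cost if any_live else 0)


uniform = [True] * 8          # every channel discards
mixed = [True, False] * 4     # non-uniform discard
none = [False] * 8            # nothing discards

for name, mask in (("uniform", uniform), ("mixed", mixed), ("none", none)):
    late = warp_cost(mask, 100, 10, 5, early=False)
    moved = warp_cost(mask, 100, 10, 5, early=True)
    print(name, "late:", late, "early:", moved)
```

In this model only the uniform case gets cheaper; the mixed and no-discard
cases pay the hoist penalty without skipping any work.  It deliberately
ignores the benefit of fewer live channels in later non-uniform control flow,
so it should be read as the pessimistic side of the argument only.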