[Mesa-dev] [PATCH 1/2] radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs

Marek Olšák maraeo at gmail.com
Thu Sep 7 14:56:24 UTC 2017


>> +static bool si_out_of_order_rasterization(struct si_context *sctx)
>> +{
>> +       struct si_state_blend *blend = sctx->queued.named.blend;
>> +       struct si_state_dsa *dsa = sctx->queued.named.dsa;
>> +
>> +       if (!sctx->screen->has_out_of_order_rast)
>> +               return false;
>> +
>> +       /* PS with memory stores can't run out-of-order. */
>> +       if (sctx->ps_shader.cso &&
>> +           sctx->ps_shader.cso->info.writes_memory)
>> +               return false;
>
>
> I'm actually not sure this is necessary. The spec is quite relaxed about the
> order of pixel shader invocations and whether they happen at all.

If DEPTH_BEFORE_SHADER is 1, depth tests have to be done in-order. If
they are done out-of-order, pixel shader invocations that would
normally be rejected can be executed needlessly. Does the spec allow
that?

>
>
>
>> +
>> +       unsigned colormask = sctx->framebuffer.colorbuf_enabled_4bit &
>> +                            blend->cb_target_enabled_4bit;
>> +
>> +       /* No logic op. */
>> +       if (colormask && blend->logicop_enable)
>> +               return false;
>> +
>> +       struct r600_texture *zstex =
>> +               (struct r600_texture*)sctx->framebuffer.state.zsbuf->texture;
>> +       bool has_stencil = sctx->framebuffer.state.zsbuf &&
>> +                          zstex->surface.flags & RADEON_SURF_SBUFFER;
>> +       bool blend_enabled = (colormask & blend->blend_enable_4bit) != 0;
>> +
>> +       /* Out-of-order rasterization can be enabled for these cases:
>> +        *
>> +        * - color-only rendering:
>> +        *   + blending must be enabled and commutative
>> +        *   + only when inexact behavior due to rounding is allowed
>> +        *
>> +        * - depth-only rendering:
>> +        *   + depth must force ordering
>> +        *
>> +        * - stencil-only rendering:
>> +        *   + never --- can we do better here?
>> +        *
>> +        * - color rendering with read-only depth:
>> +        *   + blending must be disabled
>> +        *   + depth must force ordering
>> +        *
>> +        * - color rendering with read-only stencil:
>> +        *   + blending must be disabled
>> +        *
>> +        * - color+depth rendering:
>> +        *   + blending must be disabled
>> +        *   + depth must force ordering
>> +        *   + only when Z-fighting is allowed to result in inexact behavior
>> +        *
>> +        * - color+stencil rendering:
>> +        *   + never --- can we do better here?
>> +        *
>> +        * - color+depth+stencil rendering:
>> +        *   + never --- can we do better here?
>> +        */
>
>
> I can't quite wrap my head around the logic here.
>
> Here's a suggestion for cleaning it up conceptually:
>
> - Record in DSA whether DSA *by itself* can run out-of-order or not, meaning
> that the final result in Z/S is unaffected by out-of-order
> -- This is trivially the case when there are no Z/S writes
> -- It is also the case when stencil writes are disabled and Zfunc is NEVER
> or one of the ordered ones ("depth_forces_ordering", currently)
> -- It is also the case when depth writes are disabled, Sfunc is ALWAYS, and
> zpass_op/zfail_op are KEEP, ZERO, REPLACE, INVERT, INCR_WRAP, DECR_WRAP, or
> Sfunc is NEVER and the same applies to fail_op [I think this allows
> out-of-order to be enabled for stencil shadow passes]
>
> - Record in DSA whether the set of fragments passing DSA is unaffected by
> out-of-order
> -- This is trivially the case when there are no Z/S writes
> -- It is the case when stencil writes are disabled and Zfunc is ALWAYS or
> NEVER
> -- It is the case when depth writes are disabled and Sfunc is ALWAYS or
> NEVER
>
> - Record in DSA whether the *last* fragment passing DSA for each sample is
> unaffected by out-of-order
> -- This is *never* the case if we're being honest, but we can enable it in
> an optional "aggressive" mode when stencil writes are disabled, Z writes are
> enabled and Z func is one of the ordered functions
>
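
A rough sketch of how the three per-DSA bits listed above could be
computed. Everything here is a hypothetical stand-in, not the real
si_state_dsa or gallium types: the enums, struct dsa_desc, and
dsa_compute_order_info() are made up for illustration, stencil is treated as
single-sided, and "ordered" Z functions are assumed to mean LESS, LEQUAL,
GREATER and GEQUAL. The conditions are transcribed from the list as written
and have not been checked against hardware behavior.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the gallium compare functions and stencil ops. */
enum cmp_func { FUNC_NEVER, FUNC_LESS, FUNC_EQUAL, FUNC_LEQUAL,
                FUNC_GREATER, FUNC_NOTEQUAL, FUNC_GEQUAL, FUNC_ALWAYS };
enum stencil_op { OP_KEEP, OP_ZERO, OP_REPLACE, OP_INCR, OP_DECR,
                  OP_INCR_WRAP, OP_DECR_WRAP, OP_INVERT };

/* Simplified DSA description (assumption: single-sided stencil). */
struct dsa_desc {
        bool depth_write;
        bool stencil_write;
        enum cmp_func zfunc;
        enum cmp_func sfunc;
        enum stencil_op fail_op, zfail_op, zpass_op;
};

struct dsa_order_info {
        bool zs_result_invariant; /* final Z/S contents unaffected */
        bool pass_set_invariant;  /* set of passing fragments unaffected */
        bool pass_last_invariant; /* last passing fragment unaffected */
};

/* Assumed meaning of "ordered" depth functions. */
static bool zfunc_is_ordered(enum cmp_func f)
{
        return f == FUNC_LESS || f == FUNC_LEQUAL ||
               f == FUNC_GREATER || f == FUNC_GEQUAL;
}

/* Ops named in the list above: everything except the clamping INCR/DECR. */
static bool stencil_op_allowed(enum stencil_op op)
{
        return op != OP_INCR && op != OP_DECR;
}

static struct dsa_order_info
dsa_compute_order_info(const struct dsa_desc *d, bool aggressive)
{
        struct dsa_order_info info;
        bool no_writes = !d->depth_write && !d->stencil_write;

        info.zs_result_invariant =
                no_writes ||
                (!d->stencil_write &&
                 (d->zfunc == FUNC_NEVER || zfunc_is_ordered(d->zfunc))) ||
                (!d->depth_write && d->sfunc == FUNC_ALWAYS &&
                 stencil_op_allowed(d->zpass_op) &&
                 stencil_op_allowed(d->zfail_op)) ||
                (!d->depth_write && d->sfunc == FUNC_NEVER &&
                 stencil_op_allowed(d->fail_op));

        info.pass_set_invariant =
                no_writes ||
                (!d->stencil_write &&
                 (d->zfunc == FUNC_ALWAYS || d->zfunc == FUNC_NEVER)) ||
                (!d->depth_write &&
                 (d->sfunc == FUNC_ALWAYS || d->sfunc == FUNC_NEVER));

        /* Never strictly true; only granted in the optional aggressive mode. */
        info.pass_last_invariant =
                aggressive && !d->stencil_write && d->depth_write &&
                zfunc_is_ordered(d->zfunc);

        return info;
}
```

With depth writes enabled, Zfunc == LESS and no stencil writes, this gives
"Z/S result unaffected" = yes, "passing set unaffected" = no, and "last
fragment unaffected" = yes only when the aggressive flag is set.
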
> The overall out-of-order enable is then:
>
> - if DSA by itself cannot run out-of-order, disable
> - if color writes are disabled, enable
> - if logic op is enabled, disable
> - if blending is enabled:
>  o disable if non-commutative
>  o enable if commutative and the set of fragments passing DSA is unaffected
> by out-of-order
> - if blending is disabled, enable iff the *last* fragment passing DSA is
> unaffected
>
> Thinking it through in this way, I believe I discovered at least one bug in
> the patch as-is, in the following configuration
>
> - blending disabled and no stencil
> - depth is enabled, depth writes are disabled, and Zfunc == LESS
>
> In this case, dsa->depth_forces_ordering_color_on will be true and
> out-of-order will be enabled. But that's not correct, because there may be
> multiple triangles with Z-values less than whatever's in the depth buffer.
>
> On second thought, the whole "record in DSA" thing gets a bit more
> complicated because it interacts with whether Z/S buffers are actually
> present. The no-Z/S case is easy (the first two bits are "Yes", the last one
> is "No"), but we need to distinguish whether stencil is present or not.
> Maybe both of these can be pre-calculated and stored in DSA.
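
For reference, the overall enable rule proposed above could look roughly
like this. It is only a sketch: struct oo_inputs and out_of_order_allowed()
are hypothetical names, the three dsa_* bits are assumed to be precomputed
elsewhere, and the case the list leaves open (commutative blending while the
set of passing fragments is order-dependent) is conservatively treated as
"disable".

```c
#include <stdbool.h>

/* Hypothetical inputs; none of these fields exist in radeonsi as such. */
struct oo_inputs {
        bool dsa_zs_result_invariant; /* DSA by itself can run out-of-order */
        bool dsa_pass_set_invariant;  /* set of passing fragments unaffected */
        bool dsa_pass_last_invariant; /* last passing fragment unaffected */
        bool color_writes_enabled;
        bool logicop_enabled;
        bool blend_enabled;
        bool blend_commutative;
};

static bool out_of_order_allowed(const struct oo_inputs *in)
{
        /* DSA by itself cannot run out-of-order: disable. */
        if (!in->dsa_zs_result_invariant)
                return false;

        /* No color writes: only Z/S results matter, so enable. */
        if (!in->color_writes_enabled)
                return true;

        /* Logic ops depend on the order of writes. */
        if (in->logicop_enabled)
                return false;

        /* Blending: must be commutative and see the same fragment set.
         * Assumption: anything else is disabled. */
        if (in->blend_enabled)
                return in->blend_commutative && in->dsa_pass_set_invariant;

        /* No blending: the last passing fragment decides the color. */
        return in->dsa_pass_last_invariant;
}
```

For the configuration flagged as a bug above (blending disabled, no stencil,
depth writes disabled, Zfunc == LESS), the first two bits are yes and the
last one is no, so this returns false.
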

You seem to be much better at it than I am. If you wanna take a stab
at it, here are both patches in the reversed order:
https://cgit.freedesktop.org/~mareko/mesa/log/?h=out-of-order-rast

Marek

