[Mesa-dev] [PATCH v2 23/82] glsl: Do not do CSE for expressions involving SSBO loads
Francisco Jerez
currojerez at riseup.net
Wed Jun 24 05:43:12 PDT 2015
Iago Toral <itoral at igalia.com> writes:
> On Wed, 2015-06-17 at 17:20 -0700, Jordan Justen wrote:
>> I wanted to question whether this was required, based on this text
>> from the extension spec:
>>
>> "The ability to write to buffer objects creates the potential for
>> multiple independent shader invocations to read and write the same
>> underlying memory. The same issue exists with the
>> ARB_shader_image_load_store extension provided in OpenGL 4.2, which
>> can write to texture objects and buffers. In both cases, the
>> specification makes few guarantees related to the relative order of
>> memory reads and writes performed by the shader invocations."
>>
>> But I'm not sure if we can reconcile CSE with 'memoryBarrier' and
>> 'barrier'. curro, any thoughts from image load/store?
>
> I think the problem is within the same thread, that text above talks
> about multiple invocations reading from and writing to the same
> location, but within the same invocation, the order of reads and writes
> must be preserved:
>
> "Buffer variable memory reads and writes within a single shader
> invocation are processed in order. However, the order of reads and
> writes performed in one invocation relative to those performed by
> another invocation is largely undefined."
>
> For example, if X is a shader storage buffer variable and we have code
> like this with just one invocation:
>
> ssbo_store(X, 1);
> a = ssbo_load(X) + 1 // a = 2
> ssbo_store(X, 2);
> b = ssbo_load(X) + 1; // b = 3
>
> CSE could mess it up like this:
>
> ssbo_store(X, 1);
> tmp = ssbo_load(X) + 1 // tmp = 2
> a = tmp;
> ssbo_store(X, 2);
> b = tmp;
>
> which would be incorrect. I think I wrote this patch after seeing
> something like this happening. The CSE pass clearly states that it does
> not support write variables after all.
>
> Also, notice the same would apply if there are multiple invocations but
> the shader code used something like gl_VertexID or gl_FragCoord to make
> each invocation read from/write to a different address within the SSBO
> buffer (I imagine this is the usual way to operate with SSBOs). In these
> cases, even if we have multiple invocations, keeping the relative order
> of reads and writes within each one is necessary.
>
AFAICT the reason why this (and many of the other changes in GLSL
optimization passes) is needed is because SSBO loads have been
implemented as ir_expression nodes instead of being lowered into
intrinsics (as other side-effectful operations do like
ARB_shader_image_load_store and ARB_shader_atomic_counters). This
surely broke the assumption of a number of optimization passes that
ir_expression nodes behave as pure functions. I guess the reason why
you've done it this way is because UBO loads were already being
represented as expressions, so I see why you may have wanted to use the
same approach for SSBOs even though there is a fundamental difference
between the two: UBO loads have no side effects and are constant for a
given set of arguments and a given shader execution, SSBO loads and
stores are not. SSBO stores couldn't be accommodated into the same
framework so easily, and you decided to create a separate ir node for
them, what seems inconsistent with loads. Intrinsics would probably
have been a good fit for both loads and stores, and would have made all
these optimization changes unnecessary...
P.S.: Sorry for the late reply, I was on vacation when I was CC'ed.
> Iago
>
>> -Jordan
>>
>> On 2015-06-03 00:01:13, Iago Toral Quiroga wrote:
>> > SSBOs are read/write and this CSE pass only handles read-only variables.
>> > ---
>> > src/glsl/opt_cse.cpp | 33 ++++++++++++++++++++++++++++++++-
>> > 1 file changed, 32 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/src/glsl/opt_cse.cpp b/src/glsl/opt_cse.cpp
>> > index 4b8e9a0..a05ab46 100644
>> > --- a/src/glsl/opt_cse.cpp
>> > +++ b/src/glsl/opt_cse.cpp
>> > @@ -245,6 +245,28 @@ contains_rvalue(ir_rvalue *haystack, ir_rvalue *needle)
>> > }
>> >
>> > static bool
>> > +expression_contains_ssbo_load(ir_expression *expr)
>> > +{
>> > + if (expr->operation == ir_binop_ssbo_load)
>> > + return true;
>> > +
>> > + for (unsigned i = 0; i < expr->get_num_operands(); i++) {
>> > + ir_rvalue *op = expr->operands[i];
>> > + if (op->ir_type == ir_type_expression &&
>> > + expression_contains_ssbo_load(op->as_expression())) {
>> > + return true;
>> > + } else if (op->ir_type == ir_type_swizzle) {
>> > + ir_swizzle *swizzle = op->as_swizzle();
>> > + ir_expression *val = swizzle->val->as_expression();
>> > + if (val && expression_contains_ssbo_load(val))
>> > + return true;
>> > + }
>> > + }
>> > +
>> > + return false;
>> > +}
>> > +
>> > +static bool
>> > is_cse_candidate(ir_rvalue *ir)
>> > {
>> > /* Our temporary variable assignment generation isn't ready to handle
>> > @@ -260,7 +282,16 @@ is_cse_candidate(ir_rvalue *ir)
>> > * to variable-index array dereferences at some point.
>> > */
>> > switch (ir->ir_type) {
>> > - case ir_type_expression:
>> > + case ir_type_expression: {
>> > + /* Skip expressions involving SSBO loads, since these operate on
>> > + * read-write variables, meaning that the same ssbo_load expression
>> > + * may return a different value if the underlying buffer storage
>> > + * is written in between.
>> > + */
>> > + if (expression_contains_ssbo_load(ir->as_expression()))
>> > + return false;
>> > + }
>> > + break;
>> > case ir_type_texture:
>> > break;
>> > default:
>> > --
>> > 1.9.1
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150624/4e553bb8/attachment-0001.sig>
More information about the mesa-dev
mailing list