[Bug 89597] Implement SSBOs in GLSL front-end and i965

Mon May 4 04:05:33 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #38 from Francisco Jerez <currojerez at riseup.net> ---
Hi Iago,

(In reply to Iago Toral from comment #37)
> (In reply to Iago Toral from comment #36)
> > (In reply to Kenneth Graunke from comment #35)
> > > Is this on Gen7?  Does setting GEN7_PS_VECTOR_MASK_ENABLE in 3DSTATE_PS
> > > DWord 2 help?  (Gen8+ already does this.)
> >
> > Yes, it is gen7. I've just tested this but it does not seem to have any
> > effect in this case, the problem persists.
>
> As far as I can see, this issue stems from the fact that atomic messages
> can include a message header with pixel mask information while scattered
> read/write messages (like many other read/write messages) can't, which can
> lead to some inconsistencies when both are used together, something that I
> imagine is common in SSBO usage patterns, unfortunately.
>
> The PRMs clearly state that the pixel mask and the dispatch mask can be
> different, which means that in these scenarios atomic operations will
> operate on fewer channels (since the pixel mask is implicitly anded with the
> execution mask) than our read/write messages, leading to the inconsistent
> behavior I explained in comment #34.

The sample and execution masks differ in cases where a subset of
fragments in a single subspan (2x2 fragment block) are either not
covered by the primitive being drawn or discarded by the early
depth or stencil tests.  Those fragments still have to be executed
(as helper invocations) for derivatives to give well-defined
results, whether they are calculated explicitly or implicitly by
texture sampling operations.

>
> One way to work around this issue would be to fix the pixel mask in atomic
> operations to always be 0xffff. Since the mask is anded with the dispatch
> mask this should make atomic operations effectively use the dispatch mask
> and operate on the same channels as our read/write messages. I have tested
> this locally with a couple of examples and this seems to work, producing
> consistent results.
>

That wouldn't be compliant; see section 7.1 of the GLSL 4.5
specification:

| Fragment shader helper invocations execute the same shader code
| as non-helper invocations, but will not have side effects that
| modify the framebuffer or other shader-accessible memory. In
| particular:
| [..]
|  - Stores to image and buffer variables performed by helper
|    invocations have no effect on the underlying image or buffer
|    memory.
|  - Atomic operations to image, buffer, or atomic counter
|    variables performed by helper invocations have no effect on
|    the underlying image or buffer memory. The values returned by
|    such atomic operations are undefined.
|

> Would this be an option?
>
> If that is not a good idea, then I guess the alternative would be to do it
> the other way around: fix the dispatch mask in the read/write messages to be
> like the pixel mask we use in atomic operations, but I don't know if that is
> possible.

AFAIK scattered DWORD read and write messages don't take a
sample mask independent of the execution mask.  It seems to me
that they aren't particularly well-suited for this extension.
Is there any reason you aren't using untyped surface reads and
writes instead?  That would allow you to provide an explicit
sample mask, share some code with ARB_shader_image_load_store by
using the same instructions (I'll land my patches adding support
for untyped surface writes shortly), and access up to 128 bits of
data per channel and message, likely giving better performance in
the long run.
