[Bug 89597] Implement SSBOs in GLSL front-end and i965

Mon May 4 04:44:06 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #39 from Iago Toral <itoral at igalia.com> ---
Hi Francisco, thanks a lot for the input,

(In reply to Francisco Jerez from comment #38)
> Hi Iago,
> 
> (In reply to Iago Toral from comment #37)
> > (In reply to Iago Toral from comment #36)
> > > (In reply to Kenneth Graunke from comment #35)
> > > > Is this on Gen7?  Does setting GEN7_PS_VECTOR_MASK_ENABLE in 3DSTATE_PS
> > > > DWord 2 help?  (Gen8+ already does this.)
> > >
> > > Yes, it is gen7. I've just tested this but it does not seem to have any
> > > effect in this case, the problem persists.
> >
> > As far as I can see, this issue stems from the fact that atomic messages
> > can include a message header with pixel mask information while scattered
> > read/write messages (like many other read/write messages) can't, which can
> > lead to some inconsistencies when both are used together, something that I
> > imagine being common in SSBO usage patterns, unfortunately.
> >
> > The PRMs clearly state that the pixel mask and the dispatch mask can be
> > different, which means that in these scenarios atomic operations will
> > operate on fewer channels (since the pixel mask is implicitly ANDed with the
> > execution mask) than our read/write messages, leading to the inconsistent
> > behavior I explained in comment #34.
> 
> The sample and execution masks differ in cases where a subset of
> fragments in a single subspan (2x2 fragment block) are either not
> covered by the primitive being drawn or discarded by the early
> depth or stencil tests.  This is required for derivatives to give
> well-defined results if they are calculated explicitly or
> implicitly by texture sampling operations.

I see, thanks for explaining this.
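
To make the relationship between the two masks concrete, here is a minimal
Python sketch. It is a simplified model, not hardware or Mesa code: it assumes
a SIMD16 dispatch whose channels are grouped four to a subspan, and the names
execution_mask and helper_channels are made up for illustration.

```python
SUBSPAN = 4  # a subspan is a 2x2 block of fragments, i.e. 4 channels


def execution_mask(sample_mask: int, channels: int = 16) -> int:
    """Enable every channel of any subspan with at least one live sample.

    Assumption: this mirrors the description above, where whole subspans
    execute so that derivatives have well-defined neighbours.
    """
    mask = 0
    for base in range(0, channels, SUBSPAN):
        subspan_bits = 0xF << base
        if sample_mask & subspan_bits:
            mask |= subspan_bits
    return mask


def helper_channels(sample_mask: int, channels: int = 16) -> int:
    """Channels that execute but are not covered (helper invocations)."""
    return execution_mask(sample_mask, channels) & ~sample_mask


# Example: three live fragments in subspan 0 and one in subspan 1.
sample = 0b0001_0111
print(hex(execution_mask(sample)), hex(helper_channels(sample)))
```

In this model any message that only honours the execution mask also acts on
the channels returned by helper_channels.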

> >
> > One way to work around this issue would be to fix the pixel mask in atomic
> > operations to always be 0xffff. Since the mask is anded with the dispatch
> > mask this should make atomic operations effectively use the dispatch mask
> > and operate on the same channels as our read/write messages. I have tested
> > this locally with a couple of examples and this seems to work, producing
> > consistent results.
> >
> 
> That wouldn't be compliant, see section 7.1 of the GLSL spec
> version 4.5:
> 
> | Fragment shader helper invocations execute the same shader code
> | as non-helper invocations, but will not have side effects that
> | modify the framebuffer or other shader-accessible memory. In
> | particular:
> | [..]
> |  - Stores to image and buffer variables performed by helper
> |    invocations have no effect on the underlying image or buffer
> |    memory.
> |  - Atomic operations to image, buffer, or atomic counter
> |    variables performed by helper invocations have no effect on
> |    the underlying image or buffer memory. The values returned by
> |    such atomic operations are undefined.
> |

Right, good catch.
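
As a rough illustration of why forcing the pixel mask to 0xffff conflicts with
the spec text quoted above, here is a small Python model. It only assumes what
was said in this thread (the header pixel mask is ANDed with the execution
mask); the function name and the example mask values are hypothetical.

```python
def atomic_increments(exec_mask: int, header_pixel_mask: int) -> int:
    """Count the channels whose atomic increment reaches memory.

    Assumption: the effective mask of an atomic message is the execution
    mask ANDed with the pixel mask carried in the message header.
    """
    effective = exec_mask & header_pixel_mask
    return bin(effective).count("1")


exec_mask = 0xFF    # two full subspans are executing
sample_mask = 0x17  # only four of those fragments are actually covered

# Header carries the real pixel mask: only covered fragments increment.
with_real_mask = atomic_increments(exec_mask, sample_mask)

# Header forced to 0xffff: helper invocations increment as well.
with_forced_mask = atomic_increments(exec_mask, 0xFFFF)

print(with_real_mask, with_forced_mask)
```

The difference between the two counts is the number of helper invocations
whose atomics would modify buffer memory, which is what section 7.1 forbids.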

> > Would this be an option?
> >
> > If that is not a good idea, then I guess the alternative would be to do it
> > the other way around: fix the dispatch mask in the read/write messages to be
> > like the pixel mask we use in atomic operations, but I don't know if that is
> > possible.
> 
> AFAIK scattered DWORD read and write messages don't take a
> sample mask independent from the execution mask.  It seems to me
> that they aren't particularly well-suited for this extension.
> Is there any reason you aren't using untyped surface reads and
> writes instead?  That would allow you to provide an explicit
> sample mask, share some code with ARB_shader_image_load_store by
> using the same instructions (I'll land my patches adding support
> for untyped surface writes shortly), and access up to 128 bits of
> data per channel and message, likely giving better performance in
> the long run.

Not really, I started this using oword writes, but that had some issues with
unaligned offsets, so Jason suggested using scattered writes instead. At the
time that seemed like a good fit for fragment shaders: it would allow
us to write up to 8/16 dwords at random dword offsets and that seemed to work
great in all scenarios... until I played with some more elaborate tests that
combined atomics and writes in the fragment shader and noticed this problem
with the pixel mask included with atomic messages.

So unless someone else has a clever idea to respect the pixel mask with
scattered write messages I guess I should at least look into untyped write
messages. I imagine that reads are fine with scattered messages since channels
that are not in the pixel mask will be ignored at write time.
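
The difference between the two kinds of write can be sketched with a toy
Python model. This is not the i965 message encoding: masked_write is a
hypothetical helper, and the only assumption taken from the discussion is that
a scattered write is gated by the execution mask alone, while a write that
accepts an explicit sample mask is gated by both masks.

```python
def masked_write(buf, offsets, values, mask):
    """Write values[ch] to buf[offsets[ch]] for every channel set in mask."""
    for ch, (off, val) in enumerate(zip(offsets, values)):
        if (mask >> ch) & 1:
            buf[off] = val


exec_mask = 0xFF    # channels that are executing
sample_mask = 0x17  # channels that are actually covered
offsets = list(range(8))
values = [ch + 1 for ch in range(8)]

# Scattered-style write: only the execution mask applies, so helper
# channels also store their values.
scattered = [0] * 8
masked_write(scattered, offsets, values, exec_mask)

# Write with an explicit sample mask: helper channels are dropped.
explicit = [0] * 8
masked_write(explicit, offsets, values, exec_mask & sample_mask)

print(scattered)
print(explicit)
```

In this model a read performed by a helper channel never shows up in the
second buffer, because that channel's result is discarded at write time.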
