[Bug 89597] Implement SSBOs in GLSL front-end and i965

Fri May 8 04:40:29 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #54 from Francisco Jerez <currojerez at riseup.net> ---
(In reply to Iago Toral from comment #53)
> (In reply to Iago Toral from comment #52)
> > (In reply to Francisco Jerez from comment #51)
> > > (In reply to Iago Toral from comment #50)
> > > > Francisco, your implementation of brw_untyped_surface_write in brw_eu_emit.c
> > > > fixes the mask to use for the dst, which in turn decides which channels are
> > > > effectively written to memory.
> > > > 
> > > > For vec4 and gen7 (except haswell) it sets the writemask to WRITEMASK_X, but
> > > > I think we want clients of this function to decide the writemask to use. For
> > > > example, if we are writing a vec4, I want to use WRITEMASK_XYZW (even if we
> > > > are using SIMD8 mode) to get all 4 components written directly. Fixing the
> > > > writemask inside this function seems a bit restrictive.
> > > > 
> > > 
> > > IVB didn't have a SIMD4x2 variant of the untyped surface write
> > > message, so the SIMD8 one has to be used.  The writemask is
> > > required because the dataport will reinterpret the execution mask
> > > sent by the EU as if each bit mapped to a separate scalar
> > > channel, just as is the case for an FS thread, so, if you set
> > > the writemask to XYZW the dataport might end up writing *eight*
> > > separate vec4s to memory, which is almost certainly not what you
> > > want.
> > 
> > I think this should not happen. The IVB PRMs, 3.9.9.10
> > Message Payload, says:
> > 
> > "For SIMD16 and SIMD8 messages, the message length is used to determine how
> > many address parameters are included in the message. The number of message
> > registers in the write data payload is determined by the number of channel
> > mask bits that are enabled"
> > 
> > So, if we only enable one channel (red), it should only write up to 8
> > dwords, never 8 vec4s. With this in mind, this is what I am doing:
> > 
> > I write up to 8 offsets to M1 and up to 8 values to M2 (so red channel
> > only). The first 4 values in the red channel (M2.0 to M2.3) are the four
> > vector components of the vertex stored in the lower half of the SIMD4x2
> > execution, the data from second vertex of the SIMD4x2 execution goes in M2.4
> > to M2.7. Since I only provide data for the red channel, the message can only
> > write up to 8 dwords, no matter the writemask I use. Then, with WRITEMASK_X,
> > only M2.0 and M2.4 get written. With WRITEMASK_XYZW I get all 8 dwords
> > written, with other writemasks I can get any subset I need, which works
> > great because I only need to pass the writemask we get from the GLSL IR as
> > is to get exactly what we want.
> > 
> > I have tested this in multiple scenarios and it seems to work fine in all of
> > them, and the implementation is straightforward. Do you see any issues that
> > I might be missing?
> 
> There is another benefit of this implementation that I have just noticed:
> the structure of the payload I use in SIMD8 mode for IVB is the same as the
> structure of the payload for haswell in SIMD4x2, which means that the
> implementation is the same for both (if anything I can optimize the haswell
> version since it only needs M1.0 and M1.4 in the address payload). If I
> change the implementation to do as you suggested then I would need to write
> different implementations for both systems, right?

No, not necessarily.  In fact, the link I shared earlier has a
single implementation of untyped surface write shared among all
generations, which works regardless of whether SIMD8, SIMD16 or
SIMD4x2 is being used.  The only reason that's possible despite
the strange vector layout of the message on IVB is that the
X-only SIMD8 message and the HSW SIMD4x2 message have exactly the
same semantics: in other words, they take and return the same set
of values up to a transposition, which can be applied consistently
to all vector values based on the has_simd4x2 flag, in a manner
transparent to the implementation of the typed and untyped
surface messages.
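
To make the transposition concrete, here is a toy C model of the
behaviour discussed in this thread.  It is only a sketch under stated
assumptions, not the i965 code or the hardware specification: the
function names, the 64-dword memory, the dword-granular addresses and
the data[component][channel] payload layout are all made up for
illustration.  It assumes that each channel enabled in the execution
mask stores one dword per enabled component, at consecutive offsets
starting at that channel's address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_DWORDS 64   /* hypothetical surface size, in dwords */

/* Toy SIMD8 untyped write (an assumption, not the real dataport):
 * every channel enabled in exec_mask stores one dword per component
 * enabled in chan_mask, at consecutive dword offsets starting at that
 * channel's address.  Returns the number of dwords stored. */
static unsigned
toy_simd8_untyped_write(uint32_t *mem, const uint32_t addr[8],
                        uint32_t data[4][8],
                        unsigned exec_mask, unsigned chan_mask)
{
   unsigned written = 0;

   for (unsigned ch = 0; ch < 8; ch++) {
      if (!(exec_mask & (1u << ch)))
         continue;
      for (unsigned c = 0; c < 4; c++) {
         if (chan_mask & (1u << c)) {
            assert(addr[ch] + c < MEM_DWORDS);
            mem[addr[ch] + c] = data[c][ch];
            written++;
         }
      }
   }
   return written;
}

/* Writemask XYZW on both SIMD4x2 halves sets all eight execution-mask
 * bits; with four components enabled, eight vec4s get stored. */
static unsigned
toy_count_xyzw(void)
{
   uint32_t mem[MEM_DWORDS] = {0};
   uint32_t addr[8], data[4][8];

   for (unsigned ch = 0; ch < 8; ch++) {
      addr[ch] = 4 * ch;
      for (unsigned c = 0; c < 4; c++)
         data[c][ch] = 100 * ch + c;
   }
   return toy_simd8_untyped_write(mem, addr, data, 0xff, 0xf);
}

/* Writemask X leaves only bits 0 and 4 set.  The two vec4s are passed
 * transposed: component c of each vertex sits in payload row c, in the
 * X slot of that vertex's half. */
static unsigned
toy_write_two_vec4(uint32_t *mem, const uint32_t v0[4],
                   const uint32_t v1[4], uint32_t off0, uint32_t off1)
{
   uint32_t addr[8] = {0}, data[4][8];

   memset(data, 0, sizeof(data));
   addr[0] = off0;
   addr[4] = off1;
   for (unsigned c = 0; c < 4; c++) {
      data[c][0] = v0[c];
      data[c][4] = v1[c];
   }
   return toy_simd8_untyped_write(mem, addr, data, 0x11, 0xf);
}
```

Under this model toy_count_xyzw() returns 32 (eight vec4s), while
toy_write_two_vec4() stores 8 dwords, with each vertex's four
components landing contiguously at its own offset, which is what a
SIMD4x2 write of two vec4s would be expected to produce.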

Even though your idea would work in this specific case, it can't
be extended to the typed messages (because by tweaking the
coordinates you get the same single color component from
different locations of the image rather than different color
components), and it breaks the symmetry (up to a transpose)
between the FS and VEC4 implementations and between the HSW+ and
IVB implementations.
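
The difference between the untyped and typed cases can be sketched
with another toy model in C.  Everything here is hypothetical (the
struct, the function names, the four-lane width and the one-dimensional
image); the only assumptions are that untyped addresses select
individual dwords, whereas typed coordinates select whole texels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical four-component texel: R, G, B, A. */
struct toy_texel {
   uint32_t c[4];
};

/* Untyped access is dword-addressed, so consecutive addresses walk
 * through the components of a single vec4. */
static void
toy_untyped_read_red(const uint32_t *mem, const uint32_t addr[4],
                     uint32_t out[4])
{
   for (unsigned i = 0; i < 4; i++)
      out[i] = mem[addr[i]];
}

/* Typed access is texel-addressed, so consecutive coordinates return
 * the red component of four different texels. */
static void
toy_typed_read_red(const struct toy_texel *img, const uint32_t coord[4],
                   uint32_t out[4])
{
   for (unsigned i = 0; i < 4; i++)
      out[i] = img[coord[i]].c[0];
}
```

With the same underlying data and the coordinates 0, 1, 2, 3, the
untyped variant returns the four components of the first vec4, while
the typed variant returns the red component of four different texels.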
