[Bug 89597] Implement SSBOs in GLSL front-end and i965

Sat May 9 09:50:23 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #56 from Francisco Jerez <currojerez at riseup.net> ---
(In reply to Iago Toral from comment #55)
> (In reply to Francisco Jerez from comment #54)
> > (In reply to Iago Toral from comment #53)
> > > (In reply to Iago Toral from comment #52)
> > > > (In reply to Francisco Jerez from comment #51)
> > > > > (In reply to Iago Toral from comment #50)
> > > > > > Francisco, your implementation of brw_untyped_surface_write in brw_eu_emit.c
> > > > > > hard-codes the mask used for the dst, which in turn decides which channels are
> > > > > > effectively written to memory.
> > > > > > 
> > > > > > For vec4 and gen7 (except Haswell) it sets the writemask to WRITEMASK_X, but
> > > > > > I think we want clients of this function to decide the writemask to use. For
> > > > > > example, if we are writing a vec4, I want to use WRITEMASK_XYZW (even if we
> > > > > > are using SIMD8 mode) to get all 4 components written directly. Hard-coding the
> > > > > > writemask inside this function seems a bit restrictive.
> > > > > > 
> > > > > 
> > > > > IVB didn't have a SIMD4x2 variant of the untyped surface write
> > > > > message, so the SIMD8 one has to be used.  The writemask is
> > > > > required because the dataport will reinterpret the execution mask
> > > > > sent by the EU as if each bit mapped to a separate scalar
> > > > > channel, just as is the case for an FS thread, so if you set
> > > > > the writemask to XYZW the dataport might end up writing *eight*
> > > > > separate vec4s to memory, which is almost certainly not what you
> > > > > want.
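A rough model of the concern above, assuming (as described in this thread) that a vec4 instruction's 4-bit writemask is replicated into both halves of the SIMD8 execution mask, and that every enabled channel writes one dword per component enabled in the message's RGBA channel mask. The helper names are hypothetical; this is neither Mesa code nor a restatement of the PRM.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in an 8-bit mask. */
static unsigned popcount8(uint8_t bits)
{
   unsigned n = 0;
   for (; bits; bits &= bits - 1)
      n++;
   return n;
}

/* Assumed model: a SIMD4x2 (Align16) instruction replicates its 4-bit
 * writemask into both halves of the 8-bit execution mask. */
static uint8_t simd4x2_exec_mask(uint8_t writemask)
{
   assert(writemask <= 0xf);
   return (uint8_t)(writemask | (writemask << 4));
}

/* Assumed model: the dataport treats each execution-mask bit as an
 * independent scalar channel, and each enabled channel writes one dword
 * per component enabled in the RGBA mask (0x1 = R only, 0xf = RGBA). */
static unsigned simd8_dwords_written(uint8_t exec_mask, uint8_t rgba_mask)
{
   assert(rgba_mask <= 0xf);
   return popcount8(exec_mask) * popcount8(rgba_mask);
}
```

Under this model, an XYZW writemask combined with a full RGBA mask gives 8 × 4 = 32 dwords, i.e. eight vec4s, while restricting the message to the red channel caps the write at 8 dwords.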
> > > > 
> > > > I think this should not happen. Section 3.9.9.10, Message Payload, of
> > > > the IVB PRM says:
> > > > 
> > > > "For SIMD16 and SIMD8 messages, the message length is used to determine how
> > > > many address parameters are included in the message. The number of message
> > > > registers in the write data payload is determined by the number of channel
> > > > mask bits that are enabled"
> > > > 
> > > > So, if we only enable one channel (red), it should only write up to 8
> > > > dwords, never 8 vec4s. With this in mind, this is what I am doing:
> > > > 
> > > > I write up to 8 offsets to M1 and up to 8 values to M2 (so red channel
> > > > only). The first 4 values in the red channel (M2.0 to M2.3) are the four
> > > > vector components of the vertex stored in the lower half of the SIMD4x2
> > > > execution; the data from the second vertex of the SIMD4x2 execution goes in
> > > > M2.4 to M2.7. Since I only provide data for the red channel, the message can
> > > > only write up to 8 dwords, no matter which writemask I use. Then, with
> > > > WRITEMASK_X, only M2.0 and M2.4 get written. With WRITEMASK_XYZW I get all 8
> > > > dwords written, and with other writemasks I can get any subset I need, which
> > > > works great because I only need to pass the writemask we get from the GLSL IR
> > > > as is to get exactly what we want.
> > > > 
> > > > I have tested this in multiple scenarios and it seems to work fine in all of
> > > > them, and the implementation is straightforward. Do you see any issues that
> > > > I might be missing?
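The red-only payload scheme described in comment #52 can be simulated in a few lines. This is a hypothetical model: the struct, the helpers, the replicated-writemask execution mask and the dword-array "memory" are all assumptions made for illustration, not the i965 implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_WIDTH 8

/* Hypothetical model of the payload:
 *   m1[i]: byte offset for channel i (address payload)
 *   m2[i]: red-channel data for channel i (write data payload)
 * Channels 0-3 carry the vec4 of the first SIMD4x2 vertex, channels 4-7
 * the vec4 of the second one. */
struct untyped_write_payload {
   uint32_t m1[SIMD_WIDTH];
   uint32_t m2[SIMD_WIDTH];
};

static void build_payload(struct untyped_write_payload *p,
                          const uint32_t base[2],
                          const uint32_t value[2][4])
{
   for (unsigned v = 0; v < 2; v++) {
      for (unsigned c = 0; c < 4; c++) {
         p->m1[v * 4 + c] = base[v] + 4 * c; /* component c is 4 bytes further */
         p->m2[v * 4 + c] = value[v][c];
      }
   }
}

/* Execute the modeled red-only SIMD8 write: every channel enabled by the
 * replicated writemask stores one dword.  mem is indexed in dwords.
 * Returns the number of dwords written. */
static unsigned execute_red_only_write(uint32_t *mem, unsigned mem_dwords,
                                       const struct untyped_write_payload *p,
                                       uint8_t writemask)
{
   assert(writemask <= 0xf);
   uint8_t exec_mask = (uint8_t)(writemask | (writemask << 4));
   unsigned written = 0;

   for (unsigned i = 0; i < SIMD_WIDTH; i++) {
      if (!(exec_mask & (1u << i)))
         continue;
      assert(p->m1[i] % 4 == 0 && p->m1[i] / 4 < mem_dwords);
      mem[p->m1[i] / 4] = p->m2[i];
      written++;
   }
   return written;
}
```

With WRITEMASK_X (0x1) only channels 0 and 4 fire, so only x of each vertex lands in memory; with WRITEMASK_XYZW (0xf) all 8 dwords are stored.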
> > > 
> > > There is another benefit of this implementation that I have just noticed:
> > > the structure of the payload I use in SIMD8 mode for IVB is the same as the
> > > structure of the payload for Haswell in SIMD4x2, which means that the
> > > implementation is the same for both (if anything, I can optimize the Haswell
> > > version since it only needs M1.0 and M1.4 in the address payload). If I
> > > change the implementation to do as you suggested, then I would need to write
> > > a different implementation for each system, right?
> > 
> > No, not necessarily.  In fact in the link I shared earlier there
> > is a single implementation of untyped surface write shared among
> > all generations which works regardless of whether SIMD8, 16 or
> > 4x2 is being used.  The only reason why that's possible despite
> > the strange vector layout of the message on IVB is that both the
> > X-only SIMD8 message and the HSW SIMD4x2 message have the exact
> > same semantics -- IOW they take and return the same set of values
> > up to a transposition, which can be applied consistently to all
> > vector values based on the has_simd4x2 flag, in a manner
> > transparent for the implementation of the typed and untyped
> > surface messages.
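The transposition mentioned above can be sketched as a pair of layout conversions selected by a has_simd4x2 flag. The layouts are a simplified model assumed from this thread (HSW: vertex 0 in dwords 0-3 and vertex 1 in dwords 4-7 of one register; IVB X-only SIMD8: one register per component, using lanes 0 and 4, the lanes WRITEMASK_X enables). The reg8 type and the function names are hypothetical, not taken from Mesa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define REG_DWORDS 8

/* Hypothetical stand-in for a 32-byte register: 8 dwords. */
typedef struct { uint32_t d[REG_DWORDS]; } reg8;

/* Convert a SIMD4x2 vector value into the layout the message expects.
 * Returns the number of registers written to out. */
static unsigned to_message_layout(reg8 *out, const reg8 *vec,
                                  unsigned num_components, bool has_simd4x2)
{
   assert(num_components >= 1 && num_components <= 4);
   if (has_simd4x2) {
      out[0] = *vec;               /* HSW+: layout used as is */
      return 1;
   }
   for (unsigned c = 0; c < num_components; c++) {
      memset(&out[c], 0, sizeof(out[c]));
      out[c].d[0] = vec->d[c];     /* vertex 0, component c */
      out[c].d[4] = vec->d[4 + c]; /* vertex 1, component c */
   }
   return num_components;
}

/* Inverse transpose, for values returned by read or atomic messages. */
static void from_message_layout(reg8 *vec, const reg8 *in,
                                unsigned num_components, bool has_simd4x2)
{
   assert(num_components >= 1 && num_components <= 4);
   if (has_simd4x2) {
      *vec = in[0];
      return;
   }
   memset(vec, 0, sizeof(*vec));
   for (unsigned c = 0; c < num_components; c++) {
      vec->d[c] = in[c].d[0];
      vec->d[4 + c] = in[c].d[4];
   }
}
```

Because the same pair of conversions wraps every vector operand and result, the code that builds the actual surface messages never needs to know which layout is in use.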
> > 
> > Even though your idea would work in this specific case it can't
> > be extended to the typed messages (because by tweaking the
> > coordinates you get the same single color component from
> > different locations of the image rather than different color
> > components), and it breaks the symmetry (up to a transpose)
> > between the FS and VEC4 implementations and between the HSW+ and
> > IVB implementations.
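To see why the offset trick does not carry over to typed messages, compare the byte addresses involved. This sketch assumes a linear (untiled) 2D image with four 32-bit components per texel; real surfaces are usually tiled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TEXEL_BYTES 16u /* assumed format: 4 x 32-bit components per texel */

/* Untyped buffer: component c of a vec4 stored at byte offset off.
 * Bumping the address by 4 bytes reaches the next component. */
static uint32_t untyped_component_addr(uint32_t off, unsigned c)
{
   return off + 4u * c;
}

/* Typed image: component c of texel (x, y) in a linear image. */
static uint32_t image_component_addr(uint32_t pitch, uint32_t x, uint32_t y,
                                     unsigned c)
{
   return y * pitch + x * TEXEL_BYTES + 4u * c;
}

/* What a red-only typed message addresses if the coordinate is "tweaked"
 * by c: the red component of a different texel, (x + c, y). */
static uint32_t image_offset_trick_addr(uint32_t pitch, uint32_t x, uint32_t y,
                                        unsigned c)
{
   return image_component_addr(pitch, x + c, y, 0);
}
```

For c > 0 the last two functions disagree: the tweaked coordinate selects the red component of a neighbouring texel rather than component c of the intended one.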
> 
> I see, thanks again for taking the time to explain this. I'll rewrite my
> solution to use SIMD8 messages with WRITEMASK_X only so we can keep that
> symmetry.

I was thinking that it wouldn't make sense to re-implement this
madness in the SSBO handling code.  ARB_shader_image_load_store
uses the exact same messages so we should share a common
implementation.  Check out the for-iago branch of my mesa
repository [1]. I've ripped out everything specific to
ARB_shader_image_load_store; what is left is the code for
building and sending typed and untyped surface messages, along
with some dependencies.  It should make your task considerably
easier.  There are emit_untyped_read(), emit_untyped_write() and
emit_untyped_atomic() functions you can call, passing the surface
index you want to access, the address and argument vectors,
number of address dimensions (typically one for SSBOs) and number
of 32-bit components to write or read.  It will give you the
result (if any) in a register returned by the function.  It
should take care of most hardware quirks for you (SIMD mode
restrictions, differences in header and vector layout across
generations).
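With helpers like these, the SSBO-specific part mostly reduces to computing the arguments for each access. The sketch below assumes std430 array strides and a hypothetical binding-table layout in which SSBO surfaces start at ssbo_surface_start; the untyped_access struct is illustrative and does not reproduce the actual emit_untyped_*() signatures in the branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bundle of the arguments described above. */
struct untyped_access {
   unsigned surface;   /* binding-table index of the buffer */
   uint32_t address;   /* byte offset within the buffer */
   unsigned dims;      /* address dimensions: 1 for a linear buffer */
   unsigned size;      /* number of 32-bit components read or written */
};

/* Access to element `index` of an array member declared as `vecN m[]`
 * (n = 1..4) inside an SSBO block.  Assumes std430 array strides:
 * 4 bytes for scalars, 8 for vec2, 16 for vec3 and vec4. */
static struct untyped_access
ssbo_array_access(unsigned ssbo_surface_start, unsigned block_index,
                  uint32_t member_offset, uint32_t index, unsigned n)
{
   assert(n >= 1 && n <= 4);
   uint32_t stride = (n == 1) ? 4 : (n == 2) ? 8 : 16;
   struct untyped_access a = {
      .surface = ssbo_surface_start + block_index,
      .address = member_offset + index * stride,
      .dims = 1,
      .size = n,
   };
   return a;
}
```

Keeping the address computation separate like this leaves all SIMD-mode and per-generation layout handling inside the shared surface-message helpers.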

[1] http://cgit.freedesktop.org/~currojerez/mesa/log/?h=for-iago
