[Bug 89597] Implement SSBOs in GLSL front-end and i965

Fri May 8 01:08:38 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #53 from Iago Toral <itoral at igalia.com> ---
(In reply to Iago Toral from comment #52)
> (In reply to Francisco Jerez from comment #51)
> > (In reply to Iago Toral from comment #50)
> > > Francisco, your implementation of brw_untyped_surface_write in brw_eu_emit.c
> > > fixes the mask to use for the dst, which in turn decides which channels are
> > > effectively written to memory.
> > > 
> > > For vec4 and gen7 (except haswell) it sets the writemask to WRITEMASK_X, but
> > > I think we want clients of this function to decide the writemask to use. For
> > > example, if we are writing a vec4, I want to use WRITEMASK_XYZW (even if we
> > > are using SIMD8 mode) to get all 4 components written directly. Fixing the
> > > writemask inside this function seems a bit restrictive.
> > > 
> > 
> > IVB didn't have a SIMD4x2 variant of the untyped surface write
> > message, so the SIMD8 one has to be used.  The writemask is
> > required because the dataport will reinterpret the execution mask
> > sent by the EU as if each bit mapped to a separate scalar
> > channel, just like is the case for a FS thread, so, if you set
> > the writemask to XYZW the dataport might end up writing *eight*
> > separate vec4s to memory, which is almost certainly not what you
> > want.
> 
> I think this should not happen. The IVB PRMs, 3.9.9.10
> Message Payload, says:
> 
> "For SIMD16 and SIMD8 messages, the message length is used to determine how
> many address parameters are included in the message. The number of message
> registers in the write data payload is determined by the number of channel
> mask bits that are enabled"
> 
> So, if we only enable one channel (red), it should only write up to 8
> dwords, never 8 vec4s. With this in mind, this is what I am doing:
> 
> I write up to 8 offsets to M1 and up to 8 values to M2 (so red channel
> only). The first 4 values in the red channel (M2.0 to M2.3) are the four
> vector components of the vertex stored in the lower half of the SIMD4x2
> execution; the data from the second vertex of the SIMD4x2 execution goes in M2.4
> to M2.7. Since I only provide data for the red channel, the message can only
> write up to 8 dwords, no matter the writemask I use. Then, with WRITEMASK_X,
> only M2.0 and M2.4 get written. With WRITEMASK_XYZW I get all 8 dwords
> written, with other writemasks I can get any subset I need, which works
> great because I only need to pass the writemask we get from the GLSL IR as
> is to get exactly what we want.
> 
> I have tested this in multiple scenarios and it seems to work fine in all of
> them, and the implementation is straightforward. Do you see any issues that
> I might be missing?

There is another benefit of this implementation that I have just noticed: the
structure of the payload I use in SIMD8 mode for IVB is the same as the
structure of the payload for Haswell in SIMD4x2, which means that the
implementation is the same for both (if anything, I can optimize the Haswell
version, since it only needs M1.0 and M1.4 in the address payload). If I changed
the implementation to do as you suggested, I would need to write different
implementations for the two platforms, right?
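
To make the payload layout described above concrete, here is a small
software model in C. It is only a sketch of the behavior as described in this
comment (one red-channel SIMD8 message, 8 offsets in M1, 8 values in M2, the
4-bit writemask applied to each half of the SIMD4x2 execution). It is not
driver code and not a statement about what the dataport does on real
hardware. The names MASK_*, simd4x2_channel_enables and model_untyped_write
are hypothetical and exist only for this illustration; offsets are modeled as
dword indices for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the writemask bits (assumption: X is bit 0, W is bit 3). */
enum { MASK_X = 0x1, MASK_Y = 0x2, MASK_Z = 0x4, MASK_W = 0x8, MASK_XYZW = 0xf };

/* Hypothetical helper: replicate the 4-bit writemask over both halves of a
 * SIMD4x2 execution, giving one enable bit per SIMD8 channel.
 * Bits 0-3 cover the first vertex, bits 4-7 the second. */
static uint8_t simd4x2_channel_enables(unsigned writemask)
{
    unsigned m = writemask & 0xf;
    return (uint8_t)(m | (m << 4));
}

/* Model of a single-channel (red only) SIMD8 scattered write:
 * m1[i] is the destination dword index for channel i, m2[i] is its value.
 * Because only one channel of data is supplied, at most 8 dwords can be
 * written regardless of the writemask. Returns the number of dwords written. */
static unsigned model_untyped_write(uint32_t *mem, size_t mem_dwords,
                                    const uint32_t m1[8], const uint32_t m2[8],
                                    unsigned writemask)
{
    uint8_t enables = simd4x2_channel_enables(writemask);
    unsigned written = 0;

    for (int i = 0; i < 8; i++) {
        if (enables & (1u << i)) {
            assert(m1[i] < mem_dwords); /* the model rejects out-of-range offsets */
            mem[m1[i]] = m2[i];
            written++;
        }
    }
    return written;
}
```

In this model, MASK_X writes only the entries corresponding to M2.0 and M2.4,
and MASK_XYZW writes all 8 dwords, which matches the behavior reported above.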
