[Bug 89597] Implement SSBOs in GLSL front-end and i965

Thu May 7 23:38:50 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #52 from Iago Toral <itoral at igalia.com> ---
(In reply to Francisco Jerez from comment #51)
> (In reply to Iago Toral from comment #50)
> > Francisco, your implementation of brw_untyped_surface_write in brw_eu_emit.c
> > fixes the mask to use for the dst, which in turn decides which channels are
> > effectively written to memory.
> > 
> > For vec4 and gen7 (except haswell) it sets the writemask to WRITEMASK_X, but
> > I think we want clients of this function to decide the writemask to use. For
> > example, if we are writing a vec4, I want to use WRITEMASK_XYZW (even if we
> > are using SIMD8 mode) to get all 4 components written directly. Fixing the
> > writemask inside this function seems a bit restrictive.
> > 
> 
> IVB didn't have a SIMD4x2 variant of the untyped surface write
> message, so the SIMD8 one has to be used.  The writemask is
> required because the dataport will reinterpret the execution mask
> sent by the EU as if each bit mapped to a separate scalar
> channel, just like is the case for a FS thread, so, if you set
> the writemask to XYZW the dataport might end up writing *eight*
> separate vec4s to memory, which is almost certainly not what you
> want.

I think this should not happen. The IVB PRMs, 3.9.9.10
Message Payload, says:

"For SIMD16 and SIMD8 messages, the message length is used to determine how may
address parameters are included in the message. The number of message registers
in the write data payload is determined by the number of channel mask bits that
are enabled"

So, if we only enable one channel (red), it should only write up to 8 dwords,
never 8 vec4s. With this in mind, this is what I am doing:

I write up to 8 offsets to M1 and up to 8 values to M2 (so red channel only).
The first 4 values in the red channel (M2.0 to M2.3) are the four vector
components of the vertex stored in the lower half of the SIMD4x2 execution, the
data from second vertex of the SIMD4x2 execution goes in M2.4 to M2.7. Since I
only provide data for the red channel, the message can only write up to 8
dwords, no matter the writemask I use. Then, with WRITEMASK_X, only M2.0 and
M.2.4 get written. With WRITEMASK_XYZW I get all 8 dwords written, with other
writetemasks I can get any subset I need, which works great because I only need
to pass the writemask we get from the GLSL IR as is to get exactly what we
want.

I have tested this in multiple scenarios and seems to work fine in all of them,
and the implementation is straight forward. Do you see any issues that I might
be missing?

> > Would you be okay with the addition of a dst parameter to this function? The
> > clients (generator or visitor) would be responsible for setting the
> > appropriate writemask on that dst depending on the situation.
> > 
> 
> It shouldn't be necessary.  The untyped surface write message
> already stores as much data as you need (2 vec4s) with the
> writemask set to one component.  The problem is that the way it
> expects vectors to be laid out in the payload is somewhat
> annoying: each component of the argument needs to be passed in a
> separate register, just like in SIMD8 mode, with the value for
> the first and second channels stored in the first and fourth
> 32-bit elements of each register respectively.

Right, but as you explain, this seems a bit less direct that what I am doing
now. As I explain above, I can get this to work by writing all my data elements
to just one register, which I think makes the implementation a bit simpler if
there are no other caveats that I may be overlooking.

> To solve this problem I wrote some helper functions (emit_insert
> and emit_extract) while implementing ARB_shader_image_load_store
> which might be useful for you as starting point to convert
> between the native vector layout of the EU and the vector layout
> of a message payload.  See [1], but note that the branch is
> somewhat outdated.
> 
> > Since we are using WRITEMASK_XYZW for all FS invocations I think in that
> 
> The writemask doesn't exist at all in Align1 instructions (which is
> what the FS back-end uses).
> 
> > case we can have the generator set the dst parameter as 
> > brw_writemask(brw_null_reg(), WRITEMASK_XYZW), but for vec4 the  visitor
> > would do the same, but injecting the writemask it needs depending on the
> > case, then the vec4 generator would just pass that dst to
> > brw_untyped_surface_write. I was also thinking about passing the writemask
> > as a separate parameter to the generator opcode, but it looks like generator
> > opcodes can only take 3 src arguments and we are already using them, so
> > passing it as part of the dst seems like the best way to go.
> 
> [1]
> http://cgit.freedesktop.org/~currojerez/mesa/tree/src/mesa/drivers/dri/i965/
> brw_ir_surface_builder.h?h=image-load-store#n186

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/intel-3d-bugs/attachments/20150508/775910e7/attachment.html>