<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Implement SSBOs in GLSL front-end and i965" href="https://bugs.freedesktop.org/show_bug.cgi?id=89597#c55">Comment # 55</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Implement SSBOs in GLSL front-end and i965" href="https://bugs.freedesktop.org/show_bug.cgi?id=89597">bug 89597</a> from <a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> Iago Toral</a> <pre>(In reply to Francisco Jerez from <a href="show_bug.cgi?id=89597#c54">comment #54</a>) > (In reply to Iago Toral from <a href="show_bug.cgi?id=89597#c53">comment #53</a>) > > (In reply to Iago Toral from <a href="show_bug.cgi?id=89597#c52">comment #52</a>) > > > (In reply to Francisco Jerez from <a href="show_bug.cgi?id=89597#c51">comment #51</a>) > > > > (In reply to Iago Toral from <a href="show_bug.cgi?id=89597#c50">comment #50</a>) > > > > > Francisco, your implementation of brw_untyped_surface_write in brw_eu_emit.c > > > > > fixes the mask to use for the dst, which in turn decides which channels are > > > > > effectively written to memory. > > > > > > > > > > For vec4 and gen7 (except haswell) it sets the writemask to WRITEMASK_X, but > > > > > I think we want clients of this function to decide the writemask to use. For > > > > > example, if we are writing a vec4, I want to use WRITEMASK_XYZW (even if we > > > > > are using SIMD8 mode) to get all 4 components written directly. Fixing the > > > > > writemask inside this function seems a bit restrictive. > > > > > > > > > > > > > IVB didn't have a SIMD4x2 variant of the untyped surface write > > > > message, so the SIMD8 one has to be used. 
The writemask is > > > > required because the dataport will reinterpret the execution mask > > > > sent by the EU as if each bit mapped to a separate scalar > > > > channel, just like is the case for a FS thread, so, if you set > > > > the writemask to XYZW the dataport might end up writing *eight* > > > > separate vec4s to memory, which is almost certainly not what you > > > > want. > > > > > > I think this should not happen. The IVB PRMs, 3.9.9.10 > > > Message Payload, says: > > > > > > "For SIMD16 and SIMD8 messages, the message length is used to determine how > > > many address parameters are included in the message. The number of message > > > registers in the write data payload is determined by the number of channel > > > mask bits that are enabled" > > > > > > So, if we only enable one channel (red), it should only write up to 8 > > > dwords, never 8 vec4s. With this in mind, this is what I am doing: > > > > > > I write up to 8 offsets to M1 and up to 8 values to M2 (so red channel > > > only). The first 4 values in the red channel (M2.0 to M2.3) are the four > > > vector components of the vertex stored in the lower half of the SIMD4x2 > > > execution, the data from the second vertex of the SIMD4x2 execution goes in M2.4 > > > to M2.7. Since I only provide data for the red channel, the message can only > > > write up to 8 dwords, no matter the writemask I use. Then, with WRITEMASK_X, > > > only M2.0 and M2.4 get written. With WRITEMASK_XYZW I get all 8 dwords > > > written, with other writemasks I can get any subset I need, which works > > > great because I only need to pass the writemask we get from the GLSL IR as > > > is to get exactly what we want. > > > > > > I have tested this in multiple scenarios and it seems to work fine in all of > > > them, and the implementation is straightforward. Do you see any issues that > > > I might be missing? 
> > > > There is another benefit of this implementation that I have just noticed: > > the structure of the payload I use in SIMD8 mode for IVB is the same as the > > structure of the payload for haswell in SIMD4x2, which means that the > > implementation is the same for both (if anything I can optimize the haswell > > version since it only needs M1.0 and M1.4 in the address payload). If I > > change the implementation to do as you suggested then I would need to write > > different implementations for both systems, right? > > No, not necessarily. In fact in the link I shared earlier there > is a single implementation of untyped surface write shared among > all generations which works regardless of whether SIMD8, 16 or > 4x2 is being used. The only reason why that's possible despite > the strange vector layout of the message on IVB is that both the > X-only SIMD8 message and the HSW SIMD4x2 messages have the exact > same semantics -- IOW they take and return the same set of values > up to a transposition, which can be applied consistently to all > vector values based on the has_simd4x2 flag, in a manner > transparent for the implementation of the typed and untyped > surface messages. > > Even though your idea would work in this specific case it can't > be extended to the typed messages (because by tweaking the > coordinates you get the same single color component from > different locations of the image rather than different color > components), and it breaks the symmetry (up to a transpose) > between the FS and VEC4 implementations and between the HSW+ and > IVB implementations. I see, thanks again for taking the time to explain this. I'll rewrite my solution to use SIMD8 messages with WRITEMASK_X only so we can keep that symmetry.</pre> </div> </body> </html>