<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - Implement SSBOs in GLSL front-end and i965"
href="https://bugs.freedesktop.org/show_bug.cgi?id=89597#c55">Comment # 55</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - Implement SSBOs in GLSL front-end and i965"
href="https://bugs.freedesktop.org/show_bug.cgi?id=89597">bug 89597</a>
from <span class="vcard"><a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> <span class="fn">Iago Toral</span></a>
</span></b>
<pre>(In reply to Francisco Jerez from <a href="show_bug.cgi?id=89597#c54">comment #54</a>)
<span class="quote">> (In reply to Iago Toral from <a href="show_bug.cgi?id=89597#c53">comment #53</a>)
> > (In reply to Iago Toral from <a href="show_bug.cgi?id=89597#c52">comment #52</a>)
> > > (In reply to Francisco Jerez from <a href="show_bug.cgi?id=89597#c51">comment #51</a>)
> > > > (In reply to Iago Toral from <a href="show_bug.cgi?id=89597#c50">comment #50</a>)
> > > > > Francisco, your implementation of brw_untyped_surface_write in brw_eu_emit.c
> > > > > fixes the mask to use for the dst, which in turn decides which channels are
> > > > > effectively written to memory.
> > > > >
> > > > > For vec4 and gen7 (except haswell) it sets the writemask to WRITEMASK_X, but
> > > > > I think we want clients of this function to decide the writemask to use. For
> > > > > example, if we are writing a vec4, I want to use WRITEMASK_XYZW (even if we
> > > > > are using SIMD8 mode) to get all 4 components written directly. Fixing the
> > > > > writemask inside this function seems a bit restrictive.
> > > > >
> > > >
> > > > IVB didn't have a SIMD4x2 variant of the untyped surface write
> > > > message, so the SIMD8 one has to be used. The writemask is
> > > > required because the dataport will reinterpret the execution mask
> > > > sent by the EU as if each bit mapped to a separate scalar
> > > > channel, just like is the case for a FS thread, so, if you set
> > > > the writemask to XYZW the dataport might end up writing *eight*
> > > > separate vec4s to memory, which is almost certainly not what you
> > > > want.
> > >
> > > I think this should not happen. The IVB PRMs, 3.9.9.10
> > > Message Payload, says:
> > >
> > > "For SIMD16 and SIMD8 messages, the message length is used to determine how
> > > many address parameters are included in the message. The number of message
> > > registers in the write data payload is determined by the number of channel
> > > mask bits that are enabled"
> > >
> > > So, if we only enable one channel (red), it should only write up to 8
> > > dwords, never 8 vec4s. With this in mind, this is what I am doing:
> > >
> > > I write up to 8 offsets to M1 and up to 8 values to M2 (so red channel
> > > only). The first 4 values in the red channel (M2.0 to M2.3) are the four
> > > vector components of the vertex stored in the lower half of the SIMD4x2
> > > execution, the data from the second vertex of the SIMD4x2 execution goes in M2.4
> > > to M2.7. Since I only provide data for the red channel, the message can only
> > > write up to 8 dwords, no matter the writemask I use. Then, with WRITEMASK_X,
> > > only M2.0 and M2.4 get written. With WRITEMASK_XYZW I get all 8 dwords
> > > written, with other writemasks I can get any subset I need, which works
> > > great because I only need to pass the writemask we get from the GLSL IR as
> > > is to get exactly what we want.
> > >
> > > I have tested this in multiple scenarios and it seems to work fine in all of
> > > them, and the implementation is straightforward. Do you see any issues that
> > > I might be missing?
> >
> > There is another benefit of this implementation that I have just noticed:
> > the structure of the payload I use in SIMD8 mode for IVB is the same as the
> > structure of the payload for haswell in SIMD4x2, which means that the
> > implementation is the same for both (if anything I can optimize the haswell
> > version since it only needs M1.0 and M1.4 in the address payload). If I
> > change the implementation to do as you suggested then I would need to write
> > different implementations for both systems, right?
>
> No, not necessarily. In fact in the link I shared earlier there
> is a single implementation of untyped surface write shared among
> all generations which works regardless of whether SIMD8, 16 or
> 4x2 is being used. The only reason why that's possible despite
> the strange vector layout of the message on IVB is that both the
> X-only SIMD8 message and the HSW SIMD4x2 messages have the exact
> same semantics -- IOW they take and return the same set of values
> up to a transposition, which can be applied consistently to all
> vector values based on the has_simd4x2 flag, in a manner
> transparent for the implementation of the typed and untyped
> surface messages.
>
> Even though your idea would work in this specific case it can't
> be extended to the typed messages (because by tweaking the
> coordinates you get the same single color component from
> different locations of the image rather than different color
> components), and it breaks the symmetry (up to a transpose)
> between the FS and VEC4 implementations and between the HSW+ and
> IVB implementations.</span>
I see, thanks again for taking the time to explain this. I'll rewrite my
solution to use SIMD8 messages with WRITEMASK_X only so we can keep that
symmetry.</pre>
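<p>For readers following the thread, here is a minimal sketch of the red-channel-only payload layout described in the quoted comment #52 (the approach being replaced, not the WRITEMASK_X-only rewrite). It is only a model of which M2 dwords that comment says get written for a given writemask; the function name <code>written_slots</code> and the string form of the writemask are hypothetical and do not correspond to any Mesa API.</p>
<pre>
```python
# Hypothetical model of the payload layout from comment #52, not Mesa code.
# M2 holds 8 dwords: M2.0 to M2.3 are the x, y, z, w components of the first
# SIMD4x2 vertex, and M2.4 to M2.7 are the components of the second vertex.
COMPONENTS = "xyzw"
VERTICES_PER_SIMD4X2 = 2


def written_slots(writemask):
    """Return the M2 dword indices written for a writemask such as "x" or "xyzw"."""
    return [
        vertex * len(COMPONENTS) + index
        for vertex in range(VERTICES_PER_SIMD4X2)
        for index, name in enumerate(COMPONENTS)
        if name in writemask
    ]


if __name__ == "__main__":
    for mask in ("x", "xz", "xyzw"):
        print(mask, written_slots(mask))
```
</pre>
<p>Under this model, a writemask of <code>"x"</code> selects only M2.0 and M2.4, and <code>"xyzw"</code> selects all eight dwords, which matches the behaviour reported in that comment.</p>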
</div>
</p>
</body>
</html>