<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - Implement SSBOs in GLSL front-end and i965"
href="https://bugs.freedesktop.org/show_bug.cgi?id=89597#c21">Comment # 21</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - Implement SSBOs in GLSL front-end and i965"
href="https://bugs.freedesktop.org/show_bug.cgi?id=89597">bug 89597</a>
from <span class="vcard"><a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> <span class="fn">Iago Toral</span></a>
</span></b>
<pre>(In reply to Jason Ekstrand from <a href="show_bug.cgi?id=89597#c20">comment #20</a>)
<span class="quote">> (In reply to Kristian Høgsberg from <a href="show_bug.cgi?id=89597#c19">comment #19</a>)
> > (In reply to Iago Toral from <a href="show_bug.cgi?id=89597#c18">comment #18</a>)
> > > Jason, I think I got it working for SIMD8 but I have a question regarding
> > > SIMD16:
> > >
> > > > (In reply to Jason Ekstrand from <a href="show_bug.cgi?id=89597#c16">comment #16</a>)
> > > > > I think you are misunderstanding how these SIMD8/16 write messages work.
> > > > > I'll assume 8 in the following discussion but it all applies to 16.
> > > > >
> > > > > As the shader executes, it executes 8 pixels at a time. Each
> > > > > sub-register represents the same symbolic value in GLSL but for a different
> > > > > pixel. Suppose I have an SSBO declared as follows:
> > > > >
> > > > > buffer Block {
> > > > > vec4 s[128];
> > > > > };
> > > > >
> > > > > And suppose I execute the line of code "s[i].xzw = foo;" where foo is some
> > > > > vec3. When the SIMD8 shader reaches this line, it stores 12 values in the
> > > > > SSBO; 3 per pixel. If the client doesn't want the values to stomp on each
> > > > > other, it is up to the client to ensure that i is different for each pixel.
> > > > >
> > > > > How does this work with the scattered read/write messages? They are
> > > > > designed for exactly a case like this. When you get to this statement, you
> > > > > will have one register that holds the value of i and three more for foo.
> > > > > Each of these registers has 8 sub-registers one for each SIMD channel (or
> > > > > pixel).
> > >
> > > In SIMD16 the instructions operate on 16 elements, but I understand that
> > > registers still have 8 sub-registers, so this instruction:
> > >
> > > mov(16) g116<1>F 1.0F { align1 1H };
> > >
> > > is writing 1.0 in all sub-registers of g116 (8 elements) and all
> > > sub-registers of g117 (8 elements). Is this correct? If I am correct, then I
> > > would expect this assembly code for a SIMD16 scattered write to work:
> > >
> > > mov(8) g113<1>UD g0<8,8,1>UD { align1 WE_all 1Q compacted };
> > > mov(1) g113.2<1>UD 0x00000000UD { align1 WE_all compacted };
> > > mov(16) g114<1>UD g13<8,8,1>UD { align1 1H compacted };
> > > mov(16) g116<1>F 1.0F { align1 1H };
> > > send(16) g0<1>F g113<8,8,1>F
> > > data ( DC DWORD scatterd write, 1, 3) mlen 5 rlen 0 { align1 1H };
> > >
> > > The first mov(16) would write the offset payload to M1,M2 (g114,g115) and
> > > the second mov(16) would write the data payload to M3,M4 (g116,g117).
> > > However, I see that this does not produce correct writes into the buffer: I
> > > see writes to the correct offsets but with wrong data, so I guess I am
> > > understanding something wrong again?
> > >
> > > For the record, this same code works fine if I make the second mov(16) write
> > > to g115 (like I do in SIMD8, where we want offsets in M1 and data in M2),
> > > but as far as my understanding goes, this should actually be incorrect for
> > > SIMD16.
> > >
> > > > > All you should have to do is build 3 messages each one of which is
> > > > > i + some math for the address part and a component of foo for the payload
> > > > > part. Each scattered write writes 8 values but they are the different
> > > > > values from the different SIMD channels, not from different components of
> > > > > foo. The first one will write all 8 of the s[i].x, the next one s[i].y, etc.
> >
> > Are you setting the block size in the message descriptor?
> >
> > Bits 9:8 should be
> >
> > 10: 8 DWords
> > 11: 16 DWords
>
> Yes, I think this is most likely the problem. We actually have a nice
> #define for this. You can see it in use in my wip/fs-indirects-v0.5 branch
> in this commit:
>
> <a href="http://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/fs-indirects-v0.5&amp;id=df4293526f873102b45dd89dc20b084bc8662181">http://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/fs-indirects-v0.5&amp;id=df4293526f873102b45dd89dc20b084bc8662181</a>
>
> In fact, feel free to just cherry-pick that if you think it's what you want.
> It also handles setting the right opcode for the different gens.</span >
Nope, that shouldn't be it. This is what I have:

   int mlen, msg_type;
   if (dispatch_width == 8) {
      msg_type = BRW_DATAPORT_DWORD_SCATTERED_BLOCK_8DWORDS;
      mlen = 3;   /* header + offsets (M1) + data (M2) */
   } else {
      msg_type = BRW_DATAPORT_DWORD_SCATTERED_BLOCK_16DWORDS;
      mlen = 5;   /* header + offsets (M1,M2) + data (M3,M4) */
   }

I use dispatch_width rather than inst->exec_size to check whether we are in
SIMD16 or SIMD8 mode, but both should be valid, I guess. Anyway, if my
understanding of how things operate in SIMD16 mode is correct, then there must
be something silly getting in the way; I'll try to track it down.
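
The payload layout assumed above can be sketched as follows. This is only a
model for checking the arithmetic, not driver code: scatter_layout and
scatter_layout_for are hypothetical names, and the block size values 2 and 3
are simply the binary 10 and 11 quoted above for bits 9:8 of the descriptor.

```c
/* Hypothetical model, not part of Mesa: one register holds 8 dwords. */
enum { CHANNELS_PER_REG = 8 };

struct scatter_layout {
   int block_size_bits; /* descriptor bits 9:8: 2 = 8 DWords, 3 = 16 DWords */
   int offset_regs;     /* registers holding the per-channel offsets */
   int data_regs;       /* registers holding the per-channel data */
   int mlen;            /* header + offsets + data */
};

/* Assumes dispatch_width is either 8 or 16. */
static struct scatter_layout
scatter_layout_for(int dispatch_width)
{
   struct scatter_layout l;
   int regs = dispatch_width / CHANNELS_PER_REG;

   l.block_size_bits = (dispatch_width == 8) ? 2 : 3;
   l.offset_regs = regs;
   l.data_regs = regs;
   l.mlen = 1 + l.offset_regs + l.data_regs; /* 1 register for the header */
   return l;
}
```

With this model, SIMD8 gives mlen 3 (header, M1, M2) and SIMD16 gives mlen 5
(header, M1-M2 offsets, M3-M4 data), matching the values used in the snippet.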

On a related note, I am trying to test the behavior for SIMD8/SIMD16 with a
fragment shader like this:

   int index = int(mod(gl_FragCoord.x, 32));
   data[index] = index;

so that I have each pixel write a different value to a different index. To my
surprise, if I always use the SIMD8 implementation (i.e. only write 8 offsets
to M1 and 8 values to M2 and use BRW_DATAPORT_DWORD_SCATTERED_BLOCK_8DWORDS),
the result is correct even for a SIMD16 execution (that is, I read data[i] == i
after rendering). Shouldn't this shader produce bogus results at least for some
of the indices in SIMD16 mode with this setup?</pre>
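<pre>The expected result of the test shader above can be modeled on the CPU as
follows. This is a minimal sketch under stated assumptions: the function name
mismatches_after_row is hypothetical, the buffer is assumed to start out
filled with -1, and only a single row of pixels is considered.

```c
enum { BUF_LEN = 32 };

/* Hypothetical CPU model of the test shader for one row of pixels.
 * Returns how many entries end up with data[i] != i. */
static int
mismatches_after_row(int width)
{
   int data[BUF_LEN];
   int i, x, bad = 0;

   /* Assumed initial contents: a value no pixel ever writes. */
   for (i = 0; i != BUF_LEN; i++)
      data[i] = -1;

   /* One iteration per pixel: int(mod(gl_FragCoord.x, 32)) for pixel x. */
   for (x = 0; x != width; x++) {
      int index = x % BUF_LEN;
      data[index] = index;
   }

   /* Same check as reading back data[i] == i after rendering. */
   for (i = 0; i != BUF_LEN; i++)
      bad += (data[i] != i);

   return bad;
}
```

In this model every pixel that maps to a given index stores the same value, so
the final contents do not depend on the order of the writes, and data[i] == i
only requires that at least one write to each index lands correctly.
</pre>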
</div>
</p>
</body>
</html>