[Bug 89597] Implement SSBOs in GLSL front-end and i965
bugzilla-daemon at freedesktop.org
Thu Apr 16 23:18:29 PDT 2015
https://bugs.freedesktop.org/show_bug.cgi?id=89597
--- Comment #15 from Iago Toral <itoral at igalia.com> ---
(In reply to Jason Ekstrand from comment #14)
> (In reply to Iago Toral from comment #13)
> > Jason, scattered writes did fix the problem, thanks!
> >
> > I noticed an unexpected behavior though, according to the PRM, the scattered
> > write message is supposed to write 8 DWords at 8 offsets (for a block size
> > of 8), however, for me it only writes 4. It completely ignores offsets
> > stored in M1.4:M1.7 and data stored in M2.4:M2.7 of the message payload.
>
> I'm not sure if this is your problem, but something that took me by
> surprise about the scattered read/write messages is that they don't do what
> you might first expect. The 8 dwords are written to the 8 different offsets
> provided. This means that, if all 8 offsets are the same, one of those 8
> values will end up there and the other 7 won't get written at all. If you
> want to use it (as I did to spill an entire register), you have to give it 8
> different offsets. I did this using an add with a vector int immediate:
>
> http://cgit.freedesktop.org/~jekstrand/mesa/tree/src/mesa/drivers/dri/i965/
> brw_fs.cpp?h=wip/fs-indirects-v0.5#n1740
>
> For SSBO's, however, scattered read/write should be exactly what you want
> because you get an offset per SIMD channel and you just have to put
> the data there. The user is responsible for making sure that data from
> different fragments or vertices end up in different locations.
Nope, that is not my problem. I provide 8 different consecutive dword offsets
but I only see 4 of these actually written.
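For reference, the semantics described in the quote above can be sketched as a toy C++ model. This is not driver code: `scattered_write8` and its parameters are hypothetical names, and the rule that the last channel wins on a duplicate offset is an assumption of the model only (the quote just says that one of the values ends up there).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a block-size-8 dword scattered write (hypothetical helper).
// offsets[i] plays the role of M1.i and values[i] the role of M2.i.
void scattered_write8(std::vector<uint32_t> &buffer, uint32_t global_offset,
                      const std::array<uint32_t, 8> &offsets,
                      const std::array<uint32_t, 8> &values)
{
   for (int ch = 0; ch < 8; ch++) {
      const uint32_t dst = global_offset + offsets[ch];
      assert(dst < buffer.size());
      /* Assumption: on duplicate offsets the last channel wins here. The
       * only behavior described in the thread is that a single value
       * survives, without saying which one.
       */
      buffer[dst] = values[ch];
   }
}
```

With 8 distinct consecutive offsets this model fills 8 dwords, which is what I would expect from the message; with 8 identical offsets only one dword of the buffer changes.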
> > This issue actually works great for me here because a vector type is at most
> > 4 elements so we want to write 4 DWords tops with each message, but I wonder
> > why this is happening and if it is safe to assume that it is going to
> > write 4 Dwords always. The PRM says that the hardware uses the 8 lower bits
> > of the execution mask to select which of the 8 channels are effectively
> > written, so I wonder if that could be affecting here or if this issue might
> > be related to something else.
> >
> > Any thoughts?
>
> I'm confused. Are you trying to use dword scattered read/write for vec4?
> In SIMD8, you only write one float at a time anyway. Unless, of course, I'm
> massively misunderstanding SSBO's. For vec4, I think you want the
Nope, this is SIMD8/16, haven't tried to use this with vec4. The thing is,
imagine that I have a vector type at the IR level with element count > 1.
Initially I would loop through the elements and write each one individually by
passing offset(value_reg, i) as src to the write message, but then I noticed
that I could use the same message to write all the elements in the vector (up
to 4) in one go if I provided 4 different offsets to the scattered message and
prepared the message payload with the 4 floats to write at each offset. That
is, I do something like this in the visitor:
   /* Prepare scattered write message payload.
    * M1.0..M1.3: Dword offsets to be added to the global offset
    * M2.0..M2.3: Dword values
    */
   int base_mrf = 1;
   for (int i = 0; i < ir->val->type->vector_elements; i++) {
      int component_mask = 1 << i;
      if (ir->write_mask & component_mask) {
         fs_reg mrf = fs_reg(MRF, base_mrf + 1, BRW_REGISTER_TYPE_UD);
         mrf.subreg_offset += i * type_sz(mrf.type);
         emit(MOV(mrf, brw_imm_ud(i)));

         mrf = fs_reg(MRF, base_mrf + 2, val_reg.type);
         mrf.subreg_offset += i * type_sz(mrf.type);
         emit(MOV(mrf, offset(val_reg, i)));
      }
   }

   /* Set the writemask so we only write to the offsets we want */
   struct brw_reg brw_dst =
      brw_set_writemask(brw_vec8_grf(0, 0), ir->write_mask);
   fs_reg push_dst = fs_reg(brw_dst);
   fs_inst *inst =
      new(mem_ctx) fs_inst(SHADER_OPCODE_SCATTERED_BUFFER_STORE, 8,
                           push_dst, surf_index, offset_reg);
This seems to work well, and for vectors I end up only needing one message to
write all the channels I need to write. Now that I think about it, the reason I
only get 4 channels written at most is probably because ir->write_mask can be
0xf at most, I imagine that in SIMD8 the writemask would have to be 0xff to
cover all 8 channels, unlike vec4.
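To make that guess concrete, here is a toy C++ model of a scattered write gated by an 8-bit channel mask. It only illustrates my hypothesis; it is not confirmed hardware behavior and not driver code, and `masked_scattered_write8` and `channel_mask` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: channel ch is written only if bit ch of channel_mask is set.
// Returns the number of dwords actually written (hypothetical helper).
int masked_scattered_write8(std::vector<uint32_t> &buffer,
                            const std::array<uint32_t, 8> &offsets,
                            const std::array<uint32_t, 8> &values,
                            uint8_t channel_mask)
{
   int written = 0;
   for (int ch = 0; ch < 8; ch++) {
      if (!(channel_mask & (1u << ch)))
         continue; /* channel disabled: offset and data are ignored */
      assert(offsets[ch] < buffer.size());
      buffer[offsets[ch]] = values[ch];
      written++;
   }
   return written;
}
```

In this model a mask of 0xf writes only the dwords for channels 0..3 and leaves the ones for M1.4:M1.7 untouched, which matches what I observe, while 0xff would write all 8.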
> > This is important because if I can't be sure that only 4 Dwords are going to
> > be written then I need to disable the writes from offsets M1.4:M1.7. Ideally
> > I would do this by altering the execution mask for the SEND instruction so
> > that it only considers the channels we want to write. Is this possible?
> > I have not found any examples in the driver where this is done.
> >
> > Alternatively, I could replicate the writes from offsets 0..3 into 4..7 (the
> > PRM says that the hardware optimizes writes to the same offset so this may
> > not be that bad).
>
> If you're trying to use scattered read/write in vec4, then you may be
> running into execution mask issues. I don't know how the execution mask in
> 4x2 is laid out but scattered read/write is usually a SIMD8 message. It can
> be used in 4x2 mode but you'll have to monkey with the writemask yourself.
> I'm not sure how you do that. Ken would know.
Haven't tried this for vec4 yet, but if I end up needing it there too I'll ask
Ken.
Thanks Jason.