<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Implement SSBOs in GLSL front-end and i965" href="https://bugs.freedesktop.org/show_bug.cgi?id=89597#c15">Comment # 15</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Implement SSBOs in GLSL front-end and i965" href="https://bugs.freedesktop.org/show_bug.cgi?id=89597">bug 89597</a> from <a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> Iago Toral</a>
<pre>(In reply to Jason Ekstrand from <a href="show_bug.cgi?id=89597#c14">comment #14</a>)
> (In reply to Iago Toral from <a href="show_bug.cgi?id=89597#c13">comment #13</a>)
> > Jason, scattered writes did fix the problem, thanks!
> >
> > I noticed an unexpected behavior though, according to the PRM, the scattered
> > write message is supposed to write 8 DWords at 8 offsets (for a block size
> > of 8), however, for me it only writes 4. It completely ignores offsets
> > stored in M1.4:M1.7 and data stored in M2.4:M2.7 of the message payload.
>
> I'm not sure if this is your problem, but something that took me by
> surprised about the scattered read/write messages is that they don't do what
> you might first expect. The 8 dwords are written to the 8 different offsets
> provided. This means that, if all 8 offsets are the same, one of those 8
> values will end up there and the other 7 won't get written at all. If you
> want to use it (as I did to spill an entire register), you have to give it 8
> different offsets.
> I did this using an add with a vector int immediate:
>
> <a href="http://cgit.freedesktop.org/~jekstrand/mesa/tree/src/mesa/drivers/dri/i965/">http://cgit.freedesktop.org/~jekstrand/mesa/tree/src/mesa/drivers/dri/i965/</a>
> brw_fs.cpp?h=wip/fs-indirects-v0.5#n1740
>
> For SSBO's, however, scattered read/write should be exactly what you want
> because because you get an offset per SIMD channel and you just have to put
> the data there. The user is responsible for making sure that data from
> different fragments or vertices end up in different locations.

Nope, that is not my problem. I provide 8 different consecutive dword offsets
but I only see 4 of these actually written.

> > This issue actually works great for me here because a vector type is at most
> > 4 elements so we want to write 4 DWords tops with each message, but I wonder
> > why this this happening and if it is safe to assume that it is going to
> > write 4 Dwords always. The PRM says that the hardware uses the 8 lower bits
> > of the execution mask to select which of the 8 channels are effectively
> > written, so I wonder if that could be affecting here or if this issue might
> > be related to something else.
> >
> > Any thoughts?
>
> I'm confused. Are you trying to use dword scattered read/write for vec4?
> In SIMD8, you only write one float at a time anyway. Unless, of course, I'm
> massively misunderstanding SSBO's. For vec4, I think you want the

Nope, this is SIMD8/16, haven't tried to use this with vec4. The thing is,
imagine that I have a vector type at the IR level with element count > 1.
Initially I would loop through the elements and write each one individually by
passing offset(value_reg, i) as src to the write message, but then I noticed
that I could use the same message to write all the elements in the vector (up
to 4) in one go if I provided 4 different offsets to the scattered message and
prepared the message payload with the 4 floats to write at each offset.
That is, I do something like this in the visitor:

/* Prepare scattered write message payload.
 * M1.0..M1.3: Dword offsets to be added to the global offset
 * M2.0..M2.3: Dword values
 */
int base_mrf = 1;
for (int i = 0; i < ir->val->type->vector_elements; i++) {
   int component_mask = 1 << i;
   if (ir->write_mask & component_mask) {
      fs_reg mrf = fs_reg(MRF, base_mrf + 1, BRW_REGISTER_TYPE_UD);
      mrf.subreg_offset += i * type_sz(mrf.type);
      emit(MOV(mrf, brw_imm_ud(i)));

      mrf = fs_reg(MRF, base_mrf + 2, val_reg.type);
      mrf.subreg_offset += i * type_sz(mrf.type);
      emit(MOV(mrf, offset(val_reg, i)));
   }
}

/* Set the writemask so we only write to the offsets we want */
struct brw_reg brw_dst =
   brw_set_writemask(brw_vec8_grf(0, 0), ir->write_mask);
fs_reg push_dst = fs_reg(brw_dst);
fs_inst *inst =
   new(mem_ctx) fs_inst(SHADER_OPCODE_SCATTERED_BUFFER_STORE, 8,
                        push_dst, surf_index, offset_reg);

This seems to work well, and for vectors I end up only needing one message to
write all the channels I need to write.

Now that I think about it, the reason I only get 4 channels written at most is
probably that ir->write_mask can be 0xf at most. I imagine that in SIMD8 the
dst writemask would have to be 0xff to cover all 8 channels, unlike vec4.

> > This is important because if I can't be sure that only 4 Dwords are going to
> > be written then I need to disable the writes from offsets M1.4:M1.7. Ideally
> > I would do this by altering the execution mask for the SEND instruction so
> > that it only considers the the channels we want to write. Is this possible?
> > I have not found any examples in the driver where this is done.
> >
> > Alternatively, I could replicate the writes from offsets 0..3 into 4..7 (the
> > PRM says that the hardware optimizes writes to the same offset so this may
> > not be that bad).
>
> If you're trying to use scattered read/write in vec4, then you may be
> running into execution mask issues.
> I don't know how the execution mask in
> 4x2 is laid out but scattered read/write is usually a SIMD8 message. It can
> be used in 4x2 mode but you'll have to monkey with the writemask yourself.
> I'm not sure how you do that. Ken would know.

Haven't tried this for vec4 yet, but if I end up needing it there too I'll ask
Ken.

Thanks Jason.</pre> </div> </body> </html>