[Bug 89597] Implement SSBOs in GLSL front-end and i965

Thu Apr 16 23:18:29 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #15 from Iago Toral <itoral at igalia.com> ---
(In reply to Jason Ekstrand from comment #14)
> (In reply to Iago Toral from comment #13)
> > Jason, scattered writes did fix the problem, thanks!
> > 
> > I noticed an unexpected behavior though, according to the PRM, the scattered
> > write message is supposed to write 8 DWords at 8 offsets (for a block size
> > of 8), however, for me it only writes 4. It completely ignores offsets
> > stored in M1.4:M1.7 and data stored in M2.4:M2.7 of the message payload.
> 
> I'm not sure if this is your problem, but something that took me by
> surprise about the scattered read/write messages is that they don't do what
> you might first expect.  The 8 dwords are written to the 8 different offsets
> provided.  This means that, if all 8 offsets are the same, one of those 8
> values will end up there and the other 7 won't get written at all.  If you
> want to use it (as I did to spill an entire register), you have to give it 8
> different offsets.  I did this using an add with a vector int immediate:
> 
> http://cgit.freedesktop.org/~jekstrand/mesa/tree/src/mesa/drivers/dri/i965/
> brw_fs.cpp?h=wip/fs-indirects-v0.5#n1740
> 
> For SSBO's, however, scattered read/write should be exactly what you want
> because you get an offset per SIMD channel and you just have to put
> the data there.  The user is responsible for making sure that data from
> different fragments or vertices end up in different locations.

Nope, that is not my problem. I provide 8 different, consecutive dword offsets
but I only see 4 of them actually written.
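
To make the offsets point concrete, here is a minimal host-side sketch (plain
C++, not driver code) of the difference between giving all 8 channels the same
offset and adding a per-channel increment, as Jason describes. The function
names are hypothetical, and the per-channel add is only an assumed stand-in
for the vector-immediate ADD in the linked code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Toy model only: one dword offset per SIMD8 channel.
using Offsets = std::array<uint32_t, 8>;

// Every channel gets the same offset, so all 8 writes target one location.
Offsets splat_offsets(uint32_t base)
{
   Offsets offsets;
   offsets.fill(base);
   return offsets;
}

// Channel N gets base + N, mimicking an ADD with the vector <0, 1, ..., 7>.
Offsets consecutive_offsets(uint32_t base)
{
   assert(base <= UINT32_MAX - 7); // guard against wrap-around in the model
   Offsets offsets;
   for (uint32_t ch = 0; ch < offsets.size(); ch++)
      offsets[ch] = base + ch;
   return offsets;
}

// Number of distinct dword locations a scattered write could touch.
std::size_t distinct_locations(const Offsets &offsets)
{
   return std::set<uint32_t>(offsets.begin(), offsets.end()).size();
}
```

With splat_offsets() only one location can receive data, whichever channel
wins; with consecutive_offsets() all 8 dwords have their own destination.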

> > This issue actually works great for me here because a vector type is at most
> > 4 elements so we want to write 4 DWords tops with each message, but I wonder
> > why this is happening and if it is safe to assume that it is going to
> > write 4 Dwords always. The PRM says that the hardware uses the 8 lower bits
> > of the execution mask to select which of the 8 channels are effectively
> > written, so I wonder if that could be affecting here or if this issue might
> > be related to something else.
> > 
> > Any thoughts?
> 
> I'm confused.  Are you trying to use dword scattered read/write for vec4? 
> In SIMD8, you only write one float at a time anyway.  Unless, of course, I'm
> massively misunderstanding SSBO's.  For vec4, I think you want the 

Nope, this is SIMD8/16, I haven't tried to use this with vec4. The thing is,
imagine that I have a vector type at the IR level with element count > 1.
Initially I would loop through the elements and write each one individually by
passing offset(value_reg, i) as src to the write message, but then I noticed
that I could use the same message to write all the elements in the vector (up
to 4) in one go if I provided 4 different offsets to the scattered message and
prepared the message payload with the 4 floats to write at each offset. That
is, I do something like this in the visitor:

   /* Prepare scattered write message payload.
    * M1.0..M1.3: Dword offsets to be added to the global offset
    * M2.0..M2.3: Dword values
    */
   int base_mrf = 1;
   for (int i = 0; i < ir->val->type->vector_elements; i++) {
      int component_mask = 1 << i;
      if (ir->write_mask & component_mask) {
         fs_reg mrf = fs_reg(MRF, base_mrf + 1, BRW_REGISTER_TYPE_UD);
         mrf.subreg_offset += i * type_sz(mrf.type);
         emit(MOV(mrf, brw_imm_ud(i)));

         mrf = fs_reg(MRF, base_mrf + 2, val_reg.type);
         mrf.subreg_offset += i * type_sz(mrf.type);
         emit(MOV(mrf, offset(val_reg, i)));
      }
   }

   /* Set the writemask so we only write to the offsets we want */
   struct brw_reg brw_dst =
      brw_set_writemask(brw_vec8_grf(0, 0), ir->write_mask);
   fs_reg push_dst = fs_reg(brw_dst);
   fs_inst *inst =
      new(mem_ctx) fs_inst(SHADER_OPCODE_SCATTERED_BUFFER_STORE, 8,
                           push_dst, surf_index, offset_reg);

This seems to work well, and for vectors I end up needing only one message to
write all the channels I need. Now that I think about it, the reason I get at
most 4 channels written is probably that ir->write_mask can be 0xf at most. I
imagine that in SIMD8 the writemask would have to be 0xff to cover all 8
channels, unlike vec4.
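
As a sanity check on that reasoning, here is a small host-side model (plain
C++, not i965 code) of a SIMD8 dword scattered write gated by an 8-bit channel
mask. It is only a sketch under the assumption that the mask selects channels
one bit per channel, as the PRM text quoted above suggests; all names are
hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: 8 channels, one dword offset and one dword value each.
constexpr int kChannels = 8;

// Writes values[ch] to buffer[global_offset + offsets[ch]] for every channel
// whose bit is set in channel_mask; masked-off channels are skipped.
void scattered_write_model(std::vector<uint32_t> &buffer,
                           uint32_t global_offset,
                           const std::array<uint32_t, kChannels> &offsets,
                           const std::array<uint32_t, kChannels> &values,
                           uint8_t channel_mask)
{
   for (int ch = 0; ch < kChannels; ch++) {
      if (!(channel_mask & (1u << ch)))
         continue; // channel disabled: its offset and data are ignored

      const uint32_t dst = global_offset + offsets[ch];
      assert(dst < buffer.size()); // the model does no bounds clamping
      buffer[dst] = values[ch];
   }
}
```

In this model a mask of 0xf leaves the dwords at offsets 4..7 untouched no
matter what the payload holds for those channels, which matches what I see,
while 0xff writes all 8.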

> > This is important because if I can't be sure that only 4 Dwords are going to
> > be written then I need to disable the writes from offsets M1.4:M1.7. Ideally
> > I would do this by altering the execution mask for the SEND instruction so
> > that it only considers the channels we want to write. Is this possible?
> > I have not found any examples in the driver where this is done.
> > 
> > Alternatively, I could replicate the writes from offsets 0..3 into 4..7 (the
> > PRM says that the hardware optimizes writes to the same offset so this may
> > not be that bad).
> 
> If you're trying to use scattered read/write in vec4, then you may be
> running into execution mask issues.  I don't know how the execution mask in
> 4x2 is laid out but scattered read/write is usually a SIMD8 message.  It can
> be used in 4x2 mode but you'll have to monkey with the writemask yourself. 
> I'm not sure how you do that.  Ken would know.

Haven't tried this for vec4 yet, but if I end up needing it there too I'll ask
Ken.

Thanks Jason.
