[Bug 89597] Implement SSBOs in GLSL front-end and i965

Fri Apr 17 00:56:59 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #17 from Iago Toral <itoral at igalia.com> ---
(In reply to Jason Ekstrand from comment #16)
> (In reply to Iago Toral from comment #15)
> > (In reply to Jason Ekstrand from comment #14)
> > > I'm confused.  Are you trying to use dword scattered read/write for vec4? 
> > > In SIMD8, you only write one float at a time anyway.  Unless, of course, I'm
> > > massively misunderstanding SSBO's.  For vec4, I think you want the 
> > 
> > Nope, this is SIMD8/16, haven't tried to use this with vec4. The thing is,
> > imagine that I have a vector type at the IR level with element count > 1.
> > Initially I would loop through the elements and write each one individually
> > by passing offset(value_reg, i) as src to the write message, but then I
> > noticed that I could use the same message to write all the elements in the
> > vector (up to 4) in one go if I provided 4 different offsets to the
> > scattered message and prepared the message payload with the 4 floats to
> > write at each offset. That is, I do something like this in the visitor:
> > 
> >    /* Prepare scattered write message payload.
> >     * M1.0..M1.3: Dword offsets to be added to the global offset
> >     * M2.0..M2.3: Dword values
> >     */
> >    int base_mrf = 1;
> >    for (int i = 0; i < ir->val->type->vector_elements; i++) {
> >       int component_mask = 1 << i;
> >       if (ir->write_mask & component_mask) {
> >          fs_reg mrf = fs_reg(MRF, base_mrf + 1, BRW_REGISTER_TYPE_UD);
> >          mrf.subreg_offset += i * type_sz(mrf.type);
> >          emit(MOV(mrf, brw_imm_ud(i)));
> > 
> >          mrf = fs_reg(MRF, base_mrf + 2, val_reg.type);
> >          mrf.subreg_offset += i * type_sz(mrf.type);
> >          emit(MOV(mrf, offset(val_reg, i)));
> >       }
> >    }
> > 
> >    /* Set the writemask so we only write to the offsets we want */
> >    struct brw_reg brw_dst =
> >       brw_set_writemask(brw_vec8_grf(0, 0), ir->write_mask);
> >    fs_reg push_dst = fs_reg(brw_dst);
> >    fs_inst *inst =
> >       new(mem_ctx) fs_inst(SHADER_OPCODE_SCATTERED_BUFFER_STORE, 8,
> >                            push_dst, surf_index, offset_reg);
> > 
> > This seems to work well, and for vectors I end up only needing one message
> > to write all the channels I need to write. Now that I think about it, the
> > reason I only get 4 channels written at most is probably because
> > ir->write_mask can be 0xf at most. I imagine that in SIMD8 the writemask
> > would have to be 0xff to cover all 8 channels, unlike vec4.
> 
> I think you are misunderstanding how these SIMD8/16 write messages work. 
> I'll assume 8 in the following discussion but it all applies to 16.
> 
> As the shader executes, it executes 8 pixels at a time.  Each
> sub-register represents the same symbolic value in GLSL but for a different
> pixel.  Suppose I have an SSBO declared as follows:
> 
> buffer Block {
>     vec4 s[128];
> };
> 
> And suppose I execute the line of code "s[i].xzw = foo;" where foo is some
> vec3.  When the SIMD8 shader reaches this line, it stores 24 values in the
> SSBO; 3 per pixel.  If the client doesn't want the values to stomp on each
> other, it is up to the client to ensure that i is different for each pixel.
> 
> How does this work with the scattered read/write messages?  They are
> designed for exactly a case like this.  When you get to this statement, you
> will have one register that holds the value of i and three more for foo. 
> Each of these registers has 8 sub-registers, one for each SIMD channel (or
> pixel).  All you should have to do is build 3 messages each one of which is
> i + some math for the address part and a component of foo for the payload
> part.  Each scattered write writes 8 values but they are the different
> values from the different SIMD channels, not from different components of
> foo.  The first one will write all 8 of the s[i].x, the next one s[i].z, etc.
> 
> Does that make more sense?

It does, thanks for the detailed explanation! I'll revert the implementation to
what I had before then. I suppose what I have now works simply because all
pixels are writing the same value to the same offset...
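
For reference, here is a minimal standalone sketch of the model described
above: one scattered write per written component, each carrying 8 per-channel
offsets and 8 per-channel values. This is a plain C++ simulation, not Mesa or
i965 code. The names scattered_write and store_vec4_components, the tightly
packed vec4 layout (4 dwords per element) and the exec_mask parameter are all
assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a SIMD8 register holds the same GLSL value for
// 8 different channels (pixels), one sub-register per channel.
constexpr int kSimdWidth = 8;
using SimdReg = std::array<float, kSimdWidth>;
using SimdOffsets = std::array<uint32_t, kSimdWidth>;

// Models one dword scattered write: every enabled channel stores its own
// dword at its own dword offset into the buffer.
void scattered_write(std::vector<float> &ssbo,
                     const SimdOffsets &dword_offsets,
                     const SimdReg &payload, uint8_t exec_mask)
{
   for (int ch = 0; ch < kSimdWidth; ch++) {
      if (!(exec_mask & (1u << ch)))
         continue; // channel disabled, nothing is written for this pixel
      assert(dword_offsets[ch] < ssbo.size());
      ssbo[dword_offsets[ch]] = payload[ch];
   }
}

// Models "s[i].<write_mask> = foo" on an array of vec4, assuming each
// element occupies 4 consecutive dwords. One message is built per written
// component. foo holds one SIMD8 register per source component.
// Returns the number of messages that were emitted.
int store_vec4_components(std::vector<float> &ssbo,
                          const SimdOffsets &index,
                          const std::vector<SimdReg> &foo,
                          unsigned write_mask, uint8_t exec_mask)
{
   int messages = 0;
   std::size_t src = 0; // next source component of foo to consume
   for (unsigned comp = 0; comp < 4; comp++) {
      if (!(write_mask & (1u << comp)))
         continue;
      assert(src < foo.size());

      // Address part: i + some math, computed separately for each channel.
      SimdOffsets offsets;
      for (int ch = 0; ch < kSimdWidth; ch++)
         offsets[ch] = index[ch] * 4 + comp;

      // Payload part: one component of foo across all 8 channels.
      scattered_write(ssbo, offsets, foo[src], exec_mask);
      src++;
      messages++;
   }
   return messages;
}
```

With write_mask 0xd (.xzw) and a different i in every channel, this emits 3
messages of 8 dwords each and leaves every .y untouched. If all channels carry
the same i and the same foo, every channel stores the same dword to the same
offset, which is consistent with my current version appearing to work.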

> > > If you're trying to use scattered read/write in vec4, then you may be
> > > running into execution mask issues.  I don't know how the execution mask in
> > > 4x2 is laid out but scattered read/write is usually a SIMD8 message.  It can
> > > be used in 4x2 mode but you'll have to monkey with the writemask yourself. 
> > > I'm not sure how you do that.  Ken would know.
> > 
> > Haven't tried this for vec4 yet, but if I end up needing it there too I'll
> > ask Ken.
> 
> For vec4, the oword messages are *probably* what you want, but I'm not sure
> how that plays with packing.  That said, I think it's probably best to get
> this working for the FS backend as it's a good deal simpler there.  It also
> allows us to enable it on BDW+ before you get it working in vec4.

Sure, will focus on that first. Thanks again Jason!
