[Bug 89597] Implement SSBOs in GLSL front-end and i965

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Apr 17 00:33:35 PDT 2015


https://bugs.freedesktop.org/show_bug.cgi?id=89597

--- Comment #16 from Jason Ekstrand <jason at jlekstrand.net> ---
(In reply to Iago Toral from comment #15)
> (In reply to Jason Ekstrand from comment #14)
> > I'm confused.  Are you trying to use dword scattered read/write for vec4? 
> > In SIMD8, you only write one float at a time anyway.  Unless, of course, I'm
> > massively misunderstanding SSBO's.  For vec4, I think you want the 
> 
> Nope, this is SIMD8/16, haven't tried to use this with vec4. The thing is,
> imagine that I have a vector type at the IR element with element count > 1.
> Initially I would loop through the elements and write each one individually
> by passing offset(value_reg, i) as src to the write message, but then I
> noticed that I could use the same message to write all the elements in the
> vector (up to 4) in one go if I provided 4 different offsets to the
> scattered message and prepared the message payload with the 4 floats to
> write at each offset. That is, I do something like this in the visitor:
> 
>    /* Prepare scattered write message payload.
>     * M1.0..M1.3: Dword offsets to be added to the global offset
>     * M2.0..M2.3: Dword values
>     */
>    int base_mrf = 1;
>    for (int i = 0; i < ir->val->type->vector_elements; i++) {
>       int component_mask = 1 << i;
>       if (ir->write_mask & component_mask) {
>          fs_reg mrf = fs_reg(MRF, base_mrf + 1, BRW_REGISTER_TYPE_UD);
>          mrf.subreg_offset += i * type_sz(mrf.type);
>          emit(MOV(mrf, brw_imm_ud(i)));
> 
>          mrf = fs_reg(MRF, base_mrf + 2, val_reg.type);
>          mrf.subreg_offset += i * type_sz(mrf.type);
>          emit(MOV(mrf, offset(val_reg, i)));
>       }
>    }
> 
>    /* Set the writemask so we only write to the offsets we want */
>    struct brw_reg brw_dst =
>       brw_set_writemask(brw_vec8_grf(0, 0), ir->write_mask);
>    fs_reg push_dst = fs_reg(brw_dst);
>    fs_inst *inst =
>       new(mem_ctx) fs_inst(SHADER_OPCODE_SCATTERED_BUFFER_STORE, 8,
>                            push_dst, surf_index, offset_reg);
> 
> This seems to work well, and for vectors I end up only needing one message
> to write all the channels I need to write. Now that I think about it, the
> reason I only get 4 channels written at most is probably because
> ir->write_mask can be 0xf at most, I imagine that in SIMD8 the writemask
> would have to be 0xff to cover all 8 channels, unlike vec4.

I think you are misunderstanding how these SIMD8/16 write messages work.  I'll
assume 8 in the following discussion but it all applies to 16.

As the shader executes, it executes 8 pixels at a time.  Each sub-register
represents the same symbolic value in GLSL but for a different pixel.  Suppose
I have an SSBO declared as follows:

buffer Block {
    vec4 s[128];
};

And suppose I execute the line of code "s[i].xzw = foo;" where foo is some
vec3.  When the SIMD8 shader reaches this line, it stores 24 values in the
SSBO; 3 per pixel.  If the client doesn't want the values to stomp on each
other, it is up to the client to ensure that i is different for each pixel.

How does this work with the scattered read/write messages?  They are designed
for exactly a case like this.  When you get to this statement, you will have
one register that holds the value of i and three more for foo.  Each of these
registers has 8 sub-registers, one for each SIMD channel (or pixel).  All you
should have to do is build 3 messages, each of which has i + some math for
the address part and a component of foo for the payload part.  Each scattered
write writes 8 values, but they are the values from the different SIMD
channels, not from different components of foo.  The first one will write all 8
of the s[i].x, the next one all 8 of the s[i].z, etc.
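
To make that concrete, here is a rough stand-alone model of the idea.  It is
not Mesa code and does not use any real i965 API: SimdReg, scattered_write()
and store_masked_vec4() are names I made up for illustration, and the
"i * 4 + component" addressing just assumes tightly packed vec4s of dwords.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Hypothetical model: one SIMD8 register holds the same GLSL value for
 * 8 different pixels, one per channel. */
constexpr int SIMD_WIDTH = 8;
using SimdReg = std::array<uint32_t, SIMD_WIDTH>;

/* Models one dword scattered write: every enabled channel stores its own
 * payload dword at its own dword offset. */
void scattered_write(std::vector<uint32_t> &buffer, const SimdReg &offsets,
                     const SimdReg &payload, uint8_t exec_mask)
{
   for (int ch = 0; ch < SIMD_WIDTH; ch++) {
      if (exec_mask & (1u << ch))
         buffer.at(offsets[ch]) = payload[ch];
   }
}

/* Models "s[i].xzw = foo" on a buffer of vec4s: one message per written
 * component, each covering all 8 channels.  src holds one register per
 * bit set in write_mask.  Returns the number of messages sent. */
int store_masked_vec4(std::vector<uint32_t> &buffer, const SimdReg &index,
                      const std::vector<SimdReg> &src, unsigned write_mask)
{
   int messages = 0;
   std::size_t src_comp = 0;
   for (unsigned comp = 0; comp < 4; comp++) {
      if (!(write_mask & (1u << comp)))
         continue;

      /* Address part: "i + some math", computed per channel. */
      SimdReg offsets;
      for (int ch = 0; ch < SIMD_WIDTH; ch++)
         offsets[ch] = index[ch] * 4 + comp;

      /* Payload part: one component of foo, 8 channels wide. */
      scattered_write(buffer, offsets, src.at(src_comp++), 0xff);
      messages++;
   }
   assert(src_comp == src.size());
   return messages;
}
```

Calling store_masked_vec4() with write_mask 0xd (xzw) and a different i in
each channel sends 3 messages and writes 24 dwords; the .y slots are never
touched.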

Does that make more sense?

> > If you're trying to use scattered read/write in vec4, then you may be
> > running into execution mask issues.  I don't know how the execution mask in
> > 4x2 is laid out but scattered read/write is usually a SIMD8 message.  It can
> > be used in 4x2 mode but you'll have to monkey with the writemask yourself. 
> > I'm not sure how you do that.  Ken would know.
> 
> Haven't tried this for vec4 yet, but if I end up needing it there too I'll
> ask Ken.

For vec4, the oword messages are *probably* what you want, but I'm not sure how
that plays with packing.  That said, I think it's probably best to get this
working for the FS backend as it's a good deal simpler there.  It also allows
us to enable it on BDW+ before you get it working in vec4.
