[Mesa-dev] Link failure when copying big arrays stored in SSBOs

Iago Toral itoral at igalia.com
Fri Nov 20 04:07:03 PST 2015


Hi,

Jordan sent a piglit test that produces a link failure with the SSBO
code [1]. Something like this is sufficient to reproduce the
problem:

[fragment shader]
#version 330
#extension GL_ARB_shader_storage_buffer_object: require

#define SIZE 6

layout (std430) buffer SSBO {
    mat4 m1[SIZE];
    mat4 m2[SIZE];
};

void main() {
    m2 = m1;
}

The problem here is that the lower_ubo_reference pass first finds that
we read all of m1 and emits an SSBO load for each offset, then finds
the write to m2 and emits all the stores, one for each offset. That
produces NIR code that looks like this:

        vec4 ssa_1 = intrinsic load_ssbo (ssa_0) () (0)
        vec4 ssa_2 = intrinsic load_ssbo (ssa_0) () (16)
        vec4 ssa_3 = intrinsic load_ssbo (ssa_0) () (32)
        (...)
        vec4 ssa_24 = intrinsic load_ssbo (ssa_0) () (368)
        intrinsic store_ssbo (ssa_24, ssa_0) () (752, 15)
        intrinsic store_ssbo (ssa_23, ssa_0) () (736, 15)
        intrinsic store_ssbo (ssa_22, ssa_0) () (720, 15)
        (...)
        intrinsic store_ssbo (ssa_1, ssa_0) () (384, 15)
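
For reference, the offsets in that dump follow from the std430 layout: a
mat4 is four vec4 columns of 16 bytes each, so m1 takes SIZE * 64 bytes
and m2 starts right after it. A minimal sketch of that arithmetic in C
(the helper names are made up for illustration, they are not Mesa
functions):

```c
#include <assert.h>

/* std430: a mat4 is four vec4 columns of 16 bytes each (64 bytes total). */
enum { VEC4_BYTES = 16, MAT4_BYTES = 4 * VEC4_BYTES };

/* Hypothetical helper: byte offset of the i-th vec4 load from m1,
 * which starts at offset 0 in the block. */
static unsigned load_offset(unsigned i)
{
    return i * VEC4_BYTES;
}

/* Hypothetical helper: byte offset of the matching vec4 store into m2,
 * which starts right after the `size` matrices of m1. */
static unsigned store_offset(unsigned size, unsigned i)
{
    assert(i < 4 * size); /* there are 4 vec4 columns per matrix */
    return size * MAT4_BYTES + i * VEC4_BYTES;
}
```

With SIZE = 6 there are 24 vec4 slots per array, so the last load is at
offset 368, the first store at 384 and the last store at 752, which
matches the dump.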

Down at the i965 level, the registers used to configure the loads are
also used to configure the writes (since they specify the address),
which means that they are alive for the whole time between the read and
the write to the same offset. For example:

{  7}    1: untyped_surface_read(8) (mlen: 1) vgrf95+2.0:UD, vgrf25:UD
... <other reads from m1> ...
... <writes to m2> ...
{  6}  140: mov(8) vgrf95+0.0:UD, 0d NoMask
{  6}  141: mov(8) vgrf95+0.28:UD, g1:UD NoMask
{  6}  142: mov(8) vgrf95+1.0:UD, 384u
{  6}  143: untyped_surface_write(8) (mlen: 6) null:UD, vgrf95:UD

In that code, vgrf95 is alive in ip=[1, 143]. The same goes for all the
other offsets, so we just end up with too many live registers. In
general, register pressure increases with each load and won't decrease
until we start with the writes, so the larger the arrays get the worse
the situation becomes.

I don't think we can do much about this other than maybe handling array
copies specially (so that instead of emitting all the loads first and
all the stores second, we emit the load and store for each element at
once, reducing liveness for the registers involved). I am assuming that
nobody would write structs big enough to generate the same problem
there, but hey... :)
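
To illustrate why the ordering matters, here is a rough model in C. It
is only a sketch under simplifying assumptions: each load makes exactly
one value live and the matching store kills it, which ignores everything
else the real register allocator has to deal with. The types and
function names are hypothetical and do not correspond to anything in
Mesa.

```c
#include <assert.h>
#include <stddef.h>

enum op_kind { OP_LOAD, OP_STORE };

/* Peak number of loaded values live at the same time. In this toy model
 * a load makes one value live and a store ends one live range. */
static size_t peak_live(const enum op_kind *ops, size_t count)
{
    size_t live = 0, peak = 0;

    for (size_t i = 0; i < count; i++) {
        if (ops[i] == OP_LOAD) {
            live++;
            if (live > peak)
                peak = live;
        } else {
            assert(live > 0); /* a store needs a previously loaded value */
            live--;
        }
    }
    return peak;
}

/* Current order: all n loads first, then all n stores.
 * ops must have room for 2 * n entries. */
static void schedule_batched(enum op_kind *ops, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ops[i] = OP_LOAD;
        ops[n + i] = OP_STORE;
    }
}

/* Proposed order: load and store each element back to back.
 * ops must have room for 2 * n entries. */
static void schedule_interleaved(enum op_kind *ops, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ops[2 * i] = OP_LOAD;
        ops[2 * i + 1] = OP_STORE;
    }
}
```

For the 24 vec4 slots of the example shader, the batched order peaks at
24 live values in this model while the interleaved order peaks at 1, and
the batched peak grows linearly with the array size.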

Any better ideas?

Iago

[1] http://lists.freedesktop.org/archives/piglit/2015-November/018055.html