[Mesa-dev] Link failure when copying big arrays stored in SSBOs

Fri Nov 20 04:12:26 PST 2015

On Fri, 2015-11-20 at 13:07 +0100, Iago Toral wrote:
> Hi,
> 
> Jordan sent a piglit test that produces a link failure with the ssbo
> code [1]. Doing something like this is sufficient to reproduce the
> problem:
> 
> [fragment shader]
> #version 330
> #extension GL_ARB_shader_storage_buffer_object: require
> 
> #define SIZE 6
> 
> layout (std430) buffer SSBO {
>     mat4 m1[SIZE];
>     mat4 m2[SIZE];
> };
> 
> void main() {
>     m2 = m1;
> }
> 
> The issue is that the lower_ubo_reference pass first finds that we
> read all of m1 and emits SSBO loads for each offset, then it finds the
> write to m2 and emits all the writes, one for each offset. That
> produces NIR code that looks like this:
> 
>         vec4 ssa_1 = intrinsic load_ssbo (ssa_0) () (0)
>         vec4 ssa_2 = intrinsic load_ssbo (ssa_0) () (16)
>         vec4 ssa_3 = intrinsic load_ssbo (ssa_0) () (32)
>         (...)
>         vec4 ssa_24 = intrinsic load_ssbo (ssa_0) () (368)
>         intrinsic store_ssbo (ssa_24, ssa_0) () (752, 15)
>         intrinsic store_ssbo (ssa_23, ssa_0) () (736, 15)
>         intrinsic store_ssbo (ssa_22, ssa_0) () (720, 15)
>         (...)
>         intrinsic store_ssbo (ssa_1, ssa_0) () (384, 15)
> 
> Down at the i965 level, the registers used to configure the loads are
> also used to configure the writes (since they specify the address),
> which means that they are alive for the whole time between the read and
> the write to the same offset. For example:
> 
> {  7}    1: untyped_surface_read(8) (mlen: 1) vgrf95+2.0:UD, vgrf25:UD
> ... <other reads from m1> ...
> ... <writes to m2> ...
> {  6}  140: mov(8) vgrf95+0.0:UD, 0d NoMask
> {  6}  141: mov(8) vgrf95+0.28:UD, g1:UD NoMask
> {  6}  142: mov(8) vgrf95+1.0:UD, 384u
> {  6}  143: untyped_surface_write(8) (mlen: 6) null:UD, vgrf95:UD
> 
> In that code, vgrf95 is alive in ip=[1, 143]. The same goes for all the
> other offsets, so we just end up with too many live registers. In
> general, register pressure increases with each load and won't decrease
> until we start with the writes, so the larger the arrays get the worse
> the situation becomes.
> 
> I don't think we can do much about this other than maybe handling array
> copies specially, so that instead of emitting all the loads first and
> all the stores second, we emit the load and store for each element at
> once, reducing liveness for the registers involved. I am assuming that
> nobody would write structs big enough to generate the same problem
> there, but hey... :)

Actually, we'd need the same for struct copies, since we would run into
the same problem as soon as they include large arrays.
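
To put rough numbers on the difference, here is a toy model in C. This is
not Mesa code: the op struct and the emit_batched, emit_interleaved and
peak_live names are made up for illustration. It assumes each vec4 is one
value that becomes live at its load and dies at its store, which ignores
the extra header/address registers the real backend needs, and it simply
compares the two emission orders:

```c
#include <assert.h>

/* Hypothetical model of the lowered copy: one op per vec4 load or store. */
enum op_kind { OP_LOAD, OP_STORE };

struct op {
    enum op_kind kind;
    int elem;               /* index of the vec4 being copied */
};

/* Current order: all loads first, then all stores (in reverse element
 * order, like the NIR dump above). ops must hold 2 * n entries. */
static void emit_batched(struct op *ops, int n)
{
    for (int i = 0; i < n; i++)
        ops[i] = (struct op){ OP_LOAD, i };
    for (int i = 0; i < n; i++)
        ops[n + i] = (struct op){ OP_STORE, n - 1 - i };
}

/* Proposed order: load and store each element back to back.
 * ops must hold 2 * n entries. */
static void emit_interleaved(struct op *ops, int n)
{
    for (int i = 0; i < n; i++) {
        ops[2 * i]     = (struct op){ OP_LOAD, i };
        ops[2 * i + 1] = (struct op){ OP_STORE, i };
    }
}

/* Peak number of values live at once: a value becomes live at its load
 * and dies at its store. */
static int peak_live(const struct op *ops, int count)
{
    int live = 0, peak = 0;

    for (int i = 0; i < count; i++) {
        if (ops[i].kind == OP_LOAD) {
            live++;
            if (live > peak)
                peak = live;
        } else {
            assert(live > 0);   /* every store must follow its load */
            live--;
        }
    }
    return peak;
}
```

For the 24 vec4s in the example shader (6 mat4s), emit_batched gives a
peak of 24 live values in this model, while emit_interleaved gives a peak
of 1 regardless of the array size.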

> Any better ideas?
> 
> Iago
> 
> [1] http://lists.freedesktop.org/archives/piglit/2015-November/018055.html