[Mesa-dev] Mesa (master): st/mesa: fix incorrect size of UBO declarations

Wed Jul 2 16:05:23 PDT 2014

On 07/02/2014 04:31 PM, Ian Romanick wrote:
> On 07/02/2014 07:29 AM, Brian Paul wrote:
>> On 07/02/2014 06:49 AM, Brian Paul wrote:
>>> On 07/02/2014 06:36 AM, Brian Paul wrote:
>>>> On 07/01/2014 08:43 PM, Michel Dänzer wrote:
>>>>> On 02.07.2014 00:43, Brian Paul wrote:
>>>>>> Module: Mesa
>>>>>> Branch: master
>>>>>> Commit: f4b0ab7afd83c811329211eae8167c9bf238870c
>>>>>> URL:
>>>>>> http://cgit.freedesktop.org/mesa/mesa/commit/?id=f4b0ab7afd83c811329211eae8167c9bf238870c
>>>>>>
>>>>>> Author: Brian Paul <brianp at vmware.com>
>>>>>> Date:   Tue Jul  1 08:17:09 2014 -0600
>>>>>>
>>>>>> st/mesa: fix incorrect size of UBO declarations
>>>>>>
>>>>>> UniformBufferSize is in bytes so we need to divide by 16 to get the
>>>>>> number of constant buffer slots.  Also, the ureg_DECL_constant2D()
>>>>>> function takes first..last parameters so we need to subtract one
>>>>>> for the last value.
>>>>>
>>>>> This change broke the GLSL uniform_buffer fs/vs/gs-struct-pad piglit
>>>>> tests with radeonsi:
>>>>>
>>>>> ../../../../src/gallium/auxiliary/gallivm/lp_bld_tgsi.c:306:lp_build_emit_fetch:
>>>>>
>>>>>
>>>>> Assertion `reg->Register.Index <=
>>>>> bld_base->info->file_max[reg->Register.File]' failed.
>>>>>
>>>>> AFAICT reg->Register.Index is 2, and the new code calls
>>>>> ureg_DECL_constant2D() with last=1. This is the TGSI code:
>>>>>
>>>>> FRAG
>>>>> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
>>>>> DCL OUT[0], COLOR
>>>>> DCL CONST[1][0..1]
>>>>> DCL TEMP[0], LOCAL
>>>>> IMM[0] UINT32 {0, 16, 32, 0}
>>>>>     0: MOV TEMP[0].x, CONST[1][0].xxxx
>>>>>     1: MOV TEMP[0].y, CONST[1][1].xxxx
>>>>>     2: MOV TEMP[0].z, CONST[1][2].xxxx
>>>>>     3: MOV TEMP[0].w, CONST[1][2].xxxx
>>>>>     4: MOV OUT[0], TEMP[0]
>>>>>     5: END
>>>>>
>>>>> which results from this GLSL code:
>>>>>
>>>>> #version 140
>>>>>
>>>>> struct S1 {
>>>>>           float r;
>>>>> };
>>>>>
>>>>> struct S2 {
>>>>>           float g;
>>>>>           float b;
>>>>>           float a;
>>>>> };
>>>>>
>>>>> struct S {
>>>>>          S1 s1;
>>>>>          S2 s2;
>>>>> };
>>>>>
>>>>> uniform ubo1 {
>>>>>           S s;
>>>>> };
>>>>>
>>>>> void main()
>>>>> {
>>>>>           gl_FragColor = vec4(s.s1.r, s.s2.g, s.s2.b, s.s2.a);
>>>>> }
>>>>>
>>>>>
>>>>> Something doesn't seem to add up, but I'm not sure where exactly the
>>>>> problem is or how to fix it.
>>>>
>>>>
>>>> I think there's at least one GLSL compiler bug exposed here.
>>>>
>>>> From the code fragments above, the z and w components of TEMP[0] are
>>>> getting the same value when they should not.
>>>>
>>>> I reverted Mesa to before my changes and modified the piglit test to use
>>>> different values for the blue/alpha channels and it fails (nvidia
>>>> passes).  We were just getting lucky before my changes.
>>>>
>>>> I suspect the code which is laying out the floats in the UBO is doing
>>>> something wrong, and the size of the UBO is miscalculated as a result.
>>>>
>>>> I'll dig a bit more but I think someone more familiar with the compiler
>>>> will need to help.
>>>
>>> The initial IR looks OK:
>>>
>>>
>>> (loop ...
>>>
>>>           (assign  (xyzw) (var_ref gl_Position)  (array_ref (var_ref
>>> vertex_to_gs) (var_ref i) ) )
>>>           (declare (temporary ) vec4 vec_ctor)
>>>           (assign  (x) (var_ref vec_ctor)  (record_ref (record_ref
>>> (var_ref s)  s1)  r) )
>>>           (assign  (y) (var_ref vec_ctor)  (record_ref (record_ref
>>> (var_ref s)  s2)  g) )
>>>           (assign  (z) (var_ref vec_ctor)  (record_ref (record_ref
>>> (var_ref s)  s2)  b) )
>>>           (assign  (w) (var_ref vec_ctor)  (record_ref (record_ref
>>> (var_ref s)  s2)  a) )
>>>           (assign  (xyzw) (var_ref v)  (var_ref vec_ctor) )
>>>           (call EmitVertex  ())
>>>
>>> ... endloop)
>>>
>>>
>>> But later, after linking, the Z and W components of the vector
>>> constructor seem to get merged.  Note: there are only three ubo_load
>>> instructions now instead of four:
>>>
>>> ...
>>>         (assign  (xyzw) (var_ref gl_Position)  (array_ref (var_ref
>>> vertex_to_gs) (constant int (0)) ) )
>>>         (declare (temporary ) vec4 vec_ctor)
>>>         (declare () float cse)
>>>         (assign  (x) (var_ref cse)  (expression float ubo_load (constant
>>> uint (0)) (constant uint (0)) ) )
>>>         (assign  (x) (var_ref vec_ctor)  (var_ref cse) )
>>>         (declare () float cse@3)
>>>         (assign  (x) (var_ref cse@3)  (expression float ubo_load
>>> (constant uint (0)) (constant uint (16)) ) )
>>>         (assign  (y) (var_ref vec_ctor)  (var_ref cse@3) )
>>>         (declare () float cse@4)
>>>         (assign  (x) (var_ref cse@4)  (expression float ubo_load
>>> (constant uint (0)) (constant uint (32)) ) )
>>>         (assign  (z) (var_ref vec_ctor)  (var_ref cse@4) )
>>>         (assign  (w) (var_ref vec_ctor)  (var_ref cse@4) )
>>>         (assign  (xyzw) (var_ref v)  (var_ref vec_ctor) )
>>>         (assign  (xyzw) (var_ref v@5)  (var_ref v) )
>>>         (assign  (xyzw) (var_ref gl_Position@6)  (var_ref gl_Position) )
>>>         (emit-vertex (constant int (0)) )
>>>
>>> I'm not sure I can debug this much further right now.
>>
>> I think the problem is in lower_ubo_reference_visitor::handle_rvalue()
>> in lower_ubo_reference.cpp
>>
>> If I disable the "const_offset = glsl_align(const_offset,
>> max_field_align);" near line 231 things seem to work.
>>
>> Something is wrong in the logic that does struct/field alignment in that
>> function.
>
> Oof.  There are a number of bugs in this area.  I've been looking at one
> for the last few days, and I've just about got it resolved.  Basically,
>
>      uniform U {
>          layout(row_major) mat3x4 m;
>          vec4 v;
>      } u;
>
> doesn't lay out u.m as row-major.  This only occurs if there is an
> instance name. :(
>
> I can dig into this bug once I'm done with the matrix bug.
>
>> Eric wrote the original code, maybe he can take a look?
>>
>> I've also been reading the GL_ARB_uniform_buffer_object spec to
>> understand the rules of layout "std140".  For a structure it says "the
>> base alignment of the structure is <N>, where <N> is the largest base
>> alignment value of any of its members, and rounded up to the base
>> alignment of a vec4."
>>
>> So it sounds like all structs should start at 16-byte boundaries.  So
>> for the piglit test in question, I'd expect the UBO block to be 32
>> bytes.  But both Mesa and nvidia compute the UBO size to be 16 bytes.

Actually, I made a mistake there.  Mesa reports 32 bytes while nvidia
reports 16 bytes.
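
To make the expected numbers concrete, here is a minimal C sketch of how I
read the std140 struct rule for the piglit shader (struct S1 with one float,
followed by struct S2 with three floats).  The helper names align_up() and
std140_place_float_struct() are made up for illustration.  This is not the
code in lower_ubo_reference.cpp, and it only handles structs made entirely
of scalar floats.

```c
#include <assert.h>

/* Round value up to the next multiple of alignment (a power of two). */
static unsigned
align_up(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/*
 * Hypothetical helper: place a std140 struct made of num_floats scalar
 * floats at or after byte offset cursor.  The struct starts on a 16-byte
 * (vec4) boundary, its floats are packed 4 bytes apart, and its size is
 * padded to a multiple of 16.  Writes each member's byte offset to
 * member_offsets[] and returns the first byte offset after the struct.
 */
static unsigned
std140_place_float_struct(unsigned cursor, unsigned num_floats,
                          unsigned *member_offsets)
{
   const unsigned vec4_align = 16;
   const unsigned float_size = 4;
   unsigned base = align_up(cursor, vec4_align);
   unsigned i;

   for (i = 0; i < num_floats; i++)
      member_offsets[i] = base + i * float_size;

   return align_up(base + num_floats * float_size, vec4_align);
}
```

Placing S1 at cursor 0 and then S2 at the returned cursor puts r at byte 0
and g, b, a at bytes 16, 20 and 24, with the block ending at 32 bytes.  By
that reading, a ubo_load at byte offset 32 would be past the end of the
block.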

>
> Hmm... do both drivers give 32 for
>
>      struct S1 { float f; };
>      struct S2 { vec4 v; };
>
>      uniform U {
>          S1 s1;
>          S2 s2;
>      };

Mesa: 32
nvidia: 32
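
For reference, this is roughly how a block size in bytes relates to the
first..last range passed to ureg_DECL_constant2D() after my commit.  It is
only a sketch: the function names are hypothetical, and rounding up to a
whole slot is my assumption here (the commit message only says "divide by
16").

```c
#include <assert.h>

/* Number of 16-byte constant buffer slots needed for size_bytes.
 * Rounding up is an assumption made for this sketch. */
static unsigned
ubo_num_slots(unsigned size_bytes)
{
   assert(size_bytes > 0);
   return (size_bytes + 15) / 16;
}

/* Index of the last declared slot (the range is first..last inclusive). */
static unsigned
ubo_last_slot(unsigned size_bytes)
{
   return ubo_num_slots(size_bytes) - 1;
}

/* Non-zero if a load at byte_offset lands inside the declared range. */
static int
ubo_offset_is_declared(unsigned byte_offset, unsigned size_bytes)
{
   return byte_offset / 16 <= ubo_last_slot(size_bytes);
}
```

For a 32-byte block the last slot is 1, which matches the
DCL CONST[1][0..1] in Michel's TGSI dump.  A load at byte offset 32 maps to
slot 2, the index that trips the assertion in lp_build_emit_fetch().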

>
> Also, the ES3.1 and GL4.4 specs have some changes to the std140 rules to
> make them clearer.  Do those specs agree with the extension spec?

I don't have time to dig into that right now.

-Brian