[Mesa-dev] [PATCH 1/2] r600g: report correct flow control depth, taking hardware bugs into account

Vadim Girlin vadimgirlin at gmail.com
Fri Feb 1 07:34:33 PST 2013


On 02/01/2013 05:47 PM, Marek Olšák wrote:
> On Fri, Feb 1, 2013 at 6:12 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
>> On 02/01/2013 03:20 AM, Marek Olšák wrote:
>>>
>>> ---
>>>    src/gallium/drivers/r600/r600_pipe.c |    9 +++++++--
>>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/r600/r600_pipe.c
>>> b/src/gallium/drivers/r600/r600_pipe.c
>>> index a59578d..1698cb3 100644
>>> --- a/src/gallium/drivers/r600/r600_pipe.c
>>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>>> @@ -640,6 +640,8 @@ static float r600_get_paramf(struct pipe_screen*
>>> pscreen,
>>>
>>>    static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned
>>> shader, enum pipe_shader_cap param)
>>>    {
>>> +       struct r600_screen *rscreen = (struct r600_screen *)pscreen;
>>> +
>>>          switch(shader)
>>>          {
>>>          case PIPE_SHADER_FRAGMENT:
>>> @@ -654,7 +656,6 @@ static int r600_get_shader_param(struct pipe_screen*
>>> pscreen, unsigned shader, e
>>>                  return 0;
>>>          }
>>>
>>> -       /* XXX: all these should be fixed, since r600 surely supports much
>>> more! */
>>>          switch (param) {
>>>          case PIPE_SHADER_CAP_MAX_INSTRUCTIONS:
>>>          case PIPE_SHADER_CAP_MAX_ALU_INSTRUCTIONS:
>>> @@ -662,7 +663,11 @@ static int r600_get_shader_param(struct pipe_screen*
>>> pscreen, unsigned shader, e
>>>          case PIPE_SHADER_CAP_MAX_TEX_INDIRECTIONS:
>>>                  return 16384;
>>>          case PIPE_SHADER_CAP_MAX_CONTROL_FLOW_DEPTH:
>>> -               return 8; /* XXX */
>>> +               /* There is a bug on certain Evergreen cards which limits
>>> +                * the control flow depth. */
>>> +               return rscreen->chip_class == EVERGREEN &&
>>> +                      rscreen->family != CHIP_CYPRESS &&
>>> +                      rscreen->family != CHIP_HEMLOCK ? 3 : 32;
>>
>>
>> I guess we can use more strict condition regarding the affected chips if the
>> bug description in the evergreen isa pdf is correct ("Chapter 4. ALU
>> Clauses"):
>>
>>> NOTE: For the 54xx and 55xx AMD GPU series only, the CF_INST_ALU*
>>> instructions do not save the active mask correctly. The branching can be
>>> wrong,
>>> possibly producing incorrect results and infinite loops. The three
>>> possible work-
>>> arounds are:
>>>
>>> a. Avoid using the CF_ALU_PUSH_BEFORE, CF_ALU_ELSE_AFTER
>>> CF_ALU_BREAK, and CF_ALU_CONTINUE instructions.
>>> b. Do not use the CF_INST_ALU* instructions when your stack depth
>>> exceeds three elements (not entries); for the 54XX series AMD GPUs,
>>> do not exceed a stack size of seven, since this GPU series has a vector
>>> size 32.
>>> c. Do not use these instructions when your non-zero stack depth mod 4 is
>>> 0 (or mod 8 is 0, for vector size 32).
>>
>>
>> E.g. it seems juniper isn't affected according to the doc.
>
> I was talking with Tom about it and he thinks Juniper is affected as well.
>
>>
>> Also I'm not sure how the meaning of the "max control flow depth" in gallium
>> maps to the hw's meaning - I suspect each loop is counted as a single level,
>> but hw uses 4 or 8 stack elements (subentries) per loop (EG ISA PDF, Section
>> "3.6.5 Stack Allocation"). So it seems we can't use ALU_xxx instructions
>> even in a single top-level loop on affected chips, if I understand it right.
>
> There are 3 variables affecting control flow:
> - Maximum if depth, used for if flattening. The maximum control flow
> depth CAP maps exactly to this variable.
> - Maximum number of allowed iterations for a loop to be unrolled. The
> catch is the loop analysis in the GLSL compiler must successfully
> determine the number of iterations. Then, if the number of iterations
> of a loop is less or equal to this variable, the loop will be
> unrolled. We can't control this variable from the driver yet, but the
> state tracker can. The default is 255.
> - There is also an upper limit on the size of the loop body, which
> also prevents loop unrolling. This is sort of hardcoded in the
> lowering pass.
>
> BTW, I think I have never had an issue with Redwood. I guess we don't
> have a piglit test using enough loops.

I never seen any related issues with my Juniper as well. I was also 
wondering why there are no bug reports for that despite that we don't 
have any workarounds.

I think we can ask proprietary compiler (shader analyzer). I tried the 
following shader:

uniform int a,b,c,d;
uniform float k;
void main() {
	float f = 0.0;
	if (a > 1) {
		if (b > 1) {
			if (c > 1) {
				if (d > 1) {
					f = k;
				}
			}
		}
	}
	gl_FragColor = vec4(f);
}

For Cedar (5450) it produces the following code for the inner if:

...
> 06 PUSH ADDR(11) POP_CNT(3)
> 07 ALU: ADDR(36) CNT(1) KCACHE0(CB0:0-15)
>       4  x: PREDGT_INT  ____,  KC0[3].x,  1      UPDATE_EXEC_MASK UPDATE_PRED
> 08 JUMP  POP_CNT(4) ADDR(11) VALID_PIX
...

The first two instructions are PUSH and ALU, instead of ALU_PUSH_BEFORE. 
The same code is produced for Redwood (5670), Caicos (6450), Turks 
(6670), Barts (6870). I think it's exactly the workaround for that bug.

On Juniper (5770) and Cypress(5870) it uses ALU_PUSH_BEFORE.

I've tried to create shader_test based on this shader (see attached 
file), it produces four nested ALU_PUSH_BEFORE with r600g and passes 
with my juniper, though I'm not sure if it can really detect the fail.

Vadim

>
> Marek
>

-------------- next part --------------
[require]
GLSL >= 1.20

[vertex shader]
void main()
{
	gl_Position = gl_Vertex;
}

[fragment shader]
uniform int a,b,c,d;
uniform float k;
void main() {
	float f = 0.0;
	if (a > 1) {
		if (b > 1) {
			if (c > 1) {
				if (d > 1) {
					f = 1.0;
				}
			}
		}
	}
	gl_FragColor = vec4(1.0 - f, f, 0.0, 1.0);
}

[test]
uniform int a 2
uniform int b 2
uniform int c 2
uniform int d 2
clear color 0.0 0.0 0.0 0.0
clear
draw rect -1 -1 2 2
probe all rgba 0.0 1.0 0.0 1.0


More information about the mesa-dev mailing list