[Mesa-dev] Improving precision of mod(x,y)

Thu Jan 15 12:26:27 PST 2015

On Thu, 15 Jan 2015 15:32:59 +0100, Roland Scheidegger  
<sroland at vmware.com> wrote:

> On 15.01.2015 at 10:05, Iago Toral wrote:
>> Hi,
>>
>> We have 16 deqp tests that fail, at least on i965, because of
>> insufficient precision of the mod GLSL function.
>>
>> Mesa lowers mod(x,y) to y * fract(x,y), so some precision can be
>> lost in the fract operation. Since the result is multiplied by y, the
>> total precision loss usually grows with the value of y.
> Did you mean fract(x/y) here?
>
>>
>> Below are some examples to give an idea of the magnitude of this error.
>> The values on the right represent the precision error for each case:
>>
>> mod(-1.951171875, 1.9980468750) =>  0.0000000447
>> mod(121.57, 13.29)              =>  0.0000023842
>> mod(3769.12, 321.99)            =>  0.0000762939
>> mod(3769.12, 1321.99)           =>  0.0001220703
>> mod(-987654.125, 123456.984375) =>  0.0160663128
>> mod( 987654.125, 123456.984375) =>  0.0312500000
>>
>> As you see, for large enough values, the precision error becomes
>> significant.
>>
>> This can be fixed by lowering mod(x,y) to x - y * floor(x/y) instead,
>> which is the suggested implementation in the GLSL docs. I have a local
>> patch in my tree that does this, and it does indeed fix the problem. The
>> downside is that this implementation adds an extra ADD instruction to
>> the generated code (besides replacing fract with floor, which I guess
>> has a similar cost).
>>
>> Since this is a case where there is some trade-off to the fix, I wonder
>> if we are interested in doing this or not. Is the precision fix worth
>> the additional ADD?
>>
>
> Well I can tell you that llvmpipe implements frc(x) as x - floor(x), so
> this change looks good to me :-).
> On a more serious note though, it looks to me like the cost of this
> expression would be dominated by the division, so one more add
> shouldn't be that bad. And if the test is legit, I don't think
> there's much choice (unless you could make this optional for older
> GLSL versions that didn't require that much precision, but even then
> it's probably not worth bothering, imho).
>
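Roland's remark also shows why the two forms agree on paper but not in float arithmetic. A short worked derivation, assuming the quotient is rounded with relative error δ, |δ| ≤ u (u = 2^-24 for correctly rounded single precision; a GPU reciprocal-plus-multiply may be less accurate), assuming floor of the rounded quotient still equals floor(x/y), and ignoring the rounding of the remaining operations:

```latex
\begin{aligned}
\operatorname{fract}(t) &= t - \lfloor t \rfloor, \qquad n = \lfloor x/y \rfloor \\
y\,\operatorname{fract}(x/y) &= y\,(x/y) - y\,n = x - y\,n \quad \text{(exact arithmetic)} \\
\hat{q} &= \operatorname{fl}(x/y) = \tfrac{x}{y}\,(1+\delta), \qquad |\delta| \le u \\
y\,\operatorname{fract}(\hat{q}) &= y\,(\hat{q} - n) = x - y\,n + x\,\delta
\end{aligned}
```

So the current lowering inherits an absolute error of about |x|·u from the quotient, while x - y * floor(x/y) uses the quotient only through n. Under the IEEE-division assumption, x = -987654.125 and y = 123456.984375 give |xδ| of about 0.016, in line with the fifth row of Iago's table.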

FWIW, I just typed out the following little piglit test and tried it on  
R600:

[require]
GLSL >= 3.30

[vertex shader passthrough]
[fragment shader]
uniform float a;
uniform float b;
out vec4 colour;

void
main(void)
{
    // colour = vec4(b * fract(a / b)); // current lowering of mod(x,y)
    colour = vec4(a - b * floor(a / b)); // proposed lowering
}

[test]
clear color 0.5 0.5 0.5 0.5
clear

uniform float a 4.2
uniform float b 3.5
draw rect -1 -1 2 2
probe rgba 1 1 0.7 0.7 0.7 0.7

Resulting R600 assembly:

// y * fract(x/y)
// KC0[0].x is x and KC0[1].x is y
1      t: RECIP_IEEE         T0.x,  KC0[1].x
2      x: MUL                T0.x,  KC0[0].x, T0.x
3      x: FRACT              T0.x,  T0.x
4      x: MUL                R0.x,  KC0[1].x, T0.x
EXPORT_DONE        PIXEL 0     R0.xxxx  EOP

// x - y * floor(x/y)
1      t: RECIP_IEEE         T0.x,  KC0[1].x
2      x: MUL                T0.x,  KC0[0].x, T0.x
3      x: FLOOR              T0.x,  T0.x
4      x: MULADD             R0.x,  KC0[1].x, -T0.x, KC0[0].x
EXPORT_DONE        PIXEL 0     R0.xxxx  EOP

Same number of cycles, same dependency-chain length and same ALU pipe
usage for both methods.

I'd expect most architectures that can negate a source operand of a
multiply-add within a single operation to get similar results, with no
extra cost for the subtraction.

/Glenn