[Piglit] [PATCH v3 2/3] arb_shader_precision: add framework for calculating tolerances for complex functions

Fri Mar 6 14:01:50 PST 2015

On Fri, Mar 6, 2015 at 4:50 PM, Micah Fedke <micah.fedke at collabora.co.uk> wrote:
> So use the "max/min of all permutations" method for all ops?  e.g.:
>
>     def __mul__(self, other):
>         a = self.high * other.high
>         b = self.high * other.low
>         c = self.low * other.high
>         d = self.low * other.low
>         high = numpy.float32(numpy.amax([a, b, c, d]))
>         low = numpy.float32(numpy.amin([a, b, c, d]))
>         return ValueInterval(high, low)
>
> And tack on the tolerance at the end like this, for ops that have a
> tolerance?  Things should move in the right direction after high and low
> have been determined, if I'm not mistaken.
>
>     def __truediv__(self, other):
>         tol = numpy.float32(2.5)
>         a = self.high / other.high
>         b = self.high / other.low
>         c = self.low / other.high
>         d = self.low / other.low
>         high = numpy.float32(numpy.amax([a, b, c, d]))
>         low = numpy.float32(numpy.amin([a, b, c, d]))
>         high += _ulpsize(high) * tol
>         low -= _ulpsize(low) * tol
>         return ValueInterval(high, low)

Yes, I think that's right.
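
For what it's worth, here is a minimal, self-contained sketch of that
scheme. It assumes ValueInterval takes (high, low) as in your snippet,
and it uses numpy.spacing as a stand-in for _ulpsize, which may not
match the helper in the patch. It returns a new interval instead of
modifying an operand, and it does not try to handle a divisor interval
that spans zero.

```python
import numpy


def _ulpsize(x):
    # Assumption: one ULP is the float32 spacing at |x| (stand-in helper).
    return numpy.spacing(numpy.abs(numpy.float32(x)))


class ValueInterval:
    def __init__(self, high, low):
        self.high = numpy.float32(high)
        self.low = numpy.float32(low)

    @staticmethod
    def _combine(candidates, tol=0.0):
        # Take the extremes of all endpoint combinations first...
        high = numpy.float32(numpy.amax(candidates))
        low = numpy.float32(numpy.amin(candidates))
        # ...then widen outward by `tol` ULPs of each bound.
        tol = numpy.float32(tol)
        high = numpy.float32(high + _ulpsize(high) * tol)
        low = numpy.float32(low - _ulpsize(low) * tol)
        return ValueInterval(high, low)

    def __mul__(self, other):
        # Multiplication gets no extra tolerance in this sketch.
        return self._combine([self.high * other.high,
                              self.high * other.low,
                              self.low * other.high,
                              self.low * other.low])

    def __truediv__(self, other):
        # 2.5 ULP tolerance, as in the quoted snippet.
        # Not handled here: `other` containing zero.
        return self._combine([self.high / other.high,
                              self.high / other.low,
                              self.low / other.high,
                              self.low / other.low], tol=2.5)
```

Widening after taking the max/min means the tolerance always pushes
high up and low down, regardless of the signs of the operands.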

>
> As for manual fma's, that should work.  I wonder, though - a double-round
> manual fma has the potential to produce more error than a single-round, and
> the spec allows either method, so don't we want to evaluate the more
> error-ful option?

Yes and no. Both a * b + c and fma(a, b, c) have exact right answers
as defined by the spec. However, for any particular a * b + c that
occurs, the implementation is allowed to evaluate it either way. You
could define the result as a range, but... how do you detect the
a * b + c case? Let's say I'm doing dot(x, x), which becomes

a * a + b * b + c * c + d * d.

An implementation is perfectly within its rights to rewrite this as

fma(a, a, fma(b, b, fma(c, c, d * d)))

or even

fma(a, a, b * b + fma(c, c, d * d))

Since this approach only considers one op at a time, I don't see an
easy way to handle it, unfortunately...
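
To make the problem concrete, here is a rough sketch that evaluates the
three groupings above with exact rational arithmetic and one float32
rounding per op. The helper names (round_f32, mul, add, fma) and the
input values are made up for illustration, and overflow is ignored.

```python
from fractions import Fraction


def round_f32(x):
    """Round an exact value to the nearest float32, ties to even.

    Illustrative helper: the result stays a Fraction, overflow is ignored.
    """
    x = Fraction(x)
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    m = abs(x)
    # Find e with 2**e <= m < 2**(e + 1).
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if Fraction(2) ** e > m:
        e -= 1
    e = max(e, -126)                  # subnormals share the smallest exponent
    quantum = Fraction(2) ** (e - 23)  # value of one unit in the last place
    return sign * round(m / quantum) * quantum  # round() on Fraction is half-even


def mul(a, b):
    return round_f32(a * b)


def add(a, b):
    return round_f32(a + b)


def fma(a, b, c):
    # Single rounding of the exact a * b + c.
    return round_f32(a * b + c)


# Hypothetical inputs, all exactly representable in float32.
a = 1 + Fraction(1, 2 ** 12)
b = Fraction(0)
c = Fraction(0)
d = Fraction(1, 2 ** 12)

# a * a + b * b + c * c + d * d, every op rounded separately
naive = add(add(add(mul(a, a), mul(b, b)), mul(c, c)), mul(d, d))
# fma(a, a, fma(b, b, fma(c, c, d * d)))
chained = fma(a, a, fma(b, b, fma(c, c, mul(d, d))))
# fma(a, a, b * b + fma(c, c, d * d))
mixed = fma(a, a, add(mul(b, b), fma(c, c, mul(d, d))))
```

For these inputs naive and chained differ by one float32 ULP at 1.0,
even though each individual op is correctly rounded. A per-op tolerance
has no way of knowing which grouping the compiler picked.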