[Piglit] [PATCH v3 2/3] arb_shader_precision: add framework for calculating tolerances for complex functions

Micah Fedke micah.fedke at collabora.co.uk
Mon Mar 2 14:14:51 PST 2015



On 02/28/2015 12:18 PM, Ilia Mirkin wrote:
> BTW, and this is a bigger concern... your approach appears to
> basically be along the lines of
>
> 1. Perform calculation with ~infinite precision
> 2. Perform calculation with ... "finite" precision
> 3. Compare. If diff is too big, don't bother, otherwise keep summing
> ULP's based on that diff (rather than the allowable error).

Yeah.  Well, almost.  The "finite" precision of Part 2 is the very 
precision at which this calculation will be done on the GPU, and Part 3 
is just "Compare, then reject or use", period.

>
> However this doesn't handle compounding that well. If I have an op
> that can be off by 1000 ULPs, the computation with low precision won't
> capture that. Let's say you define an op like 'fma' as a complex op,
> and the mul part of it can be off by 10 ULPs but the 'add' part of it
> can't be off by any ULPs. Will the error computed by your "complex"
> path provide a tolerance of 10 ULPs in that case?

In my approach, the error discovered by comparing the ~infinite-precision 
result to the finite-precision result *is* the allowable error range.  
It's like saying, "This function diverged from the true result by amount 
x when simulated at finite precision on the CPU, so we need to give it x 
amount of leeway when it runs on the GPU."  For the given inputs, the 
finite-precision result should be a complete representation of how error 
truly propagated through the intermediate stages of the calculation.  
You are correct that my method does not take the tolerance values listed 
in the EXT for simple operations and combine them in some way to arrive 
at a final allowable tolerance for complex operations.  My assessment 
was that the language in the EXT is too vague to clearly delineate a 
technique for doing so, though I can see how some might argue that those 
values should figure into the complex tolerances in some form.  I did 
review and attempt several such techniques, but I considered the one 
presented in these patches the most straightforward and maintainable.  
It has become apparent to me now that it has instead only served to 
confuse the one person willing to help me get this feature pushed!
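
For concreteness, here is a minimal, self-contained sketch of that idea 
(the helper names, the ULP measurement and the cutoff value below are 
placeholders of mine for illustration, not what the patch actually uses):

import numpy as np

def _ulps_apart(finite, reference):
    """Distance, in float32 ULPs at the finite result's magnitude,
    between the float32 result and the higher-precision reference."""
    one_ulp = abs(np.spacing(np.float32(finite)))
    return abs(float(finite) - float(reference)) / float(one_ulp)

def tolerance_for(fn, args, cutoff=16.0):
    """Evaluate fn with float32 ("GPU-like") and float64 ("reference")
    intermediates; the observed divergence becomes the allowed
    tolerance.  If it exceeds the cutoff, return -1 to tell the caller
    to skip this vector, in the spirit of the "badlands" rejection."""
    finite = np.float32(fn(*[np.float32(a) for a in args]))
    reference = fn(*[np.float64(a) for a in args])
    err = _ulps_apart(finite, reference)
    return -1.0 if err > cutoff else err

# e.g. a small compound expression, same code run at both precisions
fma = lambda a, b, c: a * b + c
print(tolerance_for(fma, [0.1, 3.0, 0.25]))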

>
> I think that the better approach (albeit somewhat complex) would be to
> define a class like "ImpreciseValue" which would keep both the "low"
> and "high" values allowed (as np.float32/float64's). Then you can
> define all operations to compute all 4 possible values, i.e. if it's
> multiplication, do
>
> a.low * b.low
> a.high * b.low
> a.low * b.high
> a.high * b.high
>
> And take the min/max of those, and if the multiplication operation has
> a tolerance of, say, 10 ULP's, spread the min/max apart by that much.
> Each operation would return one of these ImpreciseValue things, and
> you should be able to determine an accurate tolerance. (I think the
> low/high approach works for all the functions we have here... haven't
> proven it out, but seems like it could be fine, esp over small ranges,
> since in practice the highest error is with pow and that's only 16
> ULPs and it's never compounded.) You could even add syntactic sugar
> and override things like __add__ and so on so that you can keep using
> +, -, etc.
>
> Thoughts?

So what you're describing here is an interval-arithmetic-like approach, 
where upper and lower bounds are determined by evaluating the extremes 
at each operation.  I agree that an approach like this makes more sense 
to someone who expects the tolerances for simple operations to feed 
somehow into the calculation of tolerances for complex functions.  I had 
originally discounted this technique, however, because it can produce 
uselessly large error ranges in certain cases (interval arithmetic's 
dependency problem).
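
For reference, a bare-bones sketch of the kind of class you're 
describing might look something like this (the class name is yours; the 
per-op ULP allowances are just the ones from your fma example, and the 
rest of the names are placeholders of mine):

import numpy as np

class ImpreciseValue(object):
    """Carries the lowest and highest float32 values an expression may
    legitimately take, widening the interval by each op's allowance."""
    def __init__(self, low, high=None):
        self.low = np.float32(low)
        self.high = np.float32(low if high is None else high)

    @staticmethod
    def _widen(low, high, ulps):
        # spread the bounds apart by the operation's allowed error
        return ImpreciseValue(
            low - ulps * abs(np.spacing(np.float32(low))),
            high + ulps * abs(np.spacing(np.float32(high))))

    def __add__(self, other):
        # addition carries no ULP allowance of its own in this example
        return self._widen(self.low + other.low,
                           self.high + other.high, 0)

    def __mul__(self, other):
        # evaluate all four low/high combinations, keep the extremes,
        # then spread them by the (example) 10 ULP mul allowance
        corners = [self.low * other.low, self.low * other.high,
                   self.high * other.low, self.high * other.high]
        return self._widen(min(corners), max(corners), 10)

a, b, c = ImpreciseValue(0.1), ImpreciseValue(3.0), ImpreciseValue(0.25)
bounds = a * b + c    # compounding is tracked through the expression
# a final tolerance could then be derived from bounds.high - bounds.low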

I hope you can understand that I am a bit wary of spending much more 
time on what has already become a disproportionate effort for a 
sub-point of this extension that appears to be very much open to 
interpretation.  I do appreciate the concerns about understandable and 
maintainable code raised here, though, and I would be willing to change 
horses midstream if you felt it was absolutely necessary.

>
>    -ilia
>
>
> On Fri, Feb 27, 2015 at 6:02 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Fri, Feb 27, 2015 at 5:36 PM, Micah Fedke
>> <micah.fedke at collabora.co.uk> wrote:
>>> +def _gen_tolerance(name, rettype, args):
>>> +    """Return the tolerance that should be allowed for a function for the
>>> +    test vector passed in.  Return -1 for any vectors that would push the
>>> +    tolerance outside of acceptable bounds
>>
>> It seems to alternatively return scalars or lists. Why?
>>
>>> +    """
>>> +    if name in simple_fns:
>>> +        return simple_fns[name]
>>> +    elif name in complex_fns:
>>> +        if name in componentwise:
>>> +            # for componentwise functions, call the reference function
>>> +            # for each component (scalar components multiplex here)
>>> +            # eg. for fn(float a, vec2 b, vec2 c) call:
>>> +            # _fn_ref(a, b[0], c[0])
>>> +            # _fn_ref(a, b[1], c[1])
>>> +            # and then capture the results in a list
>>> +            errors = []
>>> +            badlands = []
>>> +            component_tolerances = []
>>> +            for component in range(rettype.num_cols * rettype.num_rows):
>>> +                current_args = []
>>> +                for arg in args:
>>> +                    # sanitize args so that all analyze operations can be performed on a list
>>> +                    sanitized_arg = _listify(arg)
>>> +                    current_args.append(sanitized_arg[component % len(sanitized_arg)])
>>
>> This is confusing. Does this ever happen on matrix "outputs"? When
>> would component be > len(sanitized_arg)? And when it is, why does
>> modding it by the length again make sense?
>>
>>> +                cur_errors, cur_badlands, cur_component_tolerances = _analyze_ref_fn(complex_fns[name], current_args)
>>> +                errors.extend(cur_errors)
>>> +                badlands.extend(cur_badlands)
>>> +                component_tolerances.extend(cur_component_tolerances)
>>> +        else:
>>> +            # for non-componentwise functions, all args must be passed in in a single run
>>> +            # sanitize args so that all analyze operations can be performed on a list
>>
>> Why did you choose to split these up? Why not just make the various
>> functions just always take a list and operate on the whole list? It
>> seems like that would save a lot of heartache and confusion...
>>
>>> +            current_args = map(_listify, args)
>>> +            errors, badlands, component_tolerances = _analyze_ref_fn(complex_fns[name], current_args)
>>> +        # results are in a list, and may have one or more members
>>> +        return -1.0 if True in badlands else map(float, component_tolerances)
>>
>> If badlands is just a true/false (as it seems to be), makes sense to
>> not keep it as a list and just have it as a plain bool right?
>>
>> Also the
>>
>> a if b else c
>>
>> syntax is generally meant for "a is the super-common case, but in
>> these awkward situations, it may be c". Or "I _really_ _really_ want a
>> single expression here". It doesn't seem like it'd reduce readability
>> to just do the more common
>>
>> if True in badlands:
>>    return -1
>> else:
>>    return map(float, component_tolerances)
>>
>> Which also makes it more obvious that something funky is going on
>> since one path returns a scalar while the other returns a list.
>>
>>> +    elif name in mult_fns:
>>> +        x_type = glsl_type_of(args[0])
>>> +        y_type = glsl_type_of(args[1])
>>> +        # if matrix types are involved, the multiplication will be matrix multiplication
>>> +        # and we will have to pass the args through a reference function
>>> +        if x_type.is_vector and y_type.is_matrix:
>>> +            mult_func = _vec_times_mat_ref
>>> +        elif x_type.is_matrix and y_type.is_vector:
>>> +            mult_func = _mat_times_vec_ref
>>> +        elif x_type.is_matrix and y_type.is_matrix:
>>> +            mult_func = _mat_times_mat_ref
>>> +        else:
>>> +            # float*float, float*vec, vec*float or vec*vec are all done as
>>> +            # normal multiplication and share a single tolerance
>>> +            return mult_fns[name]
>>> +        # sanitize args so that all analyze operations can be performed on a list
>>> +        errors, badlands, component_tolerances = _analyze_ref_fn(mult_func, _listify(args))
>>> +        # results are in a list, and may have one or more members
>>> +        return -1.0 if True in badlands else map(float, component_tolerances)
>>> +    else:
>>> +        # default for any operations without specified tolerances
>>> +        return 0.0
>>
>> We've had a few rounds of these reviews, and I'm still generally
>> confused by the specifics of how this is done. I normally don't have
>> this much trouble reading code... not sure what's going on. Feel free
>> to find someone else to review if you're getting frustrated by my
>> apparently irreparable state of confusion.
>>
>> Cheers,
>>
>>    -ilia

-- 

Micah Fedke
Collabora Ltd.
+44 1223 362967
https://www.collabora.com/
https://twitter.com/collaboraltd

