[Piglit] [PATCH] glsl-es-3.00: Generate tests for builtin packing functions (v2)

Tue Jan 22 14:23:08 PST 2013

On 01/22/2013 10:24 AM, Paul Berry wrote:
> On 21 January 2013 15:24, Chad Versace <chad.versace at linux.intel.com> wrote:
> 
>> Generate the following test files:
>>   {const,vs,fs}-{pack,unpack}{Snorm,Unorm,Half}2x16.shader_test
>>
>> The tests are generated by a new Python script,
>> gen_builtin_packing_tests.py, and placed into directory
>> spec/glsl-es-3.00/execution/built-in-functions.
>>
>> v2: Add reduced_input_table. This allows us to generate a smaller set of
>>     inputs for packHalf2x16 in order to avoid Linux's oom-killer.
>>
> 
> The change looks good, thanks.
> 
> I've included a couple of minor comments below, including some other ideas
> for speeding up performance if it's still slow, but I don't consider any of
> them blocking issues, so consider this patch:
> 
> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
> 
> 
>>
>> CC: Paul Berry <stereotype441 at gmail.com>
>> Signed-off-by: Chad Versace <chad.versace at linux.intel.com>
>> ---
>>  generated_tests/CMakeLists.txt               |    6 +-
>>  generated_tests/gen_builtin_packing_tests.py | 1230
>> ++++++++++++++++++++++++++
>>  2 files changed, 1235 insertions(+), 1 deletion(-)
>>  create mode 100644 generated_tests/gen_builtin_packing_tests.py

>> +# Test evaluation of constant pack2x16 expressions.
>> +const_pack_2x16_template = Template(dedent("""\
>> +    [require]
>> +    GL ES >= 3.0
>> +    GLSL ES >= 3.00
>> +
>> +    [vertex shader]
>> +    const vec4 red = vec4(1, 0, 0, 1);
>> +    const vec4 green = vec4(0, 1, 0, 1);
>> +
>> +    in vec4 vertex;
>> +    out vec4 vert_color;
>> +
>> +    void main()
>> +    {
>> +        ${func.result_precision} uint actual;
>> +
>> +        gl_Position = vertex;
>> +
>> +        % for io in func.inout_seq:
>> +        actual = ${func.name}(vec2(${io.input[0]}, ${io.input[1]}));
>> +
>> +        if (true
>> +            % for u in io.valid_outputs:
>>
> 
> Not a huge deal, but you might get better performance out of the shader
> compiler if you change this to "% for u in sorted(set(io.valid_outputs))".
> (This drops duplicate entries from io.valid_outputs, which happen a lot
> because frequently the different rounding/flushing options don't affect the
> result.)

Execution time is troublesome on Sandybridge. So I'll use your suggestion.

>> +            && actual != ${u}
>> +            % endfor
>> +           ) {
>> +            vert_color = red;
>> +            return;
>> +        }
>> +
>> +        % endfor
>> +
>> +        vert_color = green;
>> +    }
>>
> 
> Another possible performance improvement would be to change the logic to:
> 
> vert_color = green;
> % for io in func.inout_seq:
> actual = ...
> if (...) {
>    vert_color = red;
> }
> % endfor
> 
> In orther words, instead of using a "return" statement to shortcut the
> shader once a failure is found, set vert_color to green initially, and then
> only change it to red if a failure is found.  The reason this might improve
> performance is because on architectures with limited JMP support, the
> version with the embedded return statements has to get rewritten to a very
> deeply-nested if tree.

Good idea. I'll do that.

>> +class FuncOpts:
>> +    """Options that modify the evaluation of the GLSL pack/unpack
>> functions.
>> +
>> +    Given an input and a pack/unpack function, there exist multiple valid
>> +    outputs because the GLSL specs permit variation in the implementation
>> of
>> +    the function. The actual output is dependent on the GLSL compiler's
>> and
>> +    hardware's choice of rounding mode (for example, to even or to
>> nearest)
>> +    and handling of subnormal (also called denormalized) floating point
>> +    numbers.
>> +
>> +    This class attempts to capture such permitted variations in
>> implementation
>> +    behavior. To select a particular behavior, pass the appropriate enums
>> to
>> +    the constructor.
>> +
>> +    What follows is an explanation of the variations and how to select
>> them.
>> +
>> +    Flushing subnormal floats to zero
>> +    ---------------------------------
>> +    The GLSL ES 3.00 and GLSL 4.10 specs allows implementations to
>> truncate
>> +    subnormal floats to zero. From section 4.5.1 "Range and Precision" of
>> the
>> +    two specs:
>> +        Any subnormal (denormalized) value input into a shader or
>> +        potentially generated by any operation in a shader can be
>> +        flushed to 0.
>> +
>> +    The constructor parameter 'flush_mode' selects the flushing behavior.
>> +    Valid values are:
>> +        - FLUSH_NONE:  Flush no float to zero.
>> +        - FLUSH_FLOAT32: Flush subnormal float32 values to zero.
>>
> 
> Thinking about this some more, I have to admit some surprise that it's
> necessary to account for flushing of subnormal float32 values in this test
> generator.  It seems to me that since all subnormal float32's are
> extraordinarily tiny (-2^-126 < x < 2^-126), they will always behave
> equivalently to zero considering the coarseness of the packing functions.
> Are you aware of a counterexample?

Why didn't I see that? I'll remove the float32 flushing before committing.

>> +    else:
>> +        # f32 lies in the range [max_normal16 + max_step16, inf), which is
>> +        # outside the range of finite float16 values. The resultant
>> float16 is
>> +        # infinite.
>>
> 
> You might want to define max_step16 above.

A little mistake. I'll fix it.

>> +def make_inputs_for_pack_snorm_2x16():
>> +    # The domain of packSnorm2x16 is [-inf, +inf]^2. The function clamps
>> +    # its input into the range [-1, +1]^2.
>> +    #
>> +    # We test -0.0 in order to stress the implementation's handling of
>> zero.
>> +    # The implementation should return a uint16 that encodes -0.0; that
>> is,
>> +    # a uint16 # with the sign bit set.
>>
> 
> Is there a reasoning error in this comment?  uint16's are unsigned
> integers, so they don't have a sign bit and can't encode -0.0.
> 
> I'm all for testing -0.0 as an input, but I think the only place we need to
> be careful to make sure that a -0.0 input yields a negative result is in
> packHalf2x16 and unpackHalf2x16.
> 
> Having said that, I think the code is correct.

Right. Sometimes I'm surprised that I can write correct code with such
stupid comments.

>> +def make_inputs_for_pack_half_2x16():
>> +    # The domain of packHalf2x16 is ([-inf, +inf] + {NaN})^2. The function
>> +    # does not clamp its input.
>> +    #
>> +    # We test both -0.0 and +0.0 in order to stress the implementation's
>> +    # handling of zero.
>> +
>> +    subnormal_min = 2.0**(-14) * (1.0 / 2.0**10)
>> +    subnormal_max = 2.0**(-24) * (1023.0 / 2.0**10)
>>
> 
> Whoa, shouldn't that -24 be a -14?  Fortunately it doesn't look like
> subnormal_max is used below.

Oops.

>> +def make_inputs_for_unpack_half_2x16():
>> +    # For each of the two classes of float16 values, subnormal and
>> normalized,
>> +    # below are listed the exponent and mantissa of the class's boundary
>> +    # values some values slightly inside the bounds.
>>
> 
> Missing the word "and" here?

Will fix.