[Mesa-dev] [Piglit] [PATCH] Add (un)packHalf tests which don't fail on GCN

Fri Feb 5 16:04:33 UTC 2016

On Fri, Feb 5, 2016 at 4:48 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 05.02.2016 um 16:08 schrieb Marek Olšák:
>> On Fri, Feb 5, 2016 at 3:56 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Am 05.02.2016 um 15:44 schrieb Marek Olšák:
>>>> On Fri, Feb 5, 2016 at 10:57 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>> On Fri, Feb 5, 2016 at 1:55 AM, Matt Turner <mattst88 at gmail.com> wrote:
>>>>>> On Thu, Feb 4, 2016 at 10:50 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>>>
>>>>>>> This is a subset of the generated tests which are known to fail
>>>>>>> on everything except CPU emulation (AFAIK).
>>>>>>> ---
>>>>>>
>>>>>> This is really awful. Committing a generated test, but with unknown
>>>>>> bits chopped out is gross.
>>>>>>
>>>>>> If it were me, I'd want to understand why my hardware behaved
>>>>>> differently -- not just hack up *different* tests and claim victory.
>>>>>>
>>>>>> FWIW, the generated tests pass on all Intel hardware exposing
>>>>>> ARB_shading_language_packing. Gen7+ has native half-float support, and
>>>>>> Gen6 uses the lowering code in lower_packing_builtins.cpp to turn the
>>>>>> built-ins into a pile of instructions.
>>>>>>
>>>>>> If you can identify how AMD hardware behaves differently and can prove
>>>>>> that the generator needs to be relaxed or something, that's cool. But
>>>>>> as is, I hate this patch.
>>>>>>
>>>>>> I can't find anything in the AMD docs (I looked at GCN3) about
>>>>>> half-precision support, so I can't check my theory that AMD hardware
>>>>>> rounds towards zero instead of to-nearest/even like Intel.
>>>>>
>>>>> Since the tests only fail with very small numbers, I think the problem
>>>>> is that denorms are disabled by radeonsi. I can try to confirm that.
>>>>>
>>>>> The hardware rounds to nearest even. The hw precision is:
>>>>> - unpack functions - 0 ULP
>>>>> - pack functions - 0.5 ULP
>>>>> - input and output denorms are flushed to 0
>>>>
>>>> Hey Matt,
>>>>
>>>> I have just confirmed that I was right. After I enable denormals in
>>>> hw, the original tests pass. This means that this patch tests the
>>>> packing functions but skips denormals.
>>>>
>>>> Not so awful now, is it? :)
>>>>
>>>> Sadly, I can't enable denormals on all chips, because they are slow.
>>>>
>>>> So if I add "-no-denormals" suffix into the test names, I can push this, right?
>>>>
>>>
>>> Can't you hack up the generator instead? By the looks of it
>>> (gen_builtin_packing_tests.py) it has a list of values which result in
>>> denorm f16 values (make_inputs_for_pack_half_2x16). Presumably you could
>>> add a test there which uses a different list, not including them.
>>>
>>> (That said, I'm a bit surprised for conversion to/from fp16 your hw
>>> doesn't do fp16 denorms - they'd be required by d3d(11) as well.)
>>
>> The hardware can do denormals, they are just slow on all chips except VI.
>>
>> GLSL 4.50 doesn't require denormals, thus piglit shouldn't even contain
>> tests for them.
>>
>
> I'm not asking about denormals for ordinary operations, just conversion
> to fp16 (any fp16 denorm is a fp32 normal). That would be along similar
> lines what d3d10 already required - forbids fp32 denorms, but for
> instance sampling fp16 surfaces requires you to handle the fp16 denorms
> without flushing to zero (that's at least what the docs say - maybe you
> can get away without it...). FWIW x86 half-float conversion instructions
> (vcvtph2ps, vcvtps2ph) work that way too - even if you have denorms
> disabled, that instruction will still produce fp16 denorms (and convert
> fp16 denorms to fp32 normals), you can configure rounding mode but
> there's no way to disable fp16 denorms.
> Albeit gcn3 supports fp16 natively (that is it has actual operations
> using them not just conversion), so I suppose it makes sense it would
> flush them on conversion too...
> But I think you're right glsl shouldn't require them (albeit it is
> pretty silent on that topic as far as I can tell - certainly doesn't
> require them for fp32).

There is no point in enabling denormals for the conversion
instructions if all preceding FP instructions flush denormals to 0 and
so do the FP instructions that follow. It would make sense only if the
conversion instructions were the only FP instructions operating on the
source data in the shader, but that is a very rare use case for GLSL
intrinsics that, to my knowledge, aren't used by any app. The least
important topics seem to get the most attention, apparently.
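
To make the difference concrete, here is a minimal CPU-side sketch of
packHalf2x16 with and without flushing fp16 denormals. It is not taken
from Mesa or piglit. The function names and the flush_denormals
parameter are made up for illustration, and flushing the converted
result (rather than the input) is an assumption about how the hardware
behaves for values just below 2^-14.

```python
import struct


def f32_to_f16_bits(x):
    """Convert a float to fp16 bits (round to nearest even, denormals kept)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def f32_to_f16_bits_ftz(x):
    """Same conversion, but an fp16 denormal result becomes a signed zero.

    Assumption: the flush is modelled on the converted result.
    """
    bits = f32_to_f16_bits(x)
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0 and mantissa != 0:
        return bits & 0x8000  # keep only the sign bit
    return bits


def pack_half_2x16(x, y, flush_denormals=False):
    """Model of GLSL packHalf2x16: x goes into the low 16 bits, y into the high."""
    conv = f32_to_f16_bits_ftz if flush_denormals else f32_to_f16_bits
    return (conv(y) << 16) | conv(x)
```

With this model, pack_half_2x16(2.0**-24, 1.0) gives 0x3C000001, while
the flushing variant gives 0x3C000000. Inputs at or above 2^-14 in
magnitude give the same result either way, which fits the tests failing
only for very small numbers.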

I think the right course of action is to nuke all denormal piglit
tests from orbit.
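
For reference, dropping the denormal cases from a list of pack inputs
could look roughly like the sketch below. This is not the code in
gen_builtin_packing_tests.py. Both helper names are hypothetical, and
it assumes the inputs are plain Python floats within fp16 range.

```python
import struct


def is_f16_denormal_result(x):
    """True if x converts to a nonzero fp16 value with a zero exponent field."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    return (bits & 0x7C00) == 0 and (bits & 0x03FF) != 0


def drop_denormal_inputs(values):
    """Keep only the inputs whose fp16 conversion is zero or a normal number."""
    return [v for v in values if not is_f16_denormal_result(v)]
```

Zero and everything with a magnitude of at least 2^-14 survive the
filter; values such as 2^-15 or 2^-24 are removed.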

Marek