[Piglit] [Mesa-dev] [PATCH] Add (un)packHalf tests which don't fail on GCN

Fri Feb 5 19:12:54 UTC 2016

On Fri, Feb 5, 2016 at 7:11 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 05.02.2016 um 17:04 schrieb Marek Olšák:
>> On Fri, Feb 5, 2016 at 4:48 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Am 05.02.2016 um 16:08 schrieb Marek Olšák:
>>>> On Fri, Feb 5, 2016 at 3:56 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>>> Am 05.02.2016 um 15:44 schrieb Marek Olšák:
>>>>>> On Fri, Feb 5, 2016 at 10:57 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>> On Fri, Feb 5, 2016 at 1:55 AM, Matt Turner <mattst88 at gmail.com> wrote:
>>>>>>>> On Thu, Feb 4, 2016 at 10:50 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>>>>>
>>>>>>>>> This is a subset of the generated tests which are known to fail
>>>>>>>>> on everything except CPU emulation (AFAIK).
>>>>>>>>> ---
>>>>>>>>
>>>>>>>> This is really awful. Committing a generated test, but with unknown
>>>>>>>> bits chopped out is gross.
>>>>>>>>
>>>>>>>> If it were me, I'd want to understand why my hardware behaved
>>>>>>>> differently -- not just hack up *different* tests and claim victory.
>>>>>>>>
>>>>>>>> FWIW, the generated tests pass on all Intel hardware exposing
>>>>>>>> ARB_shading_language_packing. Gen7+ has native half-float support, and
>>>>>>>> Gen6 uses the lowering code in lower_packing_builtins.cpp to turn the
>>>>>>>> built-ins into a pile of instructions.
>>>>>>>>
>>>>>>>> If you can identify how AMD hardware behaves differently and can prove
>>>>>>>> that the generator needs to be relaxed or something, that's cool. But
>>>>>>>> as is, I hate this patch.
>>>>>>>>
>>>>>>>> I can't find anything in the AMD docs (I looked at GCN3) about
>>>>>>>> half-precision support, so I can't check my theory that AMD hardware
>>>>>>>> rounds towards zero instead of to-nearest/even like Intel.
>>>>>>>
>>>>>>> Since the tests only fail with very small numbers, I think the problem
>>>>>>> is that denorms are disabled by radeonsi. I can try to confirm that.
>>>>>>>
>>>>>>> The hardware rounds to nearest even. The hw precision is:
>>>>>>> - unpack functions - 0 ULP
>>>>>>> - pack functions = 0.5 ULP
>>>>>>> - input and output denorms are flushed to 0
>>>>>>
>>>>>> Hey Matt,
>>>>>>
>>>>>> I have just confirmed that I was right. After I enable denormals in
>>>>>> hw, the original tests pass. This means that this patch tests the
>>>>>> packing functions but skips denormals.
>>>>>>
>>>>>> Not so awful now, is it? :)
>>>>>>
>>>>>> Sadly, I can't enable denormals on all chips, because they are slow.
>>>>>>
>>>>>> So if I add "-no-denormals" suffix into the test names, I can push this, right?
>>>>>>
>>>>>
>>>>> Can't you hack up the generator instead? By the looks of it
>>>>> (gen_builtin_packing_tests.py) it has a list of values which result in
>>>>> denorm f16 values (make_inputs_for_pack_half_2x16). Presumably you could
>>>>> add a test there which uses a different list, not including them.
>>>>>
>>>>> (That said, I'm a bit surprised for conversion to/from fp16 your hw
>>>>> doesn't do fp16 denorms - they'd be required by d3d(11) as well.)
>>>>
>>>> The hardware can do denormals, they are just slow on all chips except VI.
>>>>
>>>> GLSL 4.50 doesn't require denormals, thus piglit shouldn't even contain
>>>> tests for them.
>>>>
>>>
>>> I'm not asking about denormals for ordinary operations, just conversion
>>> to fp16 (any fp16 denorm is a fp32 normal). That would be along similar
>>> lines what d3d10 already required - forbids fp32 denorms, but for
>>> instance sampling fp16 surfaces requires you to handle the fp16 denorms
>>> without flushing to zero (that's at least what the docs say - maybe you
>>> can get away without it...). FWIW x86 half-float conversion instructions
>>> (vcvtph2ps, vcvtps2ph) work that way too - even if you have denorms
>>> disabled, that instruction will still produce fp16 denorms (and convert
>>> fp16 denorms to fp32 normals), you can configure rounding mode but
>>> there's no way to disable fp16 denorms.
>>> Albeit gcn3 supports fp16 natively (that is it has actual operations
>>> using them not just conversion), so I suppose it makes sense it would
>>> flush them on conversion too...
>>> But I think you're right glsl shouldn't require them (albeit it is
>>> pretty silent on that topic as far as I can tell - certainly doesn't
>>> require them for fp32).
>>
>> There is no point in enabling denormals for the conversion
>> instructions if all previous FP instructions flush denormals to 0 and
>> so do the following FP instructions.
> I think you didn't get the point: any fp16 denormal is a fp32 normal
> (and similarly, any fp32 denormal is fp16 zero). And glsl (well the
> non-es variants anyway) don't deal with anything but fp32 really, you
> cannot manipulate these packed fp16 values any other way than unpacking
> to fp32 (well you can of course, but not really in a "float way").
> Thus, even if you have no denormals for fp32, fp16 denorms are still
> useful for packing, unpacking.

OK. I get your point now.

Marek