[Mesa-dev] [PATCH 2/3] radeonsi: implement PK2H and UP2H opcodes

Wed Feb 3 18:27:14 UTC 2016

On Wed, Feb 3, 2016 at 6:29 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 03.02.2016 um 18:01 schrieb Marek Olšák:
>> On Wed, Feb 3, 2016 at 5:37 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Am 03.02.2016 um 10:38 schrieb Michel Dänzer:
>>>> On 03.02.2016 18:29, Marek Olšák wrote:
>>>>> On Wed, Feb 3, 2016 at 10:19 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>>> On 03.02.2016 05:15, Marek Olšák wrote:
>>>>>>> On Sat, Jan 30, 2016 at 12:46 AM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>>>>
>>>>>>>> Based on a gallivm patch by Ilia Mirkin.
>>>>>>>>
>>>>>>>> +8 piglit regressions due to precision issues
>>>>>>
>>>>>> You're saying this patch causes 8 piglit tests to fail? What are the
>>>>>> benefits we get in exchange for that?
>>>>>
>>>>> The tests are too strict and llvmpipe allegedly fails them too.
>>>>
>>>> Allegedly? You can easily test that. :)
>>> That's not so easy. I'm not even entirely sure they are really too strict.
>>> The glsl wording leaves something to be desired, with things such as
>>> "rounding mode is undefined" but yet it requires at least some
>>> operations to be "correctly rounded".
>>> FWIW the arb_shader_packing tests require either round-to-nearest-even
>>> or round-to-nearest-trunc (both rounding finite values that are too large
>>> to represent to infinity), whereas llvmpipe does just trunc (which comes with
>>> round-to-max-finite). (There's also the question about fp16 denorms -
>>> llvmpipe will flush them to zero for pack, but handle them on unpack,
>>> again glsl doesn't really say anything about that...). However, I wasn't
>>> brave enough to actually enable it for llvmpipe at least for now...
>>
>> A simple test that checks if the results are within reasonable bounds
>> should be enough in my opinion. We can't change the behavior of the
>> hardware instructions anyway.
>>
>> The current radeonsi setting for fp16, fp32, and fp64 is:
>> - round to nearest even
>> - flush input and output denorms
>>
>> (FLOAT_MODE in PGM registers, described in the GCN3 ISA document:
>> section 6.4, table 6.4, fp16 uses the fp64 setting IIRC)
>>
>> Marek
>>
>
> Are you sure though the cvt f16 works according to that? Pre-GCN3 you
> didn't have any fp16 instructions (except the conversion one).
> Albeit GCN actually had a separate V_CVT_PKRTZ_F16_F32 intrinsic -
> which says in the name it is round to zero so I suppose the other one
> indeed honors current rounding mode. But if it is using round to nearest
> even, it should work for the piglit test (I think it was supposed to
> tolerate flush to zero).

I'm not totally sure about which float_mode flags affect fp16
conversion instructions, but it's not important at the moment.

The hardware implementation has a precision of 0.5 ULP for fp32->fp16,
i.e. the result is correctly rounded, so it should be perfect. It also
supports denorms.
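
To illustrate what 0.5 ULP means here, below is a minimal Python sketch that
compares round-to-nearest-even with round-toward-zero when converting to
fp16. It does not model the hardware: it relies on Python's struct "e"
format for the nearest-even conversion, the helper names are made up for
this example, and the inputs are Python doubles chosen to be exactly
representable in fp32. Values that overflow fp16 are out of scope, since
struct raises OverflowError for them.

```python
import struct


def f32_to_f16_rtne(x: float) -> int:
    """Return the fp16 bit pattern of x, rounded to nearest even.

    Hypothetical helper: struct's "e" format performs the IEEE 754
    binary16 conversion with round-half-to-even.
    """
    return struct.unpack("<H", struct.pack("<e", x))[0]


def f16_bits_to_float(bits: int) -> float:
    """Decode an fp16 bit pattern back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def f32_to_f16_rtz(x: float) -> int:
    """Return the fp16 bit pattern of x, rounded toward zero.

    Hypothetical helper: start from the nearest-even result and step
    one ULP back toward zero if rounding increased the magnitude.
    """
    bits = f32_to_f16_rtne(x)
    if abs(f16_bits_to_float(bits)) > abs(x):
        # The low 15 bits hold the magnitude, so subtracting 1 moves
        # the value one ULP toward zero for either sign.
        bits -= 1
    return bits


if __name__ == "__main__":
    ulp = 2.0 ** -10  # fp16 ULP for values in [1, 2)
    for x in (1.0 + 0.75 * ulp, 1.0 + 0.5 * ulp, 1.0 + 1.5 * ulp):
        for name, convert in (("rtne", f32_to_f16_rtne),
                              ("rtz", f32_to_f16_rtz)):
            bits = convert(x)
            err = abs(f16_bits_to_float(bits) - x) / ulp
            print(f"{name}: x={x!r} bits=0x{bits:04X} error={err} ULP")
```

With nearest-even the error never exceeds 0.5 ULP, whereas rounding toward
zero (as the name of V_CVT_PKRTZ_F16_F32 suggests) can be off by almost a
full ULP.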

Marek