[Mesa-dev] [PATCH 1/2] WIP gallivm: add support for PK2H/UP2H

Sun Jan 3 17:49:42 PST 2016

On Sun, Jan 3, 2016 at 8:37 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> On 03.01.2016 at 22:29, Ilia Mirkin wrote:
>> This hits assertion failures on LLVM 3.5
>>
>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>> ---
>>
>> It definitely worked at one point or another, but it might have been with
>> a later LLVM version and/or on a different CPU. On my i7-920 with LLVM 3.5
>> I definitely get assertion errors from inside LLVM. Any interested party
>> can take this patch over and fix it as they see fit. Or ignore it.
>
> Interesting. I wasn't even aware using fptrunc could work at all with
> the f16 type. From a quick look, this was indeed introduced later, I
> think in llvm 3.6 (some backends might still not do it today). There are
> also llvm.convert.to.fp16 (and f32) operations (probably the same
> backends won't do them either...). I'm not really sure what rounding
> mode semantics they'll end up with. It seems fptrunc might actually do
> round-to-nearest-even (I suppose llvm.convert.to.fp16 too), but
> depending on how llvm ends up doing it, it might well be subject to the
> same no-denorm issue as the util code.
> (And unfortunately, it looks like we don't have any direct control over
> the rounding mode for them either, so we can't ditch
> lp_build_float_to_smallfloat and lp_build_smallfloat_to_float.)

My (admittedly faint) recollection is that this passed the existing
piglit tests on a Haswell CPU and I guess at least LLVM 3.6 or maybe
even 3.7 (not at the machine right now). But depending on CPU
different code might be emitted of course. I wasn't aware of the
lp_build_float_to_smallfloat stuff. I don't plan on pursuing this
patch further; if you're interested, feel free to redo it.

This whole series was mostly about me _really_ hating the code that
mesa lowered the half-float pack/unpack into, not any actual
performance thing. I don't think that we're aware of a single usage of
these builtins outside of piglit. Curiously I saw that GRID Autosport
makes use of f32tof16 (and back) functions, but they're
locally-defined as

uint packfp32(in float fp32)
{
        uint result;
        uint temp = floatBitsToUint(fp32);
        result = ((temp & 0x80000000u) >> 16) |
                 (((temp & 0x7fffffffu) >> 13) - (0x38000000u >> 13));
        return result;
}
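
For anyone who wants to poke at this outside a shader, here is a rough
C port of packfp32 as quoted above. It is only a sketch: memcpy stands
in for floatBitsToUint, and the helper packfp32_examples is a made-up
name for illustration. As far as I can tell the conversion moves the
sign to bit 15, rebiases the exponent from 127 to 15 and truncates the
mantissa, with no rounding and no special handling for Inf/NaN or
denormals. Inputs smaller in magnitude than 2^-15, zero included, make
the unsigned subtraction wrap around.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* C port of the shader's packfp32. memcpy replaces floatBitsToUint. */
static uint32_t packfp32(float fp32)
{
    uint32_t temp;
    memcpy(&temp, &fp32, sizeof temp);

    /* Sign: bit 31 -> bit 15.
     * Magnitude: drop 13 mantissa bits (truncation), then subtract the
     * exponent bias difference (127 - 15) << 23, shifted the same way. */
    return ((temp & 0x80000000u) >> 16) |
           (((temp & 0x7fffffffu) >> 13) - (0x38000000u >> 13));
}

/* Hypothetical helper: a few values that are exact in half precision. */
static void packfp32_examples(void)
{
    assert(packfp32(1.0f)  == 0x3C00u);
    assert(packfp32(0.5f)  == 0x3800u);
    assert(packfp32(-2.0f) == 0xC000u);
}
```

Note that packfp32(0.0f) does not come out as 0 here, since
0 - 0x1C000 wraps in 32-bit unsigned arithmetic.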

Which later is used as:

                                r9.y = uintBitsToFloat(uint(0x3f800000));
                                r0.x = uintBitsToFloat(uint(r9.y));
                                r0.y = uintBitsToFloat(f32tof16(r0.y));
                                r2.w = intBitsToFloat(bfi(floatBitsToInt(r0.x),
                                        floatBitsToInt(r0.y), int(16), int(16)));

I guess it was too much trouble to use packHalf2x16(r0.xy) [and it
*appears* that they forgot to f32tof16 r0.x... this all gets stored
off into an SSBO and presumably reused somewhere, so I can't tell if it
was intended].
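
To make the comparison with packHalf2x16 concrete, here is a small C
sketch of the bit-field insert. It ASSUMES the shader's local bfi takes
its arguments in the same order as GLSL's bitfieldInsert(base, insert,
offset, bits); the shader's actual definition isn't shown here, so
treat bitfield_insert below as a hypothetical stand-in. With offset 16
and width 16, the low 16 bits of the base are kept and the low 16 bits
of the insert value land in the high half, which matches the
packHalf2x16 layout (first component in the low bits, second in the
high bits) only if both inputs are already half-float bit patterns.

```c
#include <stdint.h>

/* Hypothetical stand-in for the shader's bfi, assumed to follow the
 * GLSL bitfieldInsert(base, insert, offset, bits) argument order.
 * Assumes offset + bits <= 32. */
static uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                                unsigned offset, unsigned bits)
{
    /* Build a mask covering 'bits' bits starting at 'offset'. */
    uint32_t field = (bits >= 32) ? 0xffffffffu : ((1u << bits) - 1u);
    uint32_t mask = field << offset;

    /* Keep base outside the field, take insert's low bits inside it. */
    return (base & ~mask) | ((insert << offset) & mask);
}
```

With both halves converted, bitfield_insert(0x3C00, 0x3C00, 16, 16)
gives 0x3C003C00, the same bits as packHalf2x16(vec2(1.0, 1.0)). If the
base is instead the raw float bits of 1.0 (0x3F800000), the low half of
the result is just the low 16 bits of that float, i.e. 0x0000.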

  -ilia