[Mesa-dev] [PATCH 2/3] gallivm: add fp64 support.

Mon Jun 29 13:18:18 PDT 2015

On 30 June 2015 at 00:58, Roland Scheidegger <sroland at vmware.com> wrote:
> Don't worry about the AoS stuff. Only meant to do simple things.
>
> Looks good overall. I guess it makes sense to not split execution too
> (so you'd have native hw vector size there); llvm should handle that
> pretty well these days. The sse intrinsics probably won't get used that
> way (there's a helper that makes it possible, though it might not be
> hooked up), but I guess there's not really much need for them.
>
> Some comments inline.

I've noticed we have no tests for indirect access to fp64 things, so
I'll probably write some first to validate the indirect paths I
haven't fixed up yet.

>> Two things that don't mix well are SoA and doubles, see
>> emit_fetch_double, and emit_store_double_chan in this.
>>
>> I've also had to split emit_data.chan, to add src_chan,
>> which can be different for doubles.
>>
>> Open issues:
>> are intrinsics okay for floor/ceil?
> The question is if they actually work if you don't have sse4.1 and don't
> just crash (at least I assume with sse4.1 it turns into round
> instruction). (Or on non-x86 cpus if there is no direct hw support). If
> they don't you'd have to provide your own implementation (at least as a
> fallback) or make support for the extension conditional. Otherwise llvm
> intrinsics are just fine (traditionally we didn't really use them much
> as most of the things we do with sse intrinsics were missing, and even
> if some intrinsic existed it often didn't work, but that was a long time
> ago - ideally we'd switch to llvm intrinsics where possible).

Okay, I'm fine with limiting fp64 to where the intrinsics work, I
suppose, though that needs testing on older non-sse4.1 hw.
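
For reference, the "own implementation as a fallback" option could look
roughly like the scalar sketch below. It is only an illustration of the
arithmetic, not gallivm code: the names floor_fallback and ceil_fallback
are hypothetical, it works on one double at a time, and it assumes IEEE
754 doubles and a 64-bit long long. A vectorized version would need
selects instead of branches.

```c
#include <assert.h>

/* Hypothetical helper (not part of gallivm): floor() for doubles that
 * does not rely on a hardware round instruction such as the SSE4.1 one. */
double floor_fallback(double x)
{
    const double two52 = 4503599627370496.0; /* 2^52 */
    double t;

    /* NaN, infinities and |x| >= 2^52 have no fractional part to drop. */
    if (!(x > -two52 && x < two52))
        return x;

    t = (double)(long long)x;  /* truncates toward zero; exact in this range */
    if (t == x)
        return x;              /* already integral; keeps the sign of -0.0 */

    /* Truncation rounds negative non-integers up, so step down by one. */
    return (t > x) ? t - 1.0 : t;
}

/* ceil(x) == -floor(-x); this also keeps the sign of zero results. */
double ceil_fallback(double x)
{
    return -floor_fallback(-x);
}

/* Small usage example: a few values on both sides of zero. */
void floor_fallback_selfcheck(void)
{
    assert(floor_fallback(2.75) == 2.0);
    assert(floor_fallback(-2.75) == -3.0);
    assert(ceil_fallback(2.25) == 3.0);
    assert(ceil_fallback(-2.25) == -2.0);
}
```

The early return for large magnitudes is what keeps the integer
conversion in range; everything at or above 2^52 is already integral.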

>> +
>> +      scalar = LLVMBuildExtractElement(builder, input, si, "");
>> +      res = LLVMBuildInsertElement(builder, res, scalar, ii, "");
>> +      scalar2 = LLVMBuildExtractElement(builder, input2, si, "");
>> +      res = LLVMBuildInsertElement(builder, res, scalar2, ii1, "");
>> +   }
> Did you check what code this generated? Traditionally, we tried to avoid
> the extract/insert stuff where possible and use shuffles instead.
> Because llvm would actually do inserts/extracts (i.e. move from simd
> domain to integer domain and back, which is pretty horrendous, and
> doubly so on some non-intel cpus which have like 15+ cycles latency for
> this). It is possible though this is no longer a problem, llvm 3.6 or
> 3.7 got some majorly improved shuffle optimizer which might also catch this.

No, I haven't looked at what it generated; I was pretty sure it was
going to be ugly.

Oh, if I can use shufflevector for this direction I probably will, that
makes sense. I'm not sure it'll work for the other way, but maybe two
shufflevectors will. I hadn't looked into it that much yet.
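
A rough sketch of the shuffle masks involved, in plain C so it can be
checked without LLVM. It assumes the extract/insert loop interleaves two
n-lane vectors into one 2n-lane vector (result[2i] = input[i],
result[2i+1] = input2[i]); all function names here are hypothetical.
shuffle_u32 is only a reference model of shufflevector indexing, where
mask values below the input length pick from the first operand and the
rest pick from the second. In real code the mask would be built as a
constant vector and handed to LLVMBuildShuffleVector.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for one shuffle that interleaves a and b (n lanes each):
 * out[2i] = a[i], out[2i+1] = b[i]. Index n + i selects b[i]. */
void build_interleave_mask(unsigned n, unsigned *mask)
{
    for (unsigned i = 0; i < n; i++) {
        mask[2 * i] = i;
        mask[2 * i + 1] = n + i;
    }
}

/* Masks for the other direction: two shuffles of one 2n-lane vector,
 * one picking the even lanes and one picking the odd lanes. */
void build_deinterleave_masks(unsigned n, unsigned *even, unsigned *odd)
{
    for (unsigned i = 0; i < n; i++) {
        even[i] = 2 * i;
        odd[i] = 2 * i + 1;
    }
}

/* Reference model of shufflevector on 32-bit lanes. a and b both have
 * n_in lanes; mask has n_out entries, each in [0, 2 * n_in). */
void shuffle_u32(const uint32_t *a, const uint32_t *b, unsigned n_in,
                 const unsigned *mask, unsigned n_out, uint32_t *out)
{
    for (unsigned i = 0; i < n_out; i++) {
        assert(mask[i] < 2 * n_in);
        out[i] = (mask[i] < n_in) ? a[mask[i]] : b[mask[i] - n_in];
    }
}
```

For the de-interleave direction the second operand is never selected, so
the same vector (or undef in LLVM) can be passed for both inputs.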

Dave.