[Mesa-dev] [PATCH] gallivm: add fp64 support. (v2)
Dave Airlie
airlied at gmail.com
Tue Jun 30 19:48:11 PDT 2015
>> LLVMValueRef base_ptr,
>> LLVMValueRef indexes,
>> - LLVMValueRef overflow_mask)
>> + LLVMValueRef overflow_mask, LLVMValueRef indexes2)
>> {
>> struct gallivm_state *gallivm = bld_base->base.gallivm;
>> LLVMBuilderRef builder = gallivm->builder;
>> struct lp_build_context *uint_bld = &bld_base->uint_bld;
>> struct lp_build_context *bld = &bld_base->base;
>> - LLVMValueRef res = bld->undef;
>> + LLVMValueRef res;
>> unsigned i;
>>
>> + if (indexes2)
>> + res = LLVMGetUndef(LLVMVectorType(LLVMFloatTypeInContext(gallivm->context), bld_base->base.type.length * 2));
>> + else
>> + res = bld->undef;
>> /*
>> * overflow_mask is a vector telling us which channels
>> * in the vector overflowed. We use the overflow behavior for
>> @@ -976,26 +980,47 @@ build_gather(struct lp_build_tgsi_context *bld_base,
>> * control flow.
>> */
>> indexes = lp_build_select(uint_bld, overflow_mask, uint_bld->zero, indexes);
>> + if (indexes2)
>> + indexes2 = lp_build_select(uint_bld, overflow_mask, uint_bld->zero, indexes2);
>> }
>>
>> /*
>> * Loop over elements of index_vec, load scalar value, insert it into 'res'.
>> */
>> - for (i = 0; i < bld->type.length; i++) {
>> - LLVMValueRef ii = lp_build_const_int32(bld->gallivm, i);
>> - LLVMValueRef index = LLVMBuildExtractElement(builder,
>> - indexes, ii, "");
>> + for (i = 0; i < bld->type.length * (indexes2 ? 2 : 1); i++) {
>> + LLVMValueRef si, di;
>> + LLVMValueRef index;
>> LLVMValueRef scalar_ptr, scalar;
>>
>> + if (indexes2) {
>> + si = lp_build_const_int32(bld->gallivm, i >> 1);
>> + di = lp_build_const_int32(bld->gallivm, i);
>> + } else {
>> + si = lp_build_const_int32(bld->gallivm, i);
>> + di = si;
>> + }
>> +
>> + if (indexes2 && (i & 1)) {
>> + index = LLVMBuildExtractElement(builder,
>> + indexes2, si, "");
>> + } else {
>> + index = LLVMBuildExtractElement(builder,
>> + indexes, si, "");
>> + }
>> scalar_ptr = LLVMBuildGEP(builder, base_ptr,
>> &index, 1, "gather_ptr");
>> scalar = LLVMBuildLoad(builder, scalar_ptr, "");
>>
>> - res = LLVMBuildInsertElement(builder, res, scalar, ii, "");
>> + res = LLVMBuildInsertElement(builder, res, scalar, di, "");
>> }
>>
>> if (overflow_mask) {
>> - res = lp_build_select(bld, overflow_mask, bld->zero, res);
>> + if (indexes2) {
>> + res = LLVMBuildBitCast(builder, res, bld_base->dbl_bld.vec_type, "");
>> + overflow_mask = LLVMBuildSExt(builder, overflow_mask, bld_base->dbl_bld.int_vec_type, "");
>> + res = lp_build_select(&bld_base->dbl_bld, overflow_mask, bld_base->dbl_bld.zero, res);
>> + } else
>> + res = lp_build_select(bld, overflow_mask, bld->zero, res);
>> }
> This function looks pretty complex to me.
> I wonder if it wouldn't make more sense to use the gather as it was and
> just call it twice, with some shuffle for the fetched values afterwards.
> (There is actually some good reason why build_gather should be a
> "simple" function extracting elements, do loads, and insert the loaded
> values in a straightforward manner, this is supposed to be able to turn
> into a avx2 gather at some point, and everything doing something
> different would need to be thrown out.)
It still collapses back to the previous version if indexes2 is NULL,
so if someone does
create a version that is avx2 gather then it should use that path for
the single width vars,
and can use the slower uglier one for fp64s I would think. I don't
think we'll ever get fp64
gathers/stores to be pretty.
>> + LLVMValueRef temp, temp2;
>> + LLVMValueRef shuffles[8];
>> + LLVMValueRef shuffles2[8];
>> +
>> + for (i = 0; i < bld_base->base.type.length; i++) {
>> + shuffles[i] = lp_build_const_int32(gallivm, i * 2);
>> + shuffles2[i] = lp_build_const_int32(gallivm, (i * 2) + 1);
>> + }
>> +
>> + temp = LLVMBuildShuffleVector(builder, value, LLVMGetUndef(LLVMTypeOf(value)), LLVMConstVector(shuffles, bld_base->base.type.length), "");
>> + temp2 = LLVMBuildShuffleVector(builder, value, LLVMGetUndef(LLVMTypeOf(value)), LLVMConstVector(shuffles2, bld_base->base.type.length), "");
> These lines are a bit long...
Indeed, cleaned that up.
>
>
>> + lp_exec_mask_store(&bld->exec_mask, float_bld, pred, temp, chan_ptr);
>> + lp_exec_mask_store(&bld->exec_mask, float_bld, pred, temp2, chan_ptr2);
>> + } else {
>> + lp_exec_mask_store(&bld->exec_mask, float_bld, pred, value, chan_ptr);
> Actually I think it would be nicer to keep the ordinary float path out
> of this function (the name also implies storing doubles) and handle that
> one-liner in the caller(s).
> Might also make sense to actually get the chan_ptrs in this function
> itself, it sort of feels unnatural to pass 2 of them around (as they are
> obviously closely related), but no biggie.
Yes I agree, I dropped this out to the callers.
> Looks good to me. I'm not entirely happy with the build_gather, but it
> is fixable later, your choice.
Yeah I think I'll leave it as is until we get some time to work on
making it avx2 compatible.
>
> Reviewed-by: Roland Scheidegger <sroland at vmware.com>
Thanks,
Dave.
More information about the mesa-dev
mailing list