[Mesa-dev] [PATCH] gallivm: add fp64 support. (v2)

Tue Jun 30 19:48:11 PDT 2015

>>               LLVMValueRef base_ptr,
>>               LLVMValueRef indexes,
>> -             LLVMValueRef overflow_mask)
>> +             LLVMValueRef overflow_mask, LLVMValueRef indexes2)
>>  {
>>     struct gallivm_state *gallivm = bld_base->base.gallivm;
>>     LLVMBuilderRef builder = gallivm->builder;
>>     struct lp_build_context *uint_bld = &bld_base->uint_bld;
>>     struct lp_build_context *bld = &bld_base->base;
>> -   LLVMValueRef res = bld->undef;
>> +   LLVMValueRef res;
>>     unsigned i;
>>
>> +   if (indexes2)
>> +      res = LLVMGetUndef(LLVMVectorType(LLVMFloatTypeInContext(gallivm->context), bld_base->base.type.length * 2));
>> +   else
>> +      res = bld->undef;
>>     /*
>>      * overflow_mask is a vector telling us which channels
>>      * in the vector overflowed. We use the overflow behavior for
>> @@ -976,26 +980,47 @@ build_gather(struct lp_build_tgsi_context *bld_base,
>>         * control flow.
>>         */
>>        indexes = lp_build_select(uint_bld, overflow_mask, uint_bld->zero, indexes);
>> +      if (indexes2)
>> +         indexes2 = lp_build_select(uint_bld, overflow_mask, uint_bld->zero, indexes2);
>>     }
>>
>>     /*
>>      * Loop over elements of index_vec, load scalar value, insert it into 'res'.
>>      */
>> -   for (i = 0; i < bld->type.length; i++) {
>> -      LLVMValueRef ii = lp_build_const_int32(bld->gallivm, i);
>> -      LLVMValueRef index = LLVMBuildExtractElement(builder,
>> -                                                   indexes, ii, "");
>> +   for (i = 0; i < bld->type.length * (indexes2 ? 2 : 1); i++) {
>> +      LLVMValueRef si, di;
>> +      LLVMValueRef index;
>>        LLVMValueRef scalar_ptr, scalar;
>>
>> +      if (indexes2) {
>> +         si = lp_build_const_int32(bld->gallivm, i >> 1);
>> +         di = lp_build_const_int32(bld->gallivm, i);
>> +      } else {
>> +         si = lp_build_const_int32(bld->gallivm, i);
>> +         di = si;
>> +      }
>> +
>> +      if (indexes2 && (i & 1)) {
>> +         index = LLVMBuildExtractElement(builder,
>> +                                         indexes2, si, "");
>> +      } else {
>> +         index = LLVMBuildExtractElement(builder,
>> +                                         indexes, si, "");
>> +      }
>>        scalar_ptr = LLVMBuildGEP(builder, base_ptr,
>>                                  &index, 1, "gather_ptr");
>>        scalar = LLVMBuildLoad(builder, scalar_ptr, "");
>>
>> -      res = LLVMBuildInsertElement(builder, res, scalar, ii, "");
>> +      res = LLVMBuildInsertElement(builder, res, scalar, di, "");
>>     }
>>
>>     if (overflow_mask) {
>> -      res = lp_build_select(bld, overflow_mask, bld->zero, res);
>> +      if (indexes2) {
>> +         res = LLVMBuildBitCast(builder, res, bld_base->dbl_bld.vec_type, "");
>> +         overflow_mask = LLVMBuildSExt(builder, overflow_mask, bld_base->dbl_bld.int_vec_type, "");
>> +         res = lp_build_select(&bld_base->dbl_bld, overflow_mask, bld_base->dbl_bld.zero, res);
>> +      } else
>> +         res = lp_build_select(bld, overflow_mask, bld->zero, res);
>>     }
> This function looks pretty complex to me.
> I wonder if it wouldn't make more sense to use the gather as it was and
> just call it twice, with some shuffle for the fetched values afterwards.
> (There is actually some good reason why build_gather should be a
> "simple" function extracting elements, do loads, and insert the loaded
> values in a straightforward manner, this is supposed to be able to turn
> into a avx2 gather at some point, and everything doing something
> different would need to be thrown out.)

It still collapses back to the previous version if indexes2 is NULL,
so if someone does
create a version that is avx2 gather then it should use that path for
the single width vars,
and can use the slower uglier one for fp64s I would think. I don't
think we'll ever get fp64
gathers/stores to be pretty.

>> +      LLVMValueRef temp, temp2;
>> +      LLVMValueRef shuffles[8];
>> +      LLVMValueRef shuffles2[8];
>> +
>> +      for (i = 0; i < bld_base->base.type.length; i++) {
>> +         shuffles[i] = lp_build_const_int32(gallivm, i * 2);
>> +         shuffles2[i] = lp_build_const_int32(gallivm, (i * 2) + 1);
>> +      }
>> +
>> +      temp = LLVMBuildShuffleVector(builder, value, LLVMGetUndef(LLVMTypeOf(value)), LLVMConstVector(shuffles, bld_base->base.type.length), "");
>> +      temp2 = LLVMBuildShuffleVector(builder, value, LLVMGetUndef(LLVMTypeOf(value)), LLVMConstVector(shuffles2, bld_base->base.type.length), "");
> These lines are a bit long...

Indeed, cleaned that up.
>
>
>> +      lp_exec_mask_store(&bld->exec_mask, float_bld, pred, temp, chan_ptr);
>> +      lp_exec_mask_store(&bld->exec_mask, float_bld, pred, temp2, chan_ptr2);
>> +   } else {
>> +      lp_exec_mask_store(&bld->exec_mask, float_bld, pred, value, chan_ptr);
> Actually I think it would be nicer to keep the ordinary float path out
> of this function (the name also implies storing doubles) and handle that
> one-liner in the caller(s).
> Might also make sense to actually get the chan_ptrs in this function
> itself, it sort of feels unnatural to pass 2 of them around (as they are
> obviously closely related), but no biggie.

Yes I agree, I dropped this out to the callers.

> Looks good to me. I'm not entirely happy with the build_gather, but it
> is fixable later, your choice.

Yeah I think I'll leave it as is until we get some time to work on
making it avx2 compatible.

>
> Reviewed-by: Roland Scheidegger <sroland at vmware.com>
Thanks,
Dave.