[Mesa-dev] [RFC] llvmpipe texture coordinate rounding

Fri Feb 14 14:35:55 PST 2014

Am 14.02.2014 19:59, schrieb Jeff Muizelaar:
> 
> On Feb 14, 2014, at 1:00 PM, Roland Scheidegger <sroland at vmware.com
> <mailto:sroland at vmware.com>> wrote:
> 
>> Am 14.02.2014 18:07, schrieb Jeff Muizelaar:
>>
>> I'll need to take another look and run some tests, though I've got some
>> quick comments:
>>
>>
>> @@ -1031,16 +1082,28 @@ lp_build_sample_image_linear(struct
>> lp_build_sample_context *bld,
>>       s = lp_build_mul_imm(&bld->coord_bld, s, 256);
>>       if (dims >= 2)
>>          t = lp_build_mul_imm(&bld->coord_bld, t, 256);
>>       if (dims >= 3)
>>          r = lp_build_mul_imm(&bld->coord_bld, r, 256);
>>    }
>>
>>    /* convert float to int */
>> +   half = lp_build_const_vec(bld->gallivm, bld->coord_bld.type, 0.5);
>> +   s = lp_build_add(&bld->coord_bld, s, half);
>> +   s = LLVMBuildFPToSI(builder, s, i32_vec_type, "");
>> +   if (dims >= 2) {
>> +      t = lp_build_add(&bld->coord_bld, t, half);
>> +      t = LLVMBuildFPToSI(builder, t, i32_vec_type, "");
>> +   }
>> +   if (dims >= 3) {
>> +      r = lp_build_add(&bld->coord_bld, r, half);
>> +      r = LLVMBuildFPToSI(builder, r, i32_vec_type, "");
>> +   }
>> +
>>    s = LLVMBuildFPToSI(builder, s, i32_vec_type, "");
>>    if (dims >= 2)
>>       t = LLVMBuildFPToSI(builder, t, i32_vec_type, "");
>>    if (dims >= 3)
>>       r = LLVMBuildFPToSI(builder, r, i32_vec_type, "");
>> This looks quite incorrect you're converting the s/t/r coords twice from
>> float to int.
> 
> Yep. I forgot to remove this second hunk.
> 
>>    /* subtract 0.5 (add -128) */
>>    i32_c128 = lp_build_const_int_vec(bld->gallivm, i32.type, -128);
>>
>>
>> Also, the add looks iffy as it won't work correctly if the coords are
>> negative, since the FPToSI is of course trunc, not floor.
> 
> I think it will be ok because the REPEAT case avoids negative coord
> before converting to int and the other cases clamp to 0.
I think for clamp_to_edge you're right, but repeat will use these coords
for the POT case (npot indeed won't care at all since it doesn't
actually use these values at all).

> 
>> Maybe instead of using add + fptosi should just use lp_build_iround
>> (which is just one sse instruction too on x86 though if you're targeting
>> another arch it will definitely be more code at least unless someone
>> adds an intrinsic for it if the cpu even has one). Might not matter
>> though depending on address mode…
> 
> Yeah, that might be a better idea.
> 
>>
>> And I might be missing something why do you think the new repeat code is
>> faster? Though that might also depend on arch_rounding being available
>> and such but at first looks it seems slightly more complex to me.
> 
> The current code converts integer and fractional parts to integer
> separately. It also does the subtract 0.5 in floating point instead of
> integer arithmetic (-128).

Yeah, you're probably right. With simple instruction counting I end up
with the same number of instructions actually (assuming arch_rounding is
available, I miscounted previously and thought it were two more), but
the ones in your version ought to be a bit cheaper.

I suspect actually nearest filtering suffers from the same inaccuracy,
we actually also do the mul by 256 so we get 8 fractional bits, then
toss away those 8 bits and just use the non-fractional part. That looks
to me like we only need these 8 bits so we get "reasonably correct"
rounding, but we still would chose the wrong sampling point sometimes
(if it's less than 1/256 pixels away from the center between two
texels). I actually noticed that one a while ago but I can't remember
why I didn't do something about it...

Roland