[Mesa-dev] [RFC] llvmpipe texture coordinate rounding

Fri Feb 14 15:05:18 PST 2014

Am 14.02.2014 23:35, schrieb Roland Scheidegger:
> Am 14.02.2014 19:59, schrieb Jeff Muizelaar:
>>
>> On Feb 14, 2014, at 1:00 PM, Roland Scheidegger <sroland at vmware.com
>> <mailto:sroland at vmware.com>> wrote:
>>
>>> Am 14.02.2014 18:07, schrieb Jeff Muizelaar:
>>>
>>> I'll need to take another look and run some tests, though I've got some
>>> quick comments:
>>>
>>>
>>> @@ -1031,16 +1082,28 @@ lp_build_sample_image_linear(struct
>>> lp_build_sample_context *bld,
>>>       s = lp_build_mul_imm(&bld->coord_bld, s, 256);
>>>       if (dims >= 2)
>>>          t = lp_build_mul_imm(&bld->coord_bld, t, 256);
>>>       if (dims >= 3)
>>>          r = lp_build_mul_imm(&bld->coord_bld, r, 256);
>>>    }
>>>
>>>    /* convert float to int */
>>> +   half = lp_build_const_vec(bld->gallivm, bld->coord_bld.type, 0.5);
>>> +   s = lp_build_add(&bld->coord_bld, s, half);
>>> +   s = LLVMBuildFPToSI(builder, s, i32_vec_type, "");
>>> +   if (dims >= 2) {
>>> +      t = lp_build_add(&bld->coord_bld, t, half);
>>> +      t = LLVMBuildFPToSI(builder, t, i32_vec_type, "");
>>> +   }
>>> +   if (dims >= 3) {
>>> +      r = lp_build_add(&bld->coord_bld, r, half);
>>> +      r = LLVMBuildFPToSI(builder, r, i32_vec_type, "");
>>> +   }
>>> +
>>>    s = LLVMBuildFPToSI(builder, s, i32_vec_type, "");
>>>    if (dims >= 2)
>>>       t = LLVMBuildFPToSI(builder, t, i32_vec_type, "");
>>>    if (dims >= 3)
>>>       r = LLVMBuildFPToSI(builder, r, i32_vec_type, "");
>>> This looks quite incorrect you're converting the s/t/r coords twice from
>>> float to int.
>>
>> Yep. I forgot to remove this second hunk.
>>
>>>    /* subtract 0.5 (add -128) */
>>>    i32_c128 = lp_build_const_int_vec(bld->gallivm, i32.type, -128);
>>>
>>>
>>> Also, the add looks iffy as it won't work correctly if the coords are
>>> negative, since the FPToSI is of course trunc, not floor.
>>
>> I think it will be ok because the REPEAT case avoids negative coord
>> before converting to int and the other cases clamp to 0.
> I think for clamp_to_edge you're right, but repeat will use these coords
> for the POT case (npot indeed won't care at all since it doesn't
> actually use these values at all).
Hmm actually looking even closer I suspect this could introduce some
slight error with texel offsets, since these are added later hence
negative values DO matter. So probably safer to just always use correct
rounding. FWIW I'm wondering if we could use llvm intrinsics for this
instead of coming up with our own, llvm seems to have nearbyint, rint
and round. All returning floats though, but maybe it would recognize
patterns if combined with fptosi (I have no idea which one I'd have to
use though nor if they work and if so with what backends).

> 
>>
>>> Maybe instead of using add + fptosi should just use lp_build_iround
>>> (which is just one sse instruction too on x86 though if you're targeting
>>> another arch it will definitely be more code at least unless someone
>>> adds an intrinsic for it if the cpu even has one). Might not matter
>>> though depending on address mode…
>>
>> Yeah, that might be a better idea.
>>
>>>
>>> And I might be missing something why do you think the new repeat code is
>>> faster? Though that might also depend on arch_rounding being available
>>> and such but at first looks it seems slightly more complex to me.
>>
>> The current code converts integer and fractional parts to integer
>> separately. It also does the subtract 0.5 in floating point instead of
>> integer arithmetic (-128).
> 
> Yeah, you're probably right. With simple instruction counting I end up
> with the same number of instructions actually (assuming arch_rounding is
> available, I miscounted previously and thought it were two more), but
> the ones in your version ought to be a bit cheaper.
> 
> I suspect actually nearest filtering suffers from the same inaccuracy,
> we actually also do the mul by 256 so we get 8 fractional bits, then
> toss away those 8 bits and just use the non-fractional part. That looks
> to me like we only need these 8 bits so we get "reasonably correct"
> rounding, but we still would chose the wrong sampling point sometimes
> (if it's less than 1/256 pixels away from the center between two
> texels). I actually noticed that one a while ago but I can't remember
> why I didn't do something about it...
> 

Roland