[Mesa-dev] Problems with accuracy of coeffs_init + attribs_update
Oded Gabbay
oded.gabbay at gmail.com
Thu Oct 29 04:27:02 PDT 2015
Hi Roland, Jose
I wanted to bring a problem I found to your attention, and discuss
about it and ways to solve it.
I'm working on regressions of piglit gpu.py between x86-64 and
ppc64le, when running with llvmpipe.
One of the regressions manifests itself in 2 tests,
clip-distance-bulk-copy and clip-distance-itemized-copy
What happens in ppc64le is that *some* of the interpolated per-vertex
values (clip-distance values) that constitute the input into the
fragment shader, are not calculated according to the required accuracy
of the test (which is an error/distance of less than 1e-6)
It took a while, but I believe I found the reason. In general, when
running on ppc64le, the code path that is taken at
lp_build_interp_soa_init() is doing bld->simple_interp = FALSE, which
means using coeffs_init() + attribs_update(). That's in contrast with
x86-64, where bld->simple_interp = TRUE, which means the code will use
coeffs_init_simple() + attribs_update_simple()
In specific, the problem lies in how the coeffs are calculated inside
coeffs_init. To simply put it, they are only *actually* calculated
correctly for the first quad. The coeffs are later *updated* for the
other quads, using dadx/dady and dadq.
------
side-note:
I believe you even documented it in coeffs_init():
/*
* a = a0 + (x * dadx + y * dady)
* a0aos is the attrib value at top left corner of stamp
*/
------
The root cause for that, I believe, is because coeffs_init does *not*
take into account the different x/y offsets for the other quads. In
contrast, on x86-64, the code uses a function called calc_offset(),
which does take the different x/y offsets into account.
Please look at this definition (from lp_bld_interp.c) :
static const unsigned char quad_offset_x[16] = {0, 1, 0, 1, 2, 3, 2,
3, 0, 1, 0, 1, 2, 3, 2, 3};
static const unsigned char quad_offset_y[16] = {0, 0, 1, 1, 0, 0, 1,
1, 2, 2, 3, 3, 2, 2, 3, 3};
Now, look at how coeffs_init() uses them:
for (i = 0; i < coeff_bld->type.length; i++) {
LLVMValueRef nr = lp_build_const_int32(gallivm, i);
LLVMValueRef pixxf = lp_build_const_float(gallivm, quad_offset_x[i]);
LLVMValueRef pixyf = lp_build_const_float(gallivm, quad_offset_y[i]);
pixoffx = LLVMBuildInsertElement(builder, pixoffx, pixxf, nr, "");
pixoffy = LLVMBuildInsertElement(builder, pixoffy, pixyf, nr, "");
}
So, in ppc64le, where coeff_bld->type.length == 4, we *always* take
the first four values from the above arrays. The code *never*
calculates the coeffs based on the other offsets.
Admittedly, that's *almost* always works and I doubt you will see a
difference in display. However, in the tests I mentioned above, it
creates, for some pixels, a bigger error than what is allowed.
The first thing that came into my mind was to change, in
lp_build_interp_soa_init():
"if (coeff_type.length > 4)"
to this:
"if (coeff_type.length >= 4)"
When I did that, I saw it fixes the above tests and didn't cause any
regressions on ppc64le and/or on x86-64.
However, I wanted to ask you guys if you remember why you allowed only
vectors with size above 4 to use the "simple" route ?
Was there any performance hit ?
I think that to fix coeffs_init()+attribs_update() is to basically
re-design them. Otherwise, I would have tried to fix them first.
In any case, I would love to hear your feedback and to discuss this further.
Thanks,
Oded
More information about the mesa-dev
mailing list