[Pixman] [PATCH 7/7] utils.c: Increase acceptable deviation to 0.0064 in pixel_checker_t

Sat Feb 2 12:23:04 PST 2013

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

>> == a8r8g8b8 OVER r5g6b5 ==
>> 
>> When OVER compositing the a8r8g8b8 pixel 0x0f00c300 with the x14r5g6b5
>
> Did you actually mean x14r6g6b6?

Yes, thanks.

>> pixel 0x03c0, the true floating point value of the resulting green
>> channel is:
>> 
>>    0xc3 / 255.0 + (1.0 - 0x0f / 255.0) * (0x0f / 63.0) = 0.9887955
>> 
>> but when compositing 8 bit values, where the 6-bit green channel is
>> converted to 8 bit through bit replication, the 8-bit result is:
>> 
>>    0xc3 + ((255 - 0x0f) * 0x3c + 127) / 255 = 251
>> 
>> which corresponds to a real value of 0.984314. The difference from the
>> true value is 0.004482 which is bigger than the acceptable deviation
>> of 0.004. So, if we were to compute all the CONJOINT/DISJOINT
>> operators in floating point, or otherwise make them more accurate, the
>> acceptable deviation could be set at 0.0045.
>> 
>> If we were doing the 6-bit conversion with rounding:
>> 
>>    (x / 63.0 * 255.0 + 0.5)
>> 
>> instead of bit replication, the deviation in this particular case
>> would be only 0.0005, so we may want to consider this at some
>> point.
>
> This has been also discussed here:
>
>     http://comments.gmane.org/gmane.comp.graphics.pixman/1891
>
> Though the bit replication when converting to 8-bit is not so bad.
> Dropping lower bits when converting back introduces a bigger error.
>
> Anyway, if I remember correctly, the accuracy loss has been well known
> since the time when bitexact testing was introduced. Other than using
> less accurate but faster conversion approximations, currently there
> is also an assumption that separate "fetch -> combine -> store" steps
> must provide exactly the same results as the fast path functions doing
> the same operations in one go. This restriction surely inhibits
> performance and accuracy. Certain platforms (ARM11 and MIPS32) should
> be able to improve performance a bit if we go away from bitexact
> correctness testing and allow more freedom in implementations. So this
> patchset indeed looks rather useful.
>
> However I think that we may need to come to an agreement on the primary
> purpose of the 8-bit pipeline, especially now that we also have a
> floating point pipeline. In my opinion, the 8-bit integer pipeline
> should always favour performance over accuracy in the case of doubt.

I agree that the primary purpose of the 8-bit pipeline is
performance. If performance didn't matter, we could just use floating
point for everything. But clearly we can't allow arbitrary deviation
from the exact computation, so the question has to be how much deviation
is acceptable.
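
For reference, the green-channel example quoted above can be reproduced with a small C sketch. The helper names are made up for illustration and are not pixman API. The sketch assumes the rounding form `(x + 127) / 255` used in the quoted arithmetic and does no clamping.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these are pixman API. */

/* Expand a 6-bit channel to 8 bits by bit replication. */
static uint8_t expand6_replicate (uint8_t v6)
{
    return (uint8_t) ((v6 << 2) | (v6 >> 4));
}

/* Expand a 6-bit channel to 8 bits with rounding: (x / 63.0 * 255.0 + 0.5). */
static uint8_t expand6_round (uint8_t v6)
{
    return (uint8_t) (v6 / 63.0 * 255.0 + 0.5);
}

/* 8-bit OVER for one channel: s + (255 - a) * d / 255, rounded.
 * No clamping: the caller is assumed to pass values whose sum fits in 8 bits.
 */
static uint8_t over_channel_8 (uint8_t s, uint8_t a, uint8_t d8)
{
    return (uint8_t) (s + ((255 - a) * d8 + 127) / 255);
}

/* The same channel computed in floating point from the 6-bit destination. */
static double over_channel_float (uint8_t s, uint8_t a, uint8_t d6)
{
    return s / 255.0 + (1.0 - a / 255.0) * (d6 / 63.0);
}

/* Absolute deviation of an 8-bit result from the floating point value. */
static double deviation (uint8_t result8, double exact)
{
    double diff = result8 / 255.0 - exact;
    return diff < 0 ? -diff : diff;
}
```

With s = 0xc3, a = 0x0f and a 6-bit destination of 0x0f, the replicated path gives 251 and the rounded path gives 252, so the deviation drops from about 0.0045 to about 0.0006.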

> Moreover, anyone using r5g6b5 format is most likely either memory or
> performance constrained, so they would not particularly appreciate the
> more accurate, but slower conversion between a8r8g8b8 and r5g6b5.

It's not an academic discussion btw. If we add dithering, the difference
between shifting and rounding becomes very obvious. Here are two images,
both containing a gradient rendered three different ways: once onto
r5g6b5 without dithering, once onto a8r8g8b8 without dithering, and once
with dithering onto r5g6b5.

In the first image, bitshifting is used:

    http://people.freedesktop.org/~sandmann/dither-shift.png

In the second, rounding is used:

    http://people.freedesktop.org/~sandmann/dither-round.png

In the first image, there is an obvious darkening in the dithered
gradient. In the second, the difference is visible, but fairly
subtle. Even the undithered gradient, while ugly in both cases, is
rendered visibly more faithfully with rounding.

> There are also other libraries and alternative solutions out
> there. The competition between different mobile browsers and UI
> toolkits for the embedded systems seems to be heavily focused on
> performance. Every little bit is relevant.

Well, we could start doing division by 255 in this way:

       (a * b + 0xff) >> 8

The error is not that severe, and it would be a little bit faster than
(t = a * b + 0x80, (t + (t >> 8)) >> 8).

If we had started out doing divisions in the above way, would we now be
debating whether the additional shift instruction in the

       (t + (t >> 8)) >> 8

formula would be worth the higher precision?
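
To put a number on "not that severe", here is a sketch that compares both forms against a rounded integer division over all 8-bit inputs. The function names are illustrative only and do not correspond to pixman macros. The reference is assumed to be `(a * b + 127) / 255`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative names only; these are not pixman macros. */

/* Cheap approximation: (a * b + 0xff) >> 8. */
static uint8_t mul_div255_cheap (uint8_t a, uint8_t b)
{
    return (uint8_t) ((a * b + 0xff) >> 8);
}

/* t = a * b + 0x80, (t + (t >> 8)) >> 8. */
static uint8_t mul_div255_precise (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;
    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Reference: a * b / 255 rounded to nearest, using integer division. */
static uint8_t mul_div255_reference (uint8_t a, uint8_t b)
{
    return (uint8_t) ((a * b + 127) / 255);
}

typedef uint8_t (*mul_func_t) (uint8_t a, uint8_t b);

/* Largest absolute difference from the reference over all 8-bit inputs. */
static int max_error (mul_func_t f)
{
    int worst = 0;
    for (int a = 0; a < 256; a++)
    {
        for (int b = 0; b < 256; b++)
        {
            int e = abs (f ((uint8_t) a, (uint8_t) b) -
                         mul_div255_reference ((uint8_t) a, (uint8_t) b));
            if (e > worst)
                worst = e;
        }
    }
    return worst;
}
```

The two-shift form matches the rounded quotient for every input, while the cheap form is off by at most one. For example, it maps 1 * 1 to 1 rather than 0, and 243 * 243 to 231 rather than 232.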

The question I'm trying to answer is how much deviation should be
considered acceptable. The answer is unlikely to be: "We got it
precisely right back when the bitexact test suite was added",
especially since, as you pointed out, there are places where we could improve
both performance and accuracy. That goes for r5g6b5 too btw. For
over_8888_0565(), this:

       s + DIV_63 ((255 - a) * d)

would likely be both faster and more accurate than

       s + DIV_255 ((255 - a) * ((d << 2) | (d >> 4)))
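
A rough way to check the accuracy half of that claim, for the 6-bit green channel only: the sketch below assumes DIV_63 and DIV_255 are rounding integer divisions and that the 6-bit value is expanded by bit replication as `(d << 2) | (d >> 4)`. All names are hypothetical. It says nothing about speed.

```c
#include <stdint.h>

/* DIV_63 and DIV_255 are assumed here to be rounding integer divisions. */
static uint32_t div_63 (uint32_t x)  { return (x + 31) / 63; }
static uint32_t div_255 (uint32_t x) { return (x + 127) / 255; }

/* (255 - a) * d with the 6-bit d divided directly by 63. */
static uint32_t dest_term_direct (uint8_t a, uint8_t d6)
{
    return div_63 ((255u - a) * d6);
}

/* Same term with d first expanded to 8 bits by bit replication. */
static uint32_t dest_term_replicated (uint8_t a, uint8_t d6)
{
    uint32_t d8 = ((uint32_t) d6 << 2) | (d6 >> 4);
    return div_255 ((255u - a) * d8);
}

/* Worst absolute error, in 8-bit units, against (255 - a) * d / 63.0. */
static double worst_error (uint32_t (*term) (uint8_t, uint8_t))
{
    double worst = 0.0;
    for (int a = 0; a < 256; a++)
    {
        for (int d = 0; d < 64; d++)
        {
            double exact = (255 - a) * d / 63.0;
            double e = term ((uint8_t) a, (uint8_t) d) - exact;
            if (e < 0)
                e = -e;
            if (e > worst)
                worst = e;
        }
    }
    return worst;
}
```

The direct form stays within half an 8-bit step of the real value by construction. The replicated form can be off by more than a full step: for a = 0x0f, d = 0x0f it yields 56, where the real value is about 57.14.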

> And while we are talking about this, bilinear interpolation precision
> is also somewhat related here (the choice of 7-bit vs. 4-bit) and
> whether we can avoid doing correct rounding for it or not.

To use the tolerance based tests as implemented by do_composite(), I
think both the reference and the test subject have to use the same
subsampling precision.

(Dithered rounding could also be used here, btw.)

> On the other hand, the floating point pipeline is a good place to
> implement sRGB, accurate format conversions and the other nice things.
> In other words, it can favour accuracy over performance.

In my view, the floating point pipeline should eventually implement
everything with high accuracy so that it can be used both as a reference
for a tolerance based test suite, and as a fallback for operations that
don't have fast paths. I have a start on that here:

    http://cgit.freedesktop.org/~sandmann/pixman/log/?h=float-imp

Trying to verify that that branch fixes the a2r10g10b10->a8r8g8b8
precision loss is what prompted this patch set and some upcoming fixes
for the PDF operators.

Søren