[Pixman] [PATCH 7/7] utils.c: Increase acceptable deviation to 0.0064 in pixel_checker_t

Tue Feb 12 13:17:12 PST 2013

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

>> > Moreover, anyone using r5g6b5 format is most likely either memory or
>> > performance constrained, so they would not particularly appreciate the
>> > more accurate, but slower conversion between a8r8g8b8 and r5g6b5.
>> 
>> It's not an academic discussion btw. If we add dithering, the difference
>> between shifting and rounding becomes very obvious. Here are two images,
>> both containing a gradient rendered three different ways: once onto
>> r5g6b5 without dithering, once onto a8r8g8b8 without dithering, and once
>> with dithering onto r5g6b5.
>> 
>> In the first image, bitshifting is used:
>> 
>>     http://people.freedesktop.org/~sandmann/dither-shift.png
>> 
>> In the second, rounding is used:
>> 
>>     http://people.freedesktop.org/~sandmann/dither-round.png
>> 
>> In the first image, there is an obvious darkening in the dithered
>> gradient. In the second, the difference is visible, but fairly
>> subtle. Even the undithered gradient, while ugly in both cases, is
>> rendered visibly more faithfully with rounding.
>
> Are you using http://en.wikipedia.org/wiki/Ordered_dithering ?
> When adding a threshold map to pixels, the image gets a bit lighter.
> Wouldn't dropping the lower bits actually compensate this?

The error in converting from [0,255] to [0,63] through bitshift is not a
consistent darkening; it is a darkening of dark values and a lightening
of light values. For example 0xf8 / 255.0 = 0.973, but 0xf8 gets
converted to 62 which corresponds to 62/63.0 = 0.984.

Here is a graph of the error from bit shifting:

    http://people.freedesktop.org/~sandmann/shift-error.png

and here is the graph for rounding:

    http://people.freedesktop.org/~sandmann/round-error.png

If the intervals in question were [0,256] and [0,64], the correct
conversion would be a division by 4, and so a truncating shift would
produce a darkening. However, for the intervals [0,255] and [0,63] the
right conversion is a division by 255/63.0 = 4.0476190476190474, so a
division by 4 produces a slightly-too-large value which is then
compensated for by the truncation, producing the error shown in the
graph above.
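
The error pattern described above can be reproduced with a minimal sketch. The helper names (shift_8_to_6, round_8_to_6, error) are made up for this illustration and are not pixman functions:

```python
def shift_8_to_6(v8):
    # Truncating conversion: drop the two low bits.
    return v8 >> 2

def round_8_to_6(v8):
    # Rounding conversion: scale [0,255] onto [0,63], round to nearest.
    return int(v8 * 63 / 255.0 + 0.5)

def error(v8, convert):
    # Signed error in normalized [0,1] intensity; positive means too light.
    return convert(v8) / 63.0 - v8 / 255.0

for v8 in (0x00, 0x07, 0x80, 0xf8, 0xff):
    print("0x%02x  shift %+.4f  round %+.4f" %
          (v8, error(v8, shift_8_to_6), error(v8, round_8_to_6)))
```

With shifting, the sign of the error flips between the dark and the light end of the range, whereas the rounding error never exceeds half a 6-bit step.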

> Basically, is the darkening really a problem specifically with
> conversion and not with dithering?

The specific dither algorithm I used was a variant of ordered dither
using the dither matrix from GdkRGB

   http://git.gnome.org/browse/gtk+/tree/gdk/gdkrgb.c?id=2.24.15#n968

which is a 128 x 128 table containing 256 copies of the values from 0 to
63 arranged in a blue-noise pattern.

In order to avoid biasing the values by the dithering itself, I
subtracted 32 from the dither before shifting it into the lower bits, so
that it would have a mean value of (close to) 0.

You are right that for this particular gradient, the combination of
simply adding the dither without subtracting 32, followed by a bitshift
produces a better result:

    http://people.freedesktop.org/~sandmann/dither-add-shift.png

But this is because this gradient is dark so the bitshift has a
darkening effect. For a light gradient, adding followed by shift
produces a lightening effect:

    http://people.freedesktop.org/~sandmann/dither-add-shift-light.png

whereas subtracting 32 and rounding still produces the right colors:

    http://people.freedesktop.org/~sandmann/dither-round-light.png

All of the images were created like this:

1. The undithered and dithered gradients were rendered onto a 565 image
   with either shifting or rounding.

2. The 565 image was SRCed to an 8888 surface with either replication or
   rounding.

3. An undithered gradient was rendered onto the 8888 surface.

So the images also include the effect of rounding vs. bit replication
for upconversion.
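
For reference, here is a rough sketch of the two upconversion variants. The function names are hypothetical, and the replication formula is the usual "copy the top bits into the low bits" trick, written so that it only holds for channel depths of 4 to 8 bits:

```python
def replicate_up(v, n_bits):
    # Bit replication: put the n-bit value in the top bits of the byte and
    # fill the remaining low bits with its own most significant bits.
    # As written this is only valid for 4 <= n_bits <= 8.
    return (v << (8 - n_bits)) | (v >> (2 * n_bits - 8))

def round_up(v, n_bits):
    # Rounding: scale [0, 2^n - 1] onto [0,255] and round to nearest.
    m = (1 << n_bits) - 1
    return int(v * 255 / float(m) + 0.5)

for n_bits in (5, 6):
    differing = [v for v in range(1 << n_bits)
                 if replicate_up(v, n_bits) != round_up(v, n_bits)]
    print("%d bits: %d values differ" % (n_bits, len(differing)))
```

Both variants map 0 to 0 and the maximum value to 255; any differences are confined to the values in between.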

[ Aside about dithering: Theoretically, dithering should be done by
adding noise uniformly distributed over [-q/2, q/2] where q is the
quantization step. That is, the really right formula is this:

    s6 = floor (((s8 / 255.0) + (d/63.0 - 0.5) * (1/63.0)) * 63.0 + 0.5)

where the dither signal is scaled precisely rather than shifted.

An approximation of that formula is here:

   http://people.freedesktop.org/~sandmann/dither-perfect.png

(only an approximation because it converts to 8 bit before converting to
5/6 bits), which can be compared to the rounded version:

   http://people.freedesktop.org/~sandmann/dither-round.png

The 'perfect' variant is slightly too light for lighter colors, but
matches better at the darker end. It may be that to get an exact match,
a gamma adjustment should be applied to the dither signal. ]
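
The formula in the aside can be transcribed directly. This is only a sketch: the function names are invented here, the clamp to [0,63] is an added assumption (the formula itself can land one step outside the range at the extremes), and d is assumed to be a dither-matrix entry in [0,63]:

```python
import math

def dither_8_to_6(s8, d):
    # s8: 8-bit source value in [0,255]; d: dither entry in [0,63].
    # Direct transcription of the formula, plus a clamp (an assumption).
    x = ((s8 / 255.0) + (d / 63.0 - 0.5) * (1 / 63.0)) * 63.0 + 0.5
    return max(0, min(63, int(math.floor(x))))

def mean_level(s8):
    # Average output over every dither value, as a fraction of full scale.
    return sum(dither_8_to_6(s8, d) for d in range(64)) / (64 * 63.0)

for s8 in (0x10, 0x80, 0xf0):
    print("0x%02x  source %.4f  dithered mean %.4f" %
          (s8, s8 / 255.0, mean_level(s8)))
```

Averaged over all dither values, the output level stays close to the source level, which is the point of centering the dither signal around zero.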

>> In the first image, bitshifting is used:
>> 
>>     http://people.freedesktop.org/~sandmann/dither-shift.png
>> 
>> In the second, rounding is used:
>> 
>>     http://people.freedesktop.org/~sandmann/dither-round.png
>> 
> Also we would prefer a lossless r5g6b5 -> r8g8b8 -> r5g6b5
> conversion round-trip. Replicating the high bits and then
> dropping them when converting back meets this requirement.
> Doing correct rounding may be also fine, but this needs
> to be confirmed.

Here is a python program that can verify this:

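    def round_trip (n_bits):
        # Largest value representable in n_bits, as a float
        m = (1 << n_bits) - 1.0
        for i in range (0, (1 << n_bits)):
            # Round up to 8 bits, then back down to n_bits
            v8 = int ((i / m) * 255.0 + 0.5)
            vl = int ((v8 / 255.0) * m + 0.5)

            # The round trip must reproduce the original value
            assert vl == i

    # Check every channel depth from 1 to 8 bits
    for j in range (1, 9):
        round_trip (j)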

There is also a straightforward argument that a low-bit value will be
converted to the closest 8 bit value, which in turn will be converted
back to the closest low-bit value, and that has to be the same as the
original because the distance between low-bit values is bigger than
between high-bit values.

>> The question I'm trying to answer is how much deviation should be
>> considered acceptable. The answer is unlikely to be: "We got it
>> precisely right back when the bitexact test suite was added",
>> especially, as you pointed out, there are places where we could improve
>> both performance and accuracy. That goes for r5g6b5 too btw. For
>> over_8888_0565(), this:
>> 
>>        s + DIV_63 ((255 - a) * d)
>> 
>> would likely be both faster and more accurate than
>> 
>>        s + DIV_255 ((255 - a) * ((d << 2) | (d >> 4)))
>
> Yes, that's exactly this case and also over_n_8_0565() which are most
> important. With the NEON code and excessive performance already
> saturating memory bandwidth in many cases, it is easy to ignore this
> optimization, but for ARMv6 it may be beneficial.

The rounding conversion from 8 bit to 6 bits can be done like this:

    (253 * g8 + 512) >> 10

which on NEON can be done with a multiplication and a rounding shift. In
the worst case of src_8888_0565, which is a pure conversion, only three
more instructions would be required, which I doubt would be enough to
push it over the memory bandwidth limit. I think DSPr2 also has rounding
shift instructions.
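
A quick way to check the integer formula is to compare it against the floating point rounding for all 256 inputs. The function names below are made up for this sketch:

```python
def round_8_to_6_float(g8):
    # Reference: scale [0,255] onto [0,63] and round to nearest.
    return int(g8 * 63 / 255.0 + 0.5)

def round_8_to_6_int(g8):
    # Integer-only version: multiply, add half of 1024, shift right by 10.
    return (253 * g8 + 512) >> 10

mismatches = [g8 for g8 in range(256)
              if round_8_to_6_float(g8) != round_8_to_6_int(g8)]
print("mismatches: %r" % (mismatches,))
```

This works because 253/1024 exceeds 63/255 by only about 1.15e-5, which over the range [0,255] accumulates to less than 0.003, too small to push any input across a rounding boundary.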

But the impact on ARMv6 may certainly be more severe. It would be
interesting to try to quantify that impact.

> As for x86, I believe that r5g6b5 format is not in use anymore.

If phones or tablets with Atom chips start appearing, I suppose that
might change.

> Sure, that's good to have both new tests and the fixes for PDF
> operators.
>
> Still bit-exact testing may have some more life left in it. I'll try
> to explain why. Looks like nowadays ARM SoCs tend to have dedicated
> hardware accelerators for 2D graphics. This includes Exynos4
> (ODROID-U2), Exynos5 (ARM Chromebook), OMAP5 and also Allwinner A10
> (Mele A1000 / MK802 / cubieboard). Not to try taking them into use
> would be a waste.
>
> It is debatable where exactly this 2D hardware acceleration is better
> to be plugged in the end (as a pixman backend, a thin wrapper around
> pixman, cairo or just X11 DDX driver). However pixman test suite
> is quite useful for the correctness validation. It just needs to be
> extended to do random tests similar to blitters-test, but with
> multiple images, randomly switching between 2D acceleration and CPU
> rendering and also executing sets of random operations on random
> triples of images (including mask). This extra test complexity is
> needed to stress asynchronous completion of operations and cache
> coherency. But if doing sets of multiple compositing operations,
> then the precision expectations for each final pixel may be quite
> difficult to set. If we expect bit-exact results, then everything
> is simple. The only difficulty is that the results for the rendering
> via 2D accelerator actually happen to differ from pixman. And
> because the 2D accelerator hardware can't be really changed and is
> not very configurable, it is pixman that can be adjusted to still
> keep bit-exact results in such tests.

You mean having a CRC32 value for each type of hardware?

Part of my motivation for doing tolerance-based tests is that they would
also be useful for validating the correctness of Render in the X server,
but making sure that the output doesn't change unexpectedly is also
useful.

Søren