[cairo] [PATCH] Updates for the scaling test and more fixes

Mon Apr 27 05:28:19 PDT 2009

On Saturday 25 April 2009 05:25:53 ext Soeren Sandmann wrote:
> Siarhei Siamashka <siarhei.siamashka at nokia.com> writes:
> > Now test provides better coverage for various image scaling
> > cases. Unused byte for x8r8g8b8 format is ignored. Running
> > the test program without any command line options provides
> > a PASS/FAIL verdict based on the precalculated crc32 value
> > for using pixman with all the fastpath functions disabled,
> > which should simplify testing for correcteness.
>
> Pushed.

Thanks.

> > From d7b7a4e4e78e48b76d9596276ee9a2c1d68c4451 Mon Sep 17 00:00:00 2001
> > From: Siarhei Siamashka <siarhei.siamashka at nokia.com>
> > Date: Mon, 20 Apr 2009 16:22:42 +0300
> > Subject: [PATCH] Fixed rendering bug for source alpha == 0 in OVER
> > fastpath functions
> >
> > Handling of the case when source alpha is zero was keeping destination
> > pixel unmodified. But this is different from how generic path behaves.
> > For example fbOver(0x00200483, 0x9CAC7E9F) == 0x9CCC82FF and the
> > destination pixel changes from 0x9CAC7E9F to 0x9CCC82FF in spite
> > of having zero alpha.
>
> It is true that pixels with alpha=0 and color channels != 0 can't be
> skipped, but the
>
>         if (a)
>
> optimization is actually fairly important because memory access is so
> expensive. So a better fix is to check that the whole source is 0
> rather than just the alpha.

I would not be too sure about this. In the case where (almost) the whole
source image is transparent it would definitely help. But in the case when we
have interleaving transparent/opaque pixels, this extra pixel write skipping
logic may negatively affect performance:
1. It's an extra branch in the code and in some cases it might not play nice
with the branch predictor.
2. It might confuse write combining buffer and negatively impact performance.
Also if we even have only one pixel that needs to be read/modified per cache
line, skipping some of the writes will not help to reduce memory bandwidth
much.

Also when alpha blending is fused with scaling and processed in a single pass,
CPU usage reduction may be more important than memory write optimizations.

It's better to benchmark different code variants on different types of images.

> > From afbfb6405d63b196960a2acc5f3a12f2d3f1686a Mon Sep 17 00:00:00 2001
> > From: Siarhei Siamashka <siarhei.siamashka at nokia.com>
> > Date: Mon, 20 Apr 2009 16:32:14 +0300
> > Subject: [PATCH] Removed unnecessary processing of undefined byte for
> > x8r8b8g8 format
> >
> > Top 8 bits of x8r8g8b8 format are undefined as explained in:
> > http://lists.cairographics.org/archives/cairo/2009-April/016902.html
>
> This one does not apply anymore after Jeff pushed the ARM fast paths,
> but looks otherwise good. Can you resend it, or publish a link to a
> git repository?
>
> (It could be argued that we should treat the x8 channel only as
> undefined on read, and leave it alone when writing, but it's probably
> not worth the performance impact of an extra memory read per pixel).

There are different possible interpretations for this x8 components:
1. x8 channel is completely undefined
2. x8 channel is always set to some predefined value (0x00 or 0xFF).
3. x8 channel is set to some predefined value, but the user is also
responsible for setting it correct in x8r8g8b8 images which are used in
input arguments passed to pixman function. Otherwise the results in x8
components may be undefined.

I would still want to have the expected behavior confirmed and probably
documented somewhere :)

> > From 90f8369894b28cc3cca757a329352fe19943de16 Mon Sep 17 00:00:00 2001
> > From: Siarhei Siamashka <siarhei.siamashka at nokia.com>
> > Date: Mon, 20 Apr 2009 16:36:08 +0300
> > Subject: [PATCH] Removal of unused fbCompositeSrc_8888x0888 function
>
> Did you try enabling it in the fast path table? Apart from the same
> alpha issue as above, it seems fine to me.

The 'fbOver24' function which is used in it looks not very optimal:

static uint32_t fbOver24 (uint32_t x, uint32_t y)
{
    uint16_t  a = ~x >> 24;
    uint16_t  t;
    uint32_t  m,n,o;

    m = FbOverU(x,y,0,a,t);
    n = FbOverU(x,y,8,a,t);
    o = FbOverU(x,y,16,a,t);
    return m|n|o;
}

Normal 'fbOver' uses some kind of SIMD to process 2 pixel components at once.

This may need to be benchmarked.

> Its parameters as far as I can tell would be
>
>         PIXMAN_OP_OVER, PIXMAN_a8r8g8b8, PIXMAN_null, PIXMAN_r8g8b8
>         PIXMAN_OP_OVER, PIXMAN_a8b8g8r8, PIXMAN_null, PIXMAN_b8g8r8

There is one drawback if I understand it correctly. This increases the size of
fastpath pointers table and thus negatively impacts performance. Pixman has a
simple linear walk over the table, which is not very efficient.

> These functions, on the other hand:
>
>         fbCompositeSolidMask_nx1xn()
>         fbCompositeSrcAdd_1000x1000()
>         fbCompositeSrcSrc_nxn()
>
> I think we can just delete. If we add core drawing to pixman some day,
> they can be resurrected.

Fully agreed here.

-- 
Best regards,
Siarhei Siamashka