[Pixman] [PATCH 2/4] Added fast path for "pad" type repeats

Thu Mar 7 07:46:50 PST 2013

Hi Søren,

> At a high level, it is pretty clear that non-transformed images are
> sampled outside their bounds often enough that special-cased support
> is probably warranted. There is already support for NORMAL repeat, and
> your patch adds PAD, but images with NONE extension could also be
> optimized in various cases, either by clipping the border away or
> implementing it with a solid fill depending on the operator.

Indeed. In fact I had a go with one such case, but the results weren't
brilliant so I was in two minds as to whether to post the patch. I'll
post a copy in a minute so you can see; I suspect a number of your
comments may apply equally to it.

> - The flags flags of images should be computed with
>   _pixman_image_validate(), not just overwritten.

You mean the line that clears out the repeat bits and sets
FAST_PATH_SAMPLES_COVER_CLIP_NEAREST? That sounds relatively expensive,
and also wouldn't set FAST_PATH_SAMPLES_COVER_CLIP_NEAREST (that's mainly
done by analyze_extent(), which isn't called by _pixman_image_validate(),
and which is necessary to match the most useful child fast paths).

In my defence, that flag manipulation was copied from
fast_composite_tiled_repeat() - literally, including the cut-and-paste
error that you picked up on below :). Are you arguing that
fast_composite_tiled_repeat() should be changed in the same way?

>   Also, I'm skeptical about simply reusing the solid function that was
>   looked up. There is nothing in principle that prevents the flags
>   from depending on the color in the image, and as such every time the
>   pixel in the solid fill image changes, a new validate/lookup cycle
>   should really be done.

Is that really a realistic possibility? Obviously it would have a
significant impact on optimisations like this one :(

> - The fallback to the general code that you have is not how the fast
>   path system is intended to work. When a fast path has been chosen,
>   it is supposed to carry out the operation by itself as fast as
>   possible and not fall back to slower code. There could in principle
>   (and at some point there may) be other fast paths in between this
>   one and the general code.

After I wrote it, it occurred to me to wonder why the tiled (aka normal-
repeat) function didn't have the same issue with mask images that don't
line up with the source image. It's not something that I expect would
happen much in real-world usage, but some of the fuzzer tests bit me on
that for pad-repeat, so it's reasonable to suppose they're doing the same
for normal-repeat. The answer is that it just adjusts the coordinates of
the mask image, as it does with the dest image, so while this will still
work out in all cases, that's mainly because it ends up missing all the
STD_FAST_PATH implementations because
FAST_PATH_SAMPLES_COVER_CLIP_NEAREST isn't set for the mask image, even
in the common case where mask tiles are aligned with source tiles and a
STD_FAST_PATH would have been applicable. A similar approach would work
for pad-repeat, removing the need for the fallback, but for the same
reasons it would probably slow down any operation with a bitmap mask. Of
course, it could have a big if-then-else to handle the case where the
source and mask images coincide separately from all other cases, but then
the function gets even bigger!

One other thing while this reminds me: the way
fast_composite_tiled_repeat() is matched in the fast path list means that
it won't match something like an over_n_8_8888 operation where the mask
is tiled (but the source isn't, of course, being a solid image). Although
this seems a reasonable operation, I wasn't able to find any instances in
the cairo-perf-traces, so didn't bother worrying about it any further.

> - You added a simple test program (thanks for that, that already puts
>   you ahead of most people), but it only checks one single composite
>   operation. This fast path is complex enough that I think the test
>   should verify many more alignments of source and destination (and
>   mask if support for that is kept), and more formats and operators.

I think I assumed that the existing fuzzers would already be doing a
reasonable job of that; this test was mainly just something that I put
together so I could quickly check visually that I hadn't made any major
mistakes in implementation (hence the switched out printf statements),
and I added it to the patch series more as an afterthought. I assume you
think we'd benefit from a fuzzer that particularly focuses on repeat
types then?

>   Also, the test should be added in a separate commit so that it is
>   easy to verify that CRC values don't change.

That's a fair point.

> +    src_flags = (info->src_flags & ~FAST_PATH_NORMAL_REPEAT) |
> +                    FAST_PATH_SAMPLES_COVER_CLIP_NEAREST;
>
>     seems like it should be ~FAST_PATH_PAD_REPEAT instead?

Yes, that was a cut-and-paste error. I spotted it myself a couple of days
ago when doing the patch I'm about to post, which has a similar statement
in it.

> But the biggest objection I have is that there is just way too much
> code here for an operation that is ultimately not that common. There
> are nine separate blocks of code that are almost exactly alike.

I was never all that happy with that myself; it would have been neater to
do it in three columns (left pad: all solid, then centre: all bitmaps,
then right: all solid) but I wanted to keep as close to scanline order as
possible to maximise cache hits where the various subregions abutted
horizontally.

> The basic idea is to break the operation down into nine smaller
> composites, where the nine source images are subimages of the original
> source with repeat mode NORMAL. This takes advantage of two existing
> optimizations: the fast_path_tiled() for 1xn images and the solid fast
> paths for repeating 1x1 images.

I see the code in compute_image_info() for converting 1x1 bitmaps into
solid images, but I must have missed the 1xn tile support. As it happens,
I did have a go at adding a special case in for 1xn tiles within
fast_composite_tiled_repeat() myself, but couldn't see any
cairo-perf-trace improvement above the noise level, so didn't bother
posting it. Of course, if fast_composite_pad_repeat() achieved its left/
right extension by calling fast_composite_tiled_repeat() for the leftmost
and rightmost columns of the source image, that would definitely change!

> An issue here is that if the single pixels in the corners are opaque and
> the operator is OVER, we'd ideally want to do the 'strength reduction'
> from optimize_operator() in pixman.c to convert the OVER to SRC instead
> so that the operation becomes a solid fill (or even a noop) rather than
> an over_n_8888().

In case you didn't notice, I got round that in my subsequent patches my
putting that test into my implementations of over_n_8888 and falling back
to fast_composite_solid_fill() or pixman_composite_src_n_8888_asm_armv6()
respectively.

Ben