[Pixman] [PATCH 2/4] Added fast path for "pad" type repeats

Thu Mar 7 14:16:04 PST 2013

Hi,

>> - The flags flags of images should be computed with
>>   _pixman_image_validate(), not just overwritten.
>
> You mean the line that clears out the repeat bits and sets
> FAST_PATH_SAMPLES_COVER_CLIP_NEAREST? That sounds relatively expensive,
> and also wouldn't set FAST_PATH_SAMPLES_COVER_CLIP_NEAREST (that's mainly
> done by analyze_extent(), which isn't called by _pixman_image_validate(),
> and which is necessary to match the most useful child fast paths).

I didn't actually mean that particular line, I was only talking about
validating the newly created images. But it's a good point that
COVER_CLIP_NEAREST wouldn't get set.

> One other thing while this reminds me: the way
> fast_composite_tiled_repeat() is matched in the fast path list means that
> it won't match something like an over_n_8_8888 operation where the mask
> is tiled (but the source isn't, of course, being a solid image). Although
> this seems a reasonable operation, I wasn't able to find any instances in
> the cairo-perf-traces, so didn't bother worrying about it any further.

It would be nice if there were a more general way for a fast path to be
recursive, where the recursive calls can take full advantage of the
optimizations done by pixman_image_composite32(), including:

    - 'strength reduction' in optimize_operator()
    - all flags, including COVER_CLIP

and where the overhead of recursion is not too devastating. (And also
avoid forcing recursion on the backends, where they might want to do the
entire operation by themselves).

So maybe we should just declare it possible to call
pixman_image_composite32, or some internal variant of it, recursively as
long as the images involved in the recursive call were different than
the ones in the outer image, and then deal with whatever bugs and
performance issues surface.

Then composite_tiled() could be broken into two functions:

     composite_tiled_src()
     composite_tiled_mask()

where composite_tiled_src() would create a clone of the src image that
had REPEAT_NORMAL disabled and then call pixman_image_composite32()
recursively, which in turn, if the mask was also repeating, would end up
calling composite_tiled_mask().

Similarly, composite_repeat_pad() would create nine cloned images along
the lines of the pseudo code I posted and make nine recursive calls to
pixman_image_composite32(). There could be additional special cased
support for 1xn/NORMAL images that would then automatically be used by
both the tiled and pad fast paths.

So two pieces of infrastructure would be needed: The ability to create
temporary subimages with different repeat modes, and the ability to call
pixman_image_composite32() recursively.

What I like about this scheme is that it makes the code reflect exactly
what's going on: composite operations are being broken down into smaller
composite operation. It really is recursion; making the code reflect
this is a good thing.

There is an obvious danger that the overhead from creating clone images
and calling pixman_image_composite32() would be too high, but it is
important to note that pixman spends most of its time actually
compositing and ones intuition about per-composite overhead can easily
be wrong. (Especially now that glyph compositing has explicit support).

However, if overhead turns out to be an issue, there are potential fixes
such as caching temporary images and making an internal variant of
pixman_image_composite32() that would drop the region computations etc.
Some such fixes could benefit the rest of pixman as well.

>> - You added a simple test program (thanks for that, that already puts
>>   you ahead of most people), but it only checks one single composite
>>   operation. This fast path is complex enough that I think the test
>>   should verify many more alignments of source and destination (and
>>   mask if support for that is kept), and more formats and operators.
>
> I think I assumed that the existing fuzzers would already be doing a
> reasonable job of that; this test was mainly just something that I put
> together so I could quickly check visually that I hadn't made any major
> mistakes in implementation (hence the switched out printf statements),
> and I added it to the patch series more as an afterthought. I assume you
> think we'd benefit from a fuzzer that particularly focuses on repeat
> types then?

As far as I can tell, blitters-test doesn't test PAD at all, so if we
are going to have special-cased support for it, then it would be
beneficial to have testing for it.

>> An issue here is that if the single pixels in the corners are opaque and
>> the operator is OVER, we'd ideally want to do the 'strength reduction'
>> from optimize_operator() in pixman.c to convert the OVER to SRC instead
>> so that the operation becomes a solid fill (or even a noop) rather than
>> an over_n_8888().
>
> In case you didn't notice, I got round that in my subsequent patches my
> putting that test into my implementations of over_n_8888 and falling back
> to fast_composite_solid_fill() or pixman_composite_src_n_8888_asm_armv6()
> respectively.

I did see that, but in principle, there could be a faster version of
solid fill available at a higher-grade implementation. E.g., if
over_n_8888 is available for mmx, but not sse2, but solid fill is
available for both, you'd want to pick the SSE2 solid fill.

The SSE2 variant of over_n_8888() doesn't have the reduction to solid
fill btw.

Søren