[Pixman] [PATCH 0/5] Normal repeat mode support for nearest/bilinear scaling fast paths

Siarhei Siamashka siarhei.siamashka at gmail.com
Fri May 20 06:20:24 PDT 2011

On Mon, May 16, 2011 at 3:26 PM, Taekyun Kim <podain77 at gmail.com> wrote:
> Hi all,
> Recently I've profiled several applications using pixman.
> (Webkit cairo port is my major concern)

Yes, browsers do seem to be the heaviest users of the different
compositing operation variants, especially with HTML5 canvas.

> Most of cases are covered by C, SSE2 and NEON fast paths now.
> However siarhei's slow-path-reporter still reports some non-optimized
> general_composite_rect().
> One of them is PIXMAN_REPEAT_NORMAL.
> This repeat mode is used for drawing tiled background or some gradient
> patterns.
> I've modified nearest/bilinear fast path macro templates to support NORMAL
> repeat.

That's great. Anything that noticeably improves performance without
introducing regressions is a welcome addition. With this particular
code we may just need to be a bit careful not to make it
unmaintainable, because it still has to get full NONE repeat support
for sources without an alpha channel and also horizontal flipping
(unit_x < 0).

> The basic idea is breaking down one scanline composition into several
> NONE_REPEAT scanline compositions.
> So we can use already implemented scanline functions.
> This approach does not increase binary size much and gives us reasonable
> speed up.
> When width of src image is very small, too frequent function call can
> decrease performance.
> In this case, pre-repeating src scanline into temporary buffer can reduce
> the function calls.
> Bilinear may require last and first pixels to interpolate.
> Temporary wrap-around buffer can resolve this case.

A temporary buffer is also going to be useful, for performance
reasons, in a two-pass "unscaled YUV|r5g6b5 -> a8r8g8b8 conversion /
bilinear scaling a8r8g8b8 -> a8r8g8b8|r5g6b5" scheme. So something
like this is needed there anyway.
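
A minimal sketch of the wrap-around buffer idea, assuming a single
8-bit channel and a simplified interpolation (no pixel-center offset,
8-bit weights), so it does not match pixman's real bilinear code. The
function names are hypothetical. Appending a copy of the first pixel
lets the interpolation between the last pixel and its "next" neighbor
wrap around without any special casing:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy one source scanline into tmp and append a copy of its first
 * pixel. tmp must have room for src_width + 1 pixels. */
static void
build_wraparound_scanline (uint8_t *tmp, const uint8_t *src, int src_width)
{
    assert (src_width > 0);
    memcpy (tmp, src, (size_t) src_width);
    tmp[src_width] = src[0];
}

/* Linear interpolation at 16.16 fixed-point position vx, where
 * 0 <= vx < (src_width << 16). The top 8 bits of the fraction are
 * used as the weight of the right-hand neighbor. */
static uint8_t
sample_linear (const uint8_t *tmp, int32_t vx)
{
    int x = vx >> 16;
    int frac = (vx >> 8) & 0xff;

    /* tmp[x + 1] is always valid thanks to the extra pixel */
    return (uint8_t) ((tmp[x] * (256 - frac) + tmp[x + 1] * frac) >> 8);
}
```

The same temporary scanline could also hold several repeated copies of
a very narrow source, which is the other use mentioned in the quoted
text.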

> I want to know whether this approach is proper way to go.

Yes, I think that this approach is ok in general.
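
To illustrate the splitting idea from the quoted text, here is a rough
sketch for nearest scaling with a positive unit_x. It is not the code
from the patch series: the function names are hypothetical, and the
inner helper only stands in for an already existing NONE repeat
scanline function.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an existing scanline function without repeat support:
 * every sampled coordinate must lie inside [0, src_width << 16). */
static void
scanline_nearest_no_repeat (uint32_t *dst, const uint32_t *src, int w,
                            int32_t vx, int32_t unit_x)
{
    int64_t x = vx; /* 64-bit to avoid overflow on the last increment */

    while (w-- > 0)
    {
        *dst++ = src[x >> 16];
        x += unit_x;
    }
}

/* NORMAL repeat built on top of it: split the destination scanline
 * into runs that each stay inside one copy of the source. */
static void
scanline_nearest_normal_repeat (uint32_t *dst, const uint32_t *src,
                                int src_width, int w,
                                int32_t vx, int32_t unit_x)
{
    int64_t max_vx, pos;

    assert (src_width > 0 && src_width < 32768 && unit_x > 0);

    max_vx = (int64_t) src_width << 16;
    pos = vx % max_vx;
    if (pos < 0)
        pos += max_vx;

    while (w > 0)
    {
        /* Number of pixels that can be fetched before pos reaches
         * max_vx (rounded up, so it is always at least 1) */
        int64_t n = (max_vx - pos + unit_x - 1) / unit_x;

        if (n > w)
            n = w;

        scanline_nearest_no_repeat (dst, src, (int) n, (int32_t) pos, unit_x);

        dst += n;
        w -= (int) n;
        pos = (pos + n * unit_x) % max_vx;
    }
}
```

With a very narrow source each run is only a few pixels long, which is
where the function call overhead mentioned in the quoted text starts
to matter.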

> Maybe you guys are thinking of handling repeat mode inside scanline
> functions??

Trying multiple alternatives is actually fine. Only by running
benchmarks (preferably simulating some real use cases) can we find out
which one works best. And the optimal solution may not even be the
same for different platforms.

> (Some scanline functions take max_vx though it is not used yet).

Handling repeat mode inside of scanline functions is relatively simple
for nearest scaling, and it is actually used now for C implementation:
For the potential ARM NEON optimizations and the operators other than
SRC, such handling of NORMAL repeat can be done via ARM instructions,
which are essentially free because NEON pipeline is more heavily
loaded and determines the execution time:
But the handling of repeat inside of scanline functions is not going
to work well for bilinear scaling.
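
For nearest scaling, handling the repeat inside of the scanline
function amounts to wrapping the coordinate after each step. A
simplified sketch (hypothetical function name, SRC operator only, not
a copy of the actual pixman code), assuming 16.16 fixed-point
coordinates and a caller which has already reduced vx into the
[0, max_vx) range:

```c
#include <assert.h>
#include <stdint.h>

static void
scanline_nearest_wrap_inside (uint32_t *dst, const uint32_t *src, int w,
                              int32_t vx, int32_t unit_x, int32_t max_vx)
{
    int64_t x = vx; /* 64-bit so that x + unit_x can't overflow */

    assert (max_vx > 0 && unit_x >= 0 && vx >= 0 && vx < max_vx);

    while (w-- > 0)
    {
        *dst++ = src[x >> 16];
        x += unit_x;

        /* NORMAL repeat: wrap back into [0, max_vx) */
        while (x >= max_vx)
            x -= max_vx;
    }
}
```

Here max_vx is the source width converted to fixed point
(src_width << 16).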

So I like your approach. And I'll add some more comments for the code
itself a bit later.

Best regards,
Siarhei Siamashka
