[Pixman] [PATCH 0/4] ARM: REPEAT_NORMAL support for standard fast paths

Siarhei Siamashka siarhei.siamashka at gmail.com
Fri Jul 15 14:47:04 PDT 2011


On Tue, Jul 12, 2011 at 5:05 PM, Soeren Sandmann <sandmann at cs.au.dk> wrote:
> Taekyun Kim <podain77 at gmail.com> writes:
>
>> On 07/11/2011 09:18 PM, Soeren Sandmann wrote:
>>> This performance regression was introduced when the "simple repeat" code
>>> was removed. But I'm not sure hacking it into the ARM backend is the
>>> right plan. See this mail for a different approach:
>>>
>>>      http://lists.freedesktop.org/archives/pixman/2010-December/000815.html
>>>
>>> I have a branch with a start on doing it that way here:
>>>
>>>      http://cgit.freedesktop.org/~sandmann/pixman/log/?h=simple-repeat
>>>
>>> which may or may not be useful as a starting point. (I'd be interested
>>> in seeing what the benchmark results of that branch are).
>>
>> It seems to be the right place where we can put simple repeat codes.
>> It can handle simple repeat for both sse2 and ARM at common place.
>>
>> I'm a bit worried that tiling does not give us good memory access patterns
>> causing cache overhead. 1 x n source images would be as slow as 90 degree
>> rotation. Memory buffer will be accessed in vertical order.
>
> Yeah, that is a problem, and that was in fact one of the reasons the
> original 'simple repeat' code was deleted. Its memory access pattern
> for 1xn images was really bad.  It may be that adding this support to
> the ARM backend, as you did, is the better way.

Actually I agree with your older comment. Adding this support
exclusively to the ARM backend is not so great. Taking slow paths for
normal repeat was also spotted by Mozilla [1], and I guess they are
interested in getting this issue fixed for all platforms, not just
ARM. So far, reverting the simple repeat code deletion has also seemed
to be a usable solution. And the 1xn source image problem is a bit
overrated, even though it would surely be nice to get it fixed.

A more cache-efficient memory access pattern, obtained by extending
source images, can probably be applied to the
'fast_composite_tiled_repeat' function.
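
To illustrate what "extending source images" could mean in practice,
here is a minimal sketch, not code from the patches. It replicates one
narrow scanline into a wider temporary buffer, so that a later
compositing loop can walk long horizontal spans instead of touching a
few pixels per row. The helper name 'extend_scanline' and the a8r8g8b8
pixel type (uint32_t) are assumptions made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of pixman: fill dst with whole
 * repetitions of a src_width-pixel scanline.  Returns the number of
 * pixels written, which is the largest multiple of src_width that
 * fits into dst_capacity. */
static size_t
extend_scanline (const uint32_t *src, size_t src_width,
                 uint32_t *dst, size_t dst_capacity)
{
    size_t total, filled, chunk;

    assert (src_width > 0 && dst_capacity >= src_width);

    total = (dst_capacity / src_width) * src_width;

    /* Seed the buffer with one copy of the source scanline. */
    memcpy (dst, src, src_width * sizeof (uint32_t));
    filled = src_width;

    /* Double the filled region on each pass.  'filled' stays a
     * multiple of src_width, so the repeat phase is preserved, and
     * the copied ranges never overlap because chunk <= filled. */
    while (filled < total)
    {
        chunk = filled;
        if (chunk > total - filled)
            chunk = total - filled;
        memcpy (dst + filled, dst, chunk * sizeof (uint32_t));
        filled += chunk;
    }

    return total;
}
```

For a 1xn source this would be done once per source row, and the
widened row could then be composited horizontally. Whether the extra
copy pays off is something only benchmarks can tell.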

I would also like to see benchmark results using cairo traces
(for the current pixman master, for the proposed patches, and maybe
also with the simple repeat code deletion reverted), just to be sure
that we don't get any unexpected performance regressions. After all,
adding normal repeat support to the standard fast paths touches
frequently used parts of the code.

1. https://bugzilla.mozilla.org/show_bug.cgi?id=640250#c5

-- 
Best regards,
Siarhei Siamashka

