[Pixman] [PATCH 0/2] Better instruction scheduling of solid source operator OVER
podain77 at gmail.com
Sun Aug 21 22:01:53 PDT 2011
From: Taekyun Kim <tkq.kim at samsung.com>
I am sending a fixed version of the previous patches.
The previous version can be seen here.
Some registers were clobbered, and I made mistakes when running make check.
Now it passes make check.
I measured performance again with a new Cortex-A9 device running at 1.2 GHz.
Here are the numbers.
<< over_n_8_8888 >>
- cortex a8 -
before : L1: 201.35 L2: 190.48 M:101.94 ( 54.85%) HT: 78.41 VT: 63.83 R: 58.25 RT: 21.74 ( 191Kops/s)
after : L1: 257.65 L2: 255.49 M:102.04 ( 55.33%) HT: 79.19 VT: 65.46 R: 59.23 RT: 21.12 ( 189Kops/s)
- cortex a9 -
before : L1: 157.35 L2: 159.81 M:133.00 ( 60.94%) HT: 82.44 VT: 63.64 R: 51.66 RT: 19.15 ( 179Kops/s)
after : L1: 216.83 L2: 219.40 M:135.83 ( 61.80%) HT: 85.60 VT: 64.80 R: 52.23 RT: 19.16 ( 179Kops/s)
<< over_n_8888 >>
- cortex a8 -
before : L1: 375.39 L2: 391.93 M:114.39 ( 40.99%) HT: 99.37 VT: 98.20 R: 90.24 RT: 32.87 ( 240Kops/s)
after : L1: 481.90 L2: 483.46 M:114.29 ( 40.69%) HT:106.91 VT: 93.38 R: 90.74 RT: 29.51 ( 236Kops/s)
- cortex a9 -
before : L1: 324.50 L2: 332.79 M:155.55 ( 47.51%) HT:111.93 VT: 93.58 R: 71.92 RT: 28.21 ( 233Kops/s)
after : L1: 355.87 L2: 364.49 M:156.90 ( 47.59%) HT:111.52 VT: 91.76 R: 72.16 RT: 28.22 ( 234Kops/s)
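As a reading aid, the relative change between the "before" and "after" rows can be worked out from the figures above. The following is only a sketch and not part of the patch series: the names RESULTS, speedup_percent and summarize are made up for illustration, and only the L1, L2 and M columns are copied from the tables.

```python
# Throughput figures copied from the tables above, in the units printed
# in the post. Only the L1, L2 and M columns are included.
RESULTS = {
    ("over_n_8_8888", "cortex-a8"): {
        "before": {"L1": 201.35, "L2": 190.48, "M": 101.94},
        "after": {"L1": 257.65, "L2": 255.49, "M": 102.04},
    },
    ("over_n_8_8888", "cortex-a9"): {
        "before": {"L1": 157.35, "L2": 159.81, "M": 133.00},
        "after": {"L1": 216.83, "L2": 219.40, "M": 135.83},
    },
    ("over_n_8888", "cortex-a8"): {
        "before": {"L1": 375.39, "L2": 391.93, "M": 114.39},
        "after": {"L1": 481.90, "L2": 483.46, "M": 114.29},
    },
    ("over_n_8888", "cortex-a9"): {
        "before": {"L1": 324.50, "L2": 332.79, "M": 155.55},
        "after": {"L1": 355.87, "L2": 364.49, "M": 156.90},
    },
}


def speedup_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def summarize(results):
    """Return {(operation, cpu): {column: percent change}}."""
    summary = {}
    for key, rows in results.items():
        summary[key] = {
            column: speedup_percent(rows["before"][column], rows["after"][column])
            for column in rows["before"]
        }
    return summary


if __name__ == "__main__":
    for (operation, cpu), changes in summarize(RESULTS).items():
        columns = "  ".join(f"{name}: {pct:+6.2f}%" for name, pct in changes.items())
        print(f"{operation:14s} {cpu:10s} {columns}")
```

Read this way, the gains are concentrated in the L1 and L2 columns, while the M column changes by less than 3% in all four cases.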
Here are some replies to Siarhei's comments.
> Thanks, that's good. By the way, do you have some other a bit less
> synthetic benchmarks for geometry filling workload, which could show
> some measurable performance improvement?
I don't have a micro-benchmark for geometry filling; instead, I usually run
some HTML5 canvas benchmarks which stroke and fill various kinds of paths.
The link below is one of the benchmarks that I use.
> You can also try to run benchmarks using with different prefetch types
> in 'pixman-arm-neon-asm.S' by changing:
> .set PREFETCH_TYPE_DEFAULT, PREFETCH_TYPE_ADVANCED
> to either of
> .set PREFETCH_TYPE_DEFAULT, PREFETCH_TYPE_NONE
> .set PREFETCH_TYPE_DEFAULT, PREFETCH_TYPE_SIMPLE
> Cortex-A9 has automatic hardware prefetcher. And if it is enabled
> (there are some hardware bugs in early Cortex-A9 revisions, so it may
> be disabled for a good reason), then PREFETCH_TYPE_NONE may be a good
When I simply removed the prefetching code, I saw no performance change
for memory, but L2 performance slowed down. I will check the NONE/SIMPLE
prefetch types for other fast paths too.
> In any case, now things are a little bit more interesting because with
> the latest ARM hardware we got a downgraded NEON unit in Cortex-A9
> (or speaking positively, should we say "balanced"?) and somewhat
> better memory bandwidth.
Thanks for all the information about the behaviour of ARM and NEON.
That was very helpful to me. Making things well optimized on both
Cortex-A8 and A9 is definitely necessary, and I will contribute in that
Taekyun Kim (2):
ARM: NEON better instruction scheduling of over_n_8_8888
ARM: NEON better instruction scheduling of over_n_8888
pixman/pixman-arm-neon-asm.S | 138 +++++++++++++++++++++++++++++++++---------
test/lowlevel-blt-bench.c | 9 ++-
2 files changed, 113 insertions(+), 34 deletions(-)