[Pixman] [PATCH 0/2] Better instruction scheduling of solid source operator OVER

Tue Aug 16 06:34:03 PDT 2011

From: Taekyun Kim <tkq.kim at samsung.com>

Hi all,

I tried to improve the performance of solid-color geometry filling.
I eliminated every pipeline stall I could find.
L1 and L2 performance improved, but the memory (M) results are still about the same.

One interesting observation concerns dual issue of ARM instructions (mainly cache preloads) and NEON instructions on Cortex-A9.
It does not seem to be as effective as on Cortex-A8.
I need to investigate this further.

Below are the results of lowlevel-blt-bench:

<< over_n_8_8888 >>
- cortex a8 -
before : L1: 201.51  L2: 195.59  M:105.15 ( 56.56%)  HT: 76.25  VT: 63.34  R: 50.87  RT: 20.52 ( 182Kops/s)
after  : L1: 251.56  L2: 257.06  M:107.23 ( 57.77%)  HT: 77.97  VT: 65.10  R: 58.59  RT: 21.59 ( 190Kops/s)

- cortex a9 -
before : L1: 131.80  L2: 132.91  M:125.56 ( 57.91%)  HT: 75.49  VT: 58.92  R: 46.33  RT: 17.37 ( 157Kops/s)
after  : L1: 180.52  L2: 182.45  M:131.73 ( 60.73%)  HT: 78.91  VT: 61.32  R: 49.63  RT: 17.73 ( 160Kops/s)

<< over_n_8888 >>
- cortex a8 -
before : L1: 372.71  L2: 391.26  M:114.49 ( 44.08%)  HT: 89.08  VT: 92.11  R: 85.41  RT: 42.37 ( 237Kops/s)
after  : L1: 490.81  L2: 484.38  M:114.27 ( 44.47%)  HT: 92.62  VT: 92.96  R: 85.95  RT: 31.00 ( 237Kops/s)

- cortex a9 -
before : L1: 268.69  L2: 277.31  M:151.69 ( 46.88%)  HT:104.18  VT: 86.74  R: 67.54  RT: 26.06 ( 207Kops/s)
after  : L1: 302.58  L2: 309.04  M:151.56 ( 46.93%)  HT:105.11  VT: 87.51  R: 68.28  RT: 26.10 ( 205Kops/s)
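
To put the gains in relative terms, here is a small Python sketch that computes the percentage change of the L1, L2 and M columns for each case. The figures are copied from the tables above; the names RESULTS and speedup_percent are only illustrative and are not part of pixman or lowlevel-blt-bench.

```python
# (before, after) figures copied from the lowlevel-blt-bench tables above.
# RESULTS and speedup_percent are illustrative names, not pixman APIs.
RESULTS = {
    ("over_n_8_8888", "cortex-a8"): {
        "L1": (201.51, 251.56), "L2": (195.59, 257.06), "M": (105.15, 107.23)},
    ("over_n_8_8888", "cortex-a9"): {
        "L1": (131.80, 180.52), "L2": (132.91, 182.45), "M": (125.56, 131.73)},
    ("over_n_8888", "cortex-a8"): {
        "L1": (372.71, 490.81), "L2": (391.26, 484.38), "M": (114.49, 114.27)},
    ("over_n_8888", "cortex-a9"): {
        "L1": (268.69, 302.58), "L2": (277.31, 309.04), "M": (151.69, 151.56)},
}


def speedup_percent(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Print one summary line per (operation, CPU) pair.
for (op, cpu), cols in RESULTS.items():
    summary = ", ".join(
        f"{name}: {speedup_percent(b, a):+.1f}%" for name, (b, a) in cols.items()
    )
    print(f"{op:14s} {cpu}: {summary}")
```

This only restates the numbers above as ratios; it does not model why the M column stays flat.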

Taekyun Kim (2):
  ARM: NEON: Better instruction scheduling of over_n_8_8888
  ARM: NEON: Better instruction scheduling of over_n_8888

 pixman/pixman-arm-neon-asm.S |  106 ++++++++++++++++++++++++++++++++----------
 1 files changed, 81 insertions(+), 25 deletions(-)