[Pixman] [PATCH 2/2] MIPS: DSPr2: Added fast-paths for SRC operation.
Lukic, Nemanja
nlukic at mips.com
Fri Feb 17 07:51:30 PST 2012
Hi Siarhei,
The board on which I collected those results is an old PC-style evaluation board with a MIPS 74Kc CPU running at 1 GHz.
More detailed cache information for this board:
- Primary instruction cache: 32 kB, 4-way, VIPT, line size 32 bytes.
- Primary data cache: 32 kB, 4-way, PIPT, no aliases, line size 32 bytes.
- MIPS secondary cache: 512 kB, 8-way, line size 32 bytes.
Your concerns about memory bandwidth make sense, but I don’t think this is related to the memory timings or the memory clock frequency configuration.
This evaluation board uses old SDRAM, which lags far behind modern DDR2/DDR3 memory chips in performance and thus limits the overall peak memory bandwidth.
Thanks,
Nemanja Lukic
-----Original Message-----
From: Siarhei Siamashka [mailto:siarhei.siamashka at gmail.com]
Sent: Thursday, February 16, 2012 5:21 PM
To: Lukic, Nemanja
Cc: pixman at lists.freedesktop.org; nemanja.lukic at rt-rk.com
Subject: Re: [Pixman] [PATCH 2/2] MIPS: DSPr2: Added fast-paths for SRC operation.
On Fri, Feb 10, 2012 at 5:02 PM, Nemanja Lukic <nlukic at mips.com> wrote:
> From: Nemanja Lukic <nemanja.lukic at rt-rk.com>
>
> The following fast-path functions are implemented (routines 4, 5 and 6 use
> the same fast memcpy routine):
> 1. src_x888_8888
> 2. src_8888_0565
> 3. src_0565_8888
> 4. src_0565_0565
> 5. src_8888_8888
> 6. src_0888_0888
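For reference, here is a minimal plain-C sketch of what each of the six fast paths computes for one scanline. It is not the DSPr2 code from the patch, and the function names are hypothetical. It assumes pixman's usual layouts: a8r8g8b8/x8r8g8b8 as 32-bit words, r5g6b5 as 16-bit words, and r8g8b8 as 3 bytes per pixel. It also shows why routines 4, 5 and 6 can share one memcpy: with the SRC operator and identical source and destination formats, a scanline is copied byte for byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain-C illustration only, not the DSPr2 assembly from the patch.
 * Function names are hypothetical. Each function handles one scanline. */

/* 1. src_x888_8888: same layout, but the unused top byte becomes 0xff. */
static void src_x888_8888_row(uint32_t *dst, const uint32_t *src, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = src[i] | 0xff000000u;
}

/* 2. src_8888_0565: keep the top 5/6/5 bits of R/G/B and drop alpha. */
static void src_8888_0565_row(uint16_t *dst, const uint32_t *src, int width)
{
    for (int i = 0; i < width; i++) {
        uint32_t s = src[i];
        dst[i] = (uint16_t)(((s >> 8) & 0xf800) |   /* R: bits 23..19 -> 15..11 */
                            ((s >> 5) & 0x07e0) |   /* G: bits 15..10 -> 10..5  */
                            ((s >> 3) & 0x001f));   /* B: bits  7..3  ->  4..0  */
    }
}

/* 3. src_0565_8888: widen each channel to 8 bits by replicating its top
 *    bits into the new low bits, and set alpha to 0xff. */
static void src_0565_8888_row(uint32_t *dst, const uint16_t *src, int width)
{
    for (int i = 0; i < width; i++) {
        uint32_t s = src[i];
        uint32_t r = (s >> 11) & 0x1f;
        uint32_t g = (s >> 5) & 0x3f;
        uint32_t b = s & 0x1f;
        r = (r << 3) | (r >> 2);
        g = (g << 2) | (g >> 4);
        b = (b << 3) | (b >> 2);
        dst[i] = 0xff000000u | (r << 16) | (g << 8) | b;
    }
}

/* 4-6. src_0565_0565, src_8888_8888, src_0888_0888: source and destination
 *      formats match, so a scanline is a byte copy of width * bpp bytes
 *      (bpp = 2, 4 and 3). One memcpy routine therefore serves all three. */
static void src_copy_row(void *dst, const void *src, int width, int bytes_per_pixel)
{
    assert(bytes_per_pixel == 2 || bytes_per_pixel == 3 || bytes_per_pixel == 4);
    memcpy(dst, src, (size_t)width * (size_t)bytes_per_pixel);
}
```

For example, converting an opaque white a8r8g8b8 pixel to r5g6b5 gives 0xffff. Expanding that back gives 0xffffffff, because the bit replication maps the maximum 5- and 6-bit values to 0xff.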
Nice. That's a good choice of useful functions to optimize.
> Performance numbers before/after on MIPS-74kc @ 1GHz
>
> Optimized (with these optimizations):
>
> lowlevel-blt-bench:
> src_x888_8888 = L1: 369.50 L2: 99.37 M: 27.19 (145.07%) HT: 20.24 VT: 19.48 R: 19.00 RT: 10.22 ( 63Kops/s)
> src_8888_0565 = L1: 105.65 L2: 67.87 M: 25.41 (101.00%) HT: 20.78 VT: 19.84 R: 18.52 RT: 9.81 ( 63Kops/s)
> src_0565_8888 = L1: 77.10 L2: 63.04 M: 23.37 ( 92.90%) HT: 20.29 VT: 19.37 R: 18.14 RT: 10.02 ( 63Kops/s)
> src_0565_0565 = L1: 519.02 L2: 241.32 M: 62.35 (166.34%) HT: 33.74 VT: 27.63 R: 26.12 RT: 11.70 ( 67Kops/s)
> src_8888_8888 = L1: 390.48 L2: 113.99 M: 30.32 (161.77%) HT: 19.55 VT: 17.05 R: 17.13 RT: 10.19 ( 63Kops/s)
> src_0888_0888 = L1: 349.74 L2: 156.68 M: 40.68 (162.78%) HT: 25.58 VT: 20.57 R: 20.20 RT: 9.96 ( 63Kops/s)
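As a rough cross-check on the "M:" column, the following sketch turns it into approximate memory traffic. It assumes the figures are in megapixels per second and counts one source read plus one destination write per pixel. Any extra cache or write-allocate traffic is ignored. The helper and struct names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: approximate memory traffic for one SRC fast path,
 * ASSUMING "M:" is in megapixels per second and each pixel costs exactly
 * one source read and one destination write (extra cache traffic such as
 * write-allocate line fills is ignored). Result is in MB/s. */
static double approx_mbytes_per_sec(double mpix_per_sec, int src_bpp, int dst_bpp)
{
    assert(mpix_per_sec >= 0.0 && src_bpp > 0 && dst_bpp > 0);
    return mpix_per_sec * (double)(src_bpp + dst_bpp);
}

struct blt_result {
    const char *name;
    double m_mpix;       /* "M:" value copied from the table above */
    int src_bpp, dst_bpp;
};

static const struct blt_result results[] = {
    { "src_x888_8888", 27.19, 4, 4 },
    { "src_8888_0565", 25.41, 4, 2 },
    { "src_0565_8888", 23.37, 2, 4 },
    { "src_0565_0565", 62.35, 2, 2 },
    { "src_8888_8888", 30.32, 4, 4 },
    { "src_0888_0888", 40.68, 3, 3 },
};

/* Print one line per fast path with its estimated traffic. */
static void print_traffic(void)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
        printf("%s: ~%.0f MB/s\n", results[i].name,
               approx_mbytes_per_sec(results[i].m_mpix,
                                     results[i].src_bpp, results[i].dst_bpp));
}
```

Under those assumptions, the three memcpy-based paths (src_0565_0565, src_8888_8888, src_0888_0888) land close together at roughly 240 to 250 MB/s, despite their different pixel sizes.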
Maybe this would be interesting for you. I'm getting the following
numbers on my Asus RT-N16 router (MIPS 74K @480 MHz) with your
optimizations applied:
src_x888_8888 = L1: 149.94 L2: 37.43 M: 39.00 (146.51%)
src_8888_0565 = L1: 50.05 L2: 24.53 M: 23.77 ( 66.62%)
src_8888_8888 = L1: 173.30 L2: 70.62 M: 79.89 (299.11%)
It looks like your hardware has a roughly twice as fast CPU and some amount
of L2 cache (?), but shows ~2.6x worse peak memory bandwidth. Could it
have its memory timings and/or memory clock frequency misconfigured?
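The ratios in the paragraph above can be reproduced from the src_8888_8888 rows of both runs. This is a minimal sketch: the struct and function names are hypothetical. Comparing the two boards this way assumes both runs used the same lowlevel-blt-bench settings.

```c
#include <assert.h>

/* Figures copied from the src_8888_8888 rows quoted in this thread.
 * Treating the two runs as directly comparable is an assumption. */
struct board_run {
    double cpu_mhz;
    double l1_mpix;   /* "L1:" column */
    double m_mpix;    /* "M:" column  */
};

static const struct board_run eval_board_74kc = { 1000.0, 390.48, 30.32 };
static const struct board_run rt_n16_74k      = {  480.0, 173.30, 79.89 };

/* Hypothetical helper: how many times larger a is than b. */
static double times(double a, double b)
{
    assert(b != 0.0);
    return a / b;
}

/* CPU-side speed: evaluation board relative to the router. */
static double cpu_clock_ratio(void) { return times(eval_board_74kc.cpu_mhz, rt_n16_74k.cpu_mhz); }
static double l1_ratio(void)        { return times(eval_board_74kc.l1_mpix, rt_n16_74k.l1_mpix); }

/* Memory-bound speed: router relative to the evaluation board. */
static double memory_ratio(void)    { return times(rt_n16_74k.m_mpix, eval_board_74kc.m_mpix); }
```

This gives about 2.08 for the clock ratio, about 2.25 for L1 and about 2.63 for M. These match the "roughly twice faster CPU" and "~2.6x worse peak memory bandwidth" estimates.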
--
Best regards,
Siarhei Siamashka