[Pixman] [PATCH 2/2] MIPS: DSPr2: Added fast-paths for SRC operation.

Lukic, Nemanja nlukic at mips.com
Fri Feb 17 07:51:30 PST 2012


Hi Siarhei,

The board on which I collected those results is an old PC-style evaluation board with a MIPS 74Kc CPU running at 1 GHz.
More detailed cache information for this board:
 - Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
 - Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes.
 - MIPS secondary cache 512kB, 8-way, linesize 32 bytes.
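
As a rough aid for reading the cache figures above, the number of lines, sets and bytes per way can be derived from size, associativity and line size. This is only a sketch: the helper name `cache_geometry` is made up for illustration, and it assumes 1 kB = 1024 bytes.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Derive basic set-associative cache geometry (hypothetical helper)."""
    lines = size_bytes // line_bytes   # total number of cache lines
    sets = lines // ways               # lines are grouped into sets of `ways`
    way_bytes = size_bytes // ways     # bytes covered by a single way
    return {"lines": lines, "sets": sets, "way_bytes": way_bytes}

# Figures quoted in the message above; kB taken as 1024 bytes (assumption).
l1 = cache_geometry(32 * 1024, ways=4, line_bytes=32)
l2 = cache_geometry(512 * 1024, ways=8, line_bytes=32)

print("L1 (I or D):", l1)
print("L2:", l2)
```

Under those assumptions each primary cache has 256 sets of 4 lines, and the secondary cache has 2048 sets of 8 lines.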

Your concern about memory bandwidth makes sense, but I don't think it is caused by misconfigured memory timings or memory clock frequency.
This evaluation board uses old SDRAM, which performs far worse than modern DDR2/DDR3 memory, and that limits the overall peak memory bandwidth.

Thanks,
Nemanja Lukic

-----Original Message-----
From: Siarhei Siamashka [mailto:siarhei.siamashka at gmail.com] 
Sent: Thursday, February 16, 2012 5:21 PM
To: Lukic, Nemanja
Cc: pixman at lists.freedesktop.org; nemanja.lukic at rt-rk.com
Subject: Re: [Pixman] [PATCH 2/2] MIPS: DSPr2: Added fast-paths for SRC operation.

On Fri, Feb 10, 2012 at 5:02 PM, Nemanja Lukic <nlukic at mips.com> wrote:
> From: Nemanja Lukic <nemanja.lukic at rt-rk.com>
>
> Following fast-path functions are implemented (routines 4, 5 and 6 utilize
> same fast-memcpy routine):
>    1. src_x888_8888
>    2. src_8888_0565
>    3. src_0565_8888
>    4. src_0565_0565
>    5. src_8888_8888
>    6. src_0888_0888

Nice. That's a good choice of useful functions to optimize.
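
For readers unfamiliar with the naming, the two format-converting paths in the list (src_8888_0565 and src_0565_8888) repack pixels between 32-bit 8888 and 16-bit 0565 layouts. The following is a minimal scalar sketch of that per-pixel conversion, not the DSPr2 assembly from the patch; the function names are illustrative, and filling alpha with 0xFF on expansion is an assumption of this sketch.

```python
def convert_8888_to_0565(s):
    """Pack a 32-bit 8888 pixel into 16-bit 0565 by dropping low bits."""
    r = (s >> 8) & 0xF800   # top 5 bits of red   -> bits 15..11
    g = (s >> 5) & 0x07E0   # top 6 bits of green -> bits 10..5
    b = (s >> 3) & 0x001F   # top 5 bits of blue  -> bits 4..0
    return r | g | b

def convert_0565_to_8888(p):
    """Expand a 16-bit 0565 pixel to 32-bit 8888 with opaque alpha."""
    r5 = (p >> 11) & 0x1F
    g6 = (p >> 5) & 0x3F
    b5 = p & 0x1F
    # Replicate the high bits into the low bits so full-scale maps to 0xFF.
    r = (r5 << 3) | (r5 >> 2)
    g = (g6 << 2) | (g6 >> 4)
    b = (b5 << 3) | (b5 >> 2)
    return 0xFF000000 | (r << 16) | (g << 8) | b

print(hex(convert_8888_to_0565(0xFFFF0000)))
print(hex(convert_0565_to_8888(0xF800)))
```

The remaining paths (routines 4, 5 and 6) need no per-pixel arithmetic, which is why the patch description says they share one fast memcpy routine.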

> Performance numbers before/after on MIPS-74kc @ 1GHz
>
> Optimized (with these optimizations):
>
> lowlevel-blt-bench:
>        src_x888_8888 =  L1: 369.50  L2:  99.37  M: 27.19 (145.07%)  HT: 20.24  VT: 19.48  R: 19.00  RT: 10.22 (  63Kops/s)
>        src_8888_0565 =  L1: 105.65  L2:  67.87  M: 25.41 (101.00%)  HT: 20.78  VT: 19.84  R: 18.52  RT:  9.81 (  63Kops/s)
>        src_0565_8888 =  L1:  77.10  L2:  63.04  M: 23.37 ( 92.90%)  HT: 20.29  VT: 19.37  R: 18.14  RT: 10.02 (  63Kops/s)
>        src_0565_0565 =  L1: 519.02  L2: 241.32  M: 62.35 (166.34%)  HT: 33.74  VT: 27.63  R: 26.12  RT: 11.70 (  67Kops/s)
>        src_8888_8888 =  L1: 390.48  L2: 113.99  M: 30.32 (161.77%)  HT: 19.55  VT: 17.05  R: 17.13  RT: 10.19 (  63Kops/s)
>        src_0888_0888 =  L1: 349.74  L2: 156.68  M: 40.68 (162.78%)  HT: 25.58  VT: 20.57  R: 20.20  RT:  9.96 (  63Kops/s)

Maybe this would be interesting for you. I'm getting the following
numbers on my Asus RT-N16 router (MIPS 74K @480 MHz) with your
optimizations applied:

           src_x888_8888 =  L1: 149.94  L2:  37.43  M: 39.00 (146.51%)
           src_8888_0565 =  L1:  50.05  L2:  24.53  M: 23.77 ( 66.62%)
           src_8888_8888 =  L1: 173.30  L2:  70.62  M: 79.89 (299.11%)

Looks like your hardware has a CPU roughly twice as fast and some amount
of L2 cache (?), but shows ~2.6x worse peak memory bandwidth. Could it
have memory timings and/or memory clock frequency misconfigured?
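
The comparison above can be checked with a little arithmetic on the M column of the two result sets. This is only a sketch over the numbers quoted in this thread; reading the ~2.6x figure as the src_8888_8888 ratio is an interpretation, not something the message states.

```python
# M-column results copied from the two lowlevel-blt-bench listings above.
eval_board_1ghz = {"src_x888_8888": 27.19, "src_8888_0565": 25.41, "src_8888_8888": 30.32}
rt_n16_480mhz = {"src_x888_8888": 39.00, "src_8888_0565": 23.77, "src_8888_8888": 79.89}

# CPU clock ratio: 1 GHz evaluation board versus 480 MHz router.
clock_ratio = 1000 / 480

# How many times faster the router is in the memory-bound (M) test.
m_ratio = {op: rt_n16_480mhz[op] / eval_board_1ghz[op] for op in eval_board_1ghz}

print(f"CPU clock ratio: {clock_ratio:.2f}x")
for op, ratio in m_ratio.items():
    print(f"{op}: router M / eval board M = {ratio:.2f}x")
```

Only the plain copy (src_8888_8888) shows a gap of that size; the two converting paths are much closer, which suggests they are less limited by raw memory bandwidth.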

-- 
Best regards,
Siarhei Siamashka

