[Pixman] [PATCH] MIPS: DSPr2: Added over_n_8888_8888_ca and over_n_8888_0565_ca fast paths.
Siarhei Siamashka
siarhei.siamashka at gmail.com
Mon Mar 12 16:11:04 PDT 2012
On Mon, Mar 12, 2012 at 11:20 PM, Lukic, Nemanja <nlukic at mips.com> wrote:
> Hi Soren,
>
> I usually select the cairo-perf-trace that utilizes the optimized fast path the most.
> In this case, xfce4-terminal-a1 proved to be that one. I use oprofile to check CPU utilization. Here is the oprofile log I got for xfce4-terminal-a1:
>
> CPU: MIPS 74K, speed 0 MHz (estimated)
> Counted CYCLES events (Cycles) with a unit mask of 0x00 (No unit mask) count 40000
> samples  %        image name      app name        symbol name
> 2658517  50.3337  no-vmlinux      no-vmlinux      /no-vmlinux
> 1216517  23.0323  libpixman-1.so  libpixman-1.so  pixman_composite_over_n_8888_8888_ca_asm_mips
> 270995   5.1308   libc-2.11.2.so  libc-2.11.2.so  memset
> 165057   3.1250   libm-2.11.2.so  libm-2.11.2.so  floor
> 139880   2.6483   libpixman-1.so  libpixman-1.so  pixman_fill_buff32_mips_dsp
> 136303   2.5806   libpixman-1.so  libpixman-1.so  fetch_scanline_a8
> 61821    1.1705   libc-2.11.2.so  libc-2.11.2.so  memcpy
> ...
>
> All other traces don't utilize this fast-path that much (this is what my oprofile runs on the test system showed).
> If you know of a more suitable trace (or a system configuration I need to have, such as installed fonts), please let me know, and I'll re-run the benchmarks and update the commit.
You can try installing the Terminus font
(http://terminus-font.sourceforge.net/) just to check whether this has any
effect on the fast paths used. However, the trace will not be useful
for benchmarking your over_n_8888_8888_ca and over_n_8888_0565_ca
optimizations any more. Anyway, the purpose of running benchmarks is
to confirm the performance improvement, so I guess this trace is also
fine even though it does not behave as originally intended.
By the way, oprofile logs are also quite informative and may be useful
as part of the commit message. It is also a good idea to
configure oprofile to collect statistics separately per process
instead of producing a flat report for the whole system. This can be done in
the following way:
# opcontrol --deinit
# opcontrol --separate=kernel
# opcontrol --init
Then collect the statistics:
# opcontrol --reset
# opcontrol --start
# ./some-test-binary
# opcontrol --stop
And show it:
# opreport -l ./some-test-binary
When the statistics are collected per process, the idle time currently
attributed to no-vmlinux will disappear, the results should become
perfectly reproducible across multiple runs, and they can also be used to
evaluate the effect of optimizations.
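The steps above could be wrapped in a small shell helper, roughly as
sketched below. The names profile_binary, run and DRY_RUN are made up
for illustration and are not part of oprofile; only the opcontrol and
opreport invocations are taken from the commands listed above. With
DRY_RUN=1 the helper only prints the commands, which may be handy for
checking the sequence on a machine without oprofile installed:

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: profile one binary with per-process statistics.
# Usage: profile_binary ./some-test-binary [args...]
profile_binary() {
    bin=$1
    shift

    # Reconfigure oprofile to separate the statistics.
    run opcontrol --deinit || return 1
    run opcontrol --separate=kernel || return 1
    run opcontrol --init || return 1

    # Collect the statistics while the test binary runs.
    run opcontrol --reset || return 1
    run opcontrol --start || return 1
    run "$bin" "$@"
    run opcontrol --stop || return 1

    # Show the per-symbol report for this binary only.
    run opreport -l "$bin"
}
```

It would normally need to be run as root, the same as the individual
commands, for example: profile_binary ./some-test-binary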
--
Best regards,
Siarhei Siamashka