<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
<small>I have updated this in my branch:<br>
<a
href="http://cgit.freedesktop.org/%7Epodain/pixman/?h=neon_bilinear">http://cgit.freedesktop.org/~podain/pixman/?h=neon_bilinear</a></small><br>
<small><br>
I fixed the cache preload of src scanlines so that it works correctly, and
added cache preload<br>
for mask and dst scanlines. I also corrected some mistakes in the
comments.<br>
<br>
Thanks,<br>
Taekyun Kim<br>
<br>
</small>On 09/21/2011 05:38 PM, Taekyun Kim wrote:
<blockquote
cite="mid:1316594285-2773-1-git-send-email-podain77@gmail.com"
type="cite">
<pre wrap="">From: Taekyun Kim <a class="moz-txt-link-rfc2396E" href="mailto:tkq.kim@samsung.com"><tkq.kim@samsung.com></a>
Hi, all
The bilinear functions in pixman-arm-neon-asm-bilinear.S have a lot of room
for improvement. This series cleans them up, brings a tail/head scheme to
them, and schedules the instructions of the most frequently used function,
over_8888_8888. It passes make check.
I'm not sure the scheduling is as good as it could be, but I do get a
speed-up on both Cortex-A8 and Cortex-A9 devices.
Performance before/after on Cortex-A8 @ 1 GHz
&lt;&lt; 2000 x 2000 with scale factor close to 1.x &gt;&gt;
before : 39.71 Mpix/s
after : 60.39 Mpix/s
Performance before/after on Cortex-A9 @ 1.2 GHz
&lt;&lt; 2000 x 2000 with scale factor close to 1.x &gt;&gt;
before : 43.31 Mpix/s
after : 65.83 Mpix/s
I will optimize other functions as well, guided by the perfstat
results of popular cairo traces.
--
Best Regards,
Taekyun Kim
Taekyun Kim (4):
ARM: NEON: Some cleanup of bilinear scanline functions
ARM: NEON: Bilinear macro template for instruction scheduling
ARM: NEON: Replace old bilinear scanline generator with new template
ARM: NEON: Instruction scheduling of bilinear over_8888_8888
pixman/pixman-arm-neon-asm-bilinear.S | 766 ++++++++++++++++++++++++++-------
1 files changed, 605 insertions(+), 161 deletions(-)
</pre>
</blockquote>
<br>
</body>
</html>