[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Xu, Samuel
samuel.xu at intel.com
Tue Aug 31 00:37:07 PDT 2010
Hi, Siarhei Siamashka:
The attached patch has an updated copyright section (this is the only change). We followed http://cgit.freedesktop.org/pixman/tree/COPYING.
Yes, as you assumed, we tested on multiple 32-bit and 64-bit boxes, both without and with SSSE3.
Thanks for your patience!
Samuel
-----Original Message-----
From: Siarhei Siamashka [mailto:siarhei.siamashka at gmail.com]
Sent: Monday, August 30, 2010 9:07 PM
To: Xu, Samuel
Cc: pixman at lists.freedesktop.org; Ma, Ling; Liu, Xinyun
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
On Monday 30 August 2010 12:26:27 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> Sorry for the typo; a fixed version is attached. Please ignore the previous mail.
>
> The new patch contains:
> 1) Simplified 64-bit detect_cpu_features(), which only checks the SSSE3 bit.
> The _MSC_VER path switched to __cpuid to avoid inline asm. The _MSC_VER path
> CPUID code snippet was extracted and tested in a standalone C file on a
> 64-bit Windows system, using VS2010.
> 2) Removed "merging"; pixman-ssse3.c is shaped as per our discussion.
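For readers following the CPUID discussion, here is a minimal sketch of an SSSE3 check that avoids inline asm. It is not the code from the patch: the function names cpu_has_ssse3() and ecx_has_ssse3() are hypothetical, and the GCC/Clang branch using __get_cpuid from <cpuid.h> is an assumption about how a non-MSVC build might do the same thing. The only hardware fact relied on is that CPUID leaf 1 reports SSSE3 in bit 9 of ECX.

```c
#include <stdbool.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   /* __cpuid intrinsic (MSVC) */
#define HAVE_X86_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>    /* __get_cpuid helper (GCC/Clang) */
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical helper: test the SSSE3 flag in the ECX value
 * returned by CPUID leaf 1 (bit 9). */
static bool ecx_has_ssse3(unsigned int ecx)
{
    return ((ecx >> 9) & 1u) != 0;
}

/* Hypothetical helper: returns true if the running CPU reports SSSE3.
 * On non-x86 targets it simply reports false. */
static bool cpu_has_ssse3(void)
{
#if defined(HAVE_X86_CPUID) && defined(_MSC_VER)
    int regs[4];                 /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    return ecx_has_ssse3((unsigned int)regs[2]);
#elif defined(HAVE_X86_CPUID)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;            /* leaf 1 not supported */
    return ecx_has_ssse3(ecx);
#else
    return false;
#endif
}
```

Keeping the bit test separate from the CPUID call makes it possible to check the flag logic on any machine, including ones without SSSE3.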
+/*
+ * Copyright 2010 Intel Corporation
+ *
+ * Permission to use, copy, modify, distribute, and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the name of Mozilla Corporation not be used in
                                        ^^^^^^^^^^^^^^^^^^^
Mozilla Corporation? Looks like this is yet another case of careless copy/paste.
I would also like to remind you again that the copyright notice text at the following link is preferred for new code. It is not strictly necessary, but I just want to be sure that you checked this link:
http://cgit.freedesktop.org/pixman/tree/COPYING
Other than that, and assuming that the patch was properly tested on 32-bit and 64-bit machines both with and without SSSE3 support, I don't see any remaining real blocker issues. Or rather, I'm giving up and would like to pass the baton to somebody else.
Also, I tried running my simple microbenchmarking program on an Intel Atom N450 netbook (x86_64 system). The results are the following:
--
All results are presented in millions of pixels per second
L1 - small Xx1 rectangle (fitting L1 cache), always blitted at the same
memory location with small drift in horizontal direction
L2 - small XxY rectangle (fitting L2 cache), always blitted at the same
memory location with small drift in horizontal direction
M - large 1856x1080 rectangle, always blitted at the same
memory location with small drift in horizontal direction
HT - random rectangles with 32x32 average size are copied from
one 1920x1080 buffer to another, traversing from left to right
and from top to bottom
VT - random rectangles with 32x32 average size are copied from
one 1920x1080 buffer to another, traversing from top to bottom
and from left to right
R - random rectangles with 32x32 average size are copied from
random locations of one 1920x1080 buffer to another
---
reference memcpy speed = 1007.2MB/s (251.8MP/s for 32bpp pixels)
--- C ---
src_x888_8888 = L1: 265.13 L2: 223.61 M:216.12 HT:109.69 VT: 80.23 R: 75.42
--- SSE2 ---
src_x888_8888 = L1: 611.50 L2: 494.79 M:120.17 HT: 83.57 VT: 79.48 R: 60.30
--- SSE2 (prefetch removed) ---
src_x888_8888 = L1: 683.39 L2: 539.57 M:260.07 HT:128.22 VT: 85.23 R: 81.40
--- SSSE3 ---
src_x888_8888 = L1:1559.35 L2: 798.95 M:254.31 HT:129.29 VT: 84.67 R: 75.00
--
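To make the numbers above easier to read, here is a small sketch of two things they rely on. The first helper shows the arithmetic behind the reference line: a 32bpp pixel is 4 bytes, so 1007.2MB/s corresponds to 251.8MP/s. The second is a plain scalar version of what an x888 to 8888 copy amounts to, namely forcing the unused top byte to an opaque alpha of 0xff. Both function names are hypothetical, and neither is taken from pixman or from the patch; the SSE2/SSSE3 paths do the same work several pixels at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a byte throughput (MB/s) into a pixel
 * throughput (MP/s) for a given pixel depth in bits. */
static double mb_per_s_to_mp_per_s(double mb_per_s, unsigned bits_per_pixel)
{
    double bytes_per_pixel = bits_per_pixel / 8.0;
    return mb_per_s / bytes_per_pixel;
}

/* Hypothetical scalar sketch of an x8r8g8b8 -> a8r8g8b8 copy:
 * keep the colour channels, set alpha to fully opaque. */
static void copy_x888_to_8888(uint32_t *dst, const uint32_t *src, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = src[i] | 0xff000000u;
}
```

Because the per-pixel work is a single OR, the operation is close to a memcpy, which is why the memcpy speed is a useful reference for the M, HT, VT and R columns.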
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-add-ssse3_composite_src_x888_8888.patch
Type: application/octet-stream
Size: 41086 bytes
Desc: 0001-add-ssse3_composite_src_x888_8888.patch
URL: <http://lists.freedesktop.org/archives/pixman/attachments/20100831/d7f03981/attachment-0001.obj>