[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Siarhei Siamashka siarhei.siamashka at gmail.com
Mon Aug 30 06:06:45 PDT 2010


On Monday 30 August 2010 12:26:27 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> 	Sorry for the typo; a fixed version is attached. Please ignore the previous mail.
> 
> 	New patch, which contains:
> 	1) Simplified 64-bit detect_cpu_features(), which only checks the SSSE3 bit. The
> _MSC_VER path switched to __cpuid to avoid inline asm. The _MSC_VER path CPUID code
> snippet was extracted and tested in a standalone C file on a 64-bit Windows
> system, using VS2010.
> 	2) Removed "merging"; pixman-ssse3.c is shaped as per our discussion.
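
For readers following along, an SSSE3-only check of the kind described in the quoted text could look roughly like the sketch below. This is not the code from the patch: the function name have_ssse3() and the macro SSSE3_BIT are hypothetical, and the only facts relied upon are that CPUID leaf 1 reports SSSE3 in bit 9 of ECX, that MSVC provides __cpuid in <intrin.h>, and that GCC provides __get_cpuid in <cpuid.h>. On non-x86 targets the sketch simply reports no support.

```c
#if defined(_MSC_VER)
#include <intrin.h>   /* __cpuid intrinsic, no inline asm needed */
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>    /* __get_cpuid helper from GCC */
#endif

/* CPUID leaf 1, ECX bit 9 signals SSSE3 support. */
#define SSSE3_BIT (1u << 9)

/* Hypothetical helper (not the function from the patch):
 * returns 1 if the CPU reports SSSE3, 0 otherwise. */
static int have_ssse3(void)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];               /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    return ((unsigned int) regs[2] & SSSE3_BIT) != 0;
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;              /* leaf 1 not available */
    return (ecx & SSSE3_BIT) != 0;
#else
    return 0;                  /* not an x86 target */
#endif
}
```

A caller would test have_ssse3() once at startup and fall back to the SSE2 or C paths when it returns 0.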

+/*
+ * Copyright 2010 Intel Corporation
+ *
+ * Permission to use, copy, modify, distribute, and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the name of Mozilla Corporation not be used in
                                        ^^^^^^^^^^^^^^^^^^^

Mozilla Corporation? Looks like this is yet another case of careless copy/paste.

And I would like to remind you again that the text of the copyright notice in
the COPYING file is preferred for new code. It is not strictly necessary, but
I just want to be sure that you have checked this link:
http://cgit.freedesktop.org/pixman/tree/COPYING

Other than that, and assuming that the patch was properly tested on both 32-bit
and 64-bit machines, with and without SSSE3 support, I don't see any remaining
real blocker issues. Or rather, I'm giving up and would like to pass the baton
to somebody else.


Also, I tried to run my simple microbenchmarking program on an Intel Atom N450
netbook (an x86_64 system). The results are the following:

--
All results are presented in millions of pixels per second
L1  - small Xx1 rectangle (fitting L1 cache), always blitted at the same
      memory location with small drift in horizontal direction
L2  - small XxY rectangle (fitting L2 cache), always blitted at the same
      memory location with small drift in horizontal direction
M   - large 1856x1080 rectangle, always blitted at the same
      memory location with small drift in horizontal direction
HT  - random rectangles with 32x32 average size are copied from
      one 1920x1080 buffer to another, traversing from left to right
      and from top to bottom
VT  - random rectangles with 32x32 average size are copied from
      one 1920x1080 buffer to another, traversing from top to bottom
      and from left to right
R   - random rectangles with 32x32 average size are copied from
      random locations of one 1920x1080 buffer to another
---
reference memcpy speed = 1007.2MB/s (251.8MP/s for 32bpp pixels)

--- C ---
 src_x888_8888 = L1: 265.13 L2: 223.61 M:216.12 HT:109.69 VT: 80.23 R: 75.42

--- SSE2 ---
 src_x888_8888 = L1: 611.50 L2: 494.79 M:120.17 HT: 83.57 VT: 79.48 R: 60.30

--- SSE2 (prefetch removed) ---
 src_x888_8888 = L1: 683.39 L2: 539.57 M:260.07 HT:128.22 VT: 85.23 R: 81.40

--- SSSE3 ---
 src_x888_8888 = L1:1559.35 L2: 798.95 M:254.31 HT:129.29 VT: 84.67 R: 75.00
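
To make the numbers above easier to read, here is a minimal scalar sketch of what the src_x888_8888 operation does per pixel (the unused top byte of x8r8g8b8 is forced to 0xff to produce an opaque a8r8g8b8 pixel), together with the MB/s to MP/s conversion used for the memcpy reference line. The function names x888_to_8888() and mbps_to_mpps() are hypothetical helpers for illustration only. They are not pixman API and not the code that was benchmarked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: copy n pixels from x8r8g8b8 to a8r8g8b8
 * by setting the alpha byte to 0xff and keeping the color channels. */
static void x888_to_8888(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] | 0xff000000u;
}

/* Hypothetical helper: convert a bandwidth in MB/s into millions of
 * pixels per second for a given pixel depth (32bpp means 4 bytes/pixel). */
static double mbps_to_mpps(double mb_per_s, unsigned bits_per_pixel)
{
    return mb_per_s / (bits_per_pixel / 8.0);
}
```

With these, mbps_to_mpps(1007.2, 32) gives the 251.8MP/s figure quoted for memcpy, which is the natural yardstick for the M column.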


-- 
Best regards,
Siarhei Siamashka