[Pixman] [PATCH 01/11] mmx: add store function and use it in add_8888_8888
Søren Sandmann
sandmann at cs.au.dk
Fri Mar 16 10:06:21 PDT 2012
Most of this series look good, but a few comments:
- Considering the fact that all the intrinsics are reimplemented in
inline assembly, do you actually need -march=loongson?
- Should the header file loongson-mmintrin.h be added to
libpixman_loongson_mmi_la_SOURCES? Otherwise it might not show up in
tarballs.
- It would be useful to have benchmark results in the commit log to be
sure that the MMI support is actually an improvement. Especially
cairo-trace runs since they are more realistic workloads than
lowlevel-blt.
- It may be possible to do this:
if (is_equal (_mm_and_si64 (vsrc, MC (full_alpha)), MC (full_alpha)))
from mmx_combine_over_u() a bit more efficiently with an is_opaque()
function that did something like
_mm_cmpeq_pi8 (tmp, tmp); /* always 0xffffffff */
return _mm_movemask_pi8 (_mm_cmpeq_pi8 (pixel, tmp)) & 0x40
Ie., check that just the 7th byte is 0xff.
Regarding the optimizations of in_8_8, in_n_8_8, and add_8_8,
- in_8_8 seems to be mostly a loss on iwMMX and not much of an
improvement on Loongson. I might believe that a more realistic
benchmark such as one of the cairo traces would show more of an
improvement, though.
- in_n_8_8: I don't understand this one. You replace a multiplication
with a branch, but (a) you don't avoid any memory references, which is
usually the point of these check-for-zero optimizations, and (b) you
only do it in the edge case and not the middle of the scanlines?
- add_8_8: Here too you also only do the optimization in the edge
cases.
Søren
More information about the Pixman
mailing list