[Pixman] [PATCH 01/11] mmx: add store function and use it in add_8888_8888

Søren Sandmann sandmann at cs.au.dk
Fri Mar 16 10:06:21 PDT 2012

Most of this series look good, but a few comments:

- Considering the fact that all the intrinsics are reimplemented in
  inline assembly, do you actually need -march=loongson?

- Should the header file loongson-mmintrin.h be added to
  libpixman_loongson_mmi_la_SOURCES? Otherwise it might not show up in

- It would be useful to have benchmark results in the commit log to be
  sure that the MMI support is actually an improvement. Especially
  cairo-trace runs since they are more realistic workloads than

- It may be possible to do this:

        if (is_equal (_mm_and_si64 (vsrc, MC (full_alpha)), MC (full_alpha)))

  from mmx_combine_over_u() a bit more efficiently with an is_opaque()
  function that did something like

        _mm_cmpeq_pi8 (tmp, tmp);       /* always 0xffffffff */
        return _mm_movemask_pi8 (_mm_cmpeq_pi8 (pixel, tmp)) & 0x40

  Ie., check that just the 7th byte is 0xff.

Regarding the optimizations of in_8_8, in_n_8_8, and add_8_8, 

- in_8_8 seems to be mostly a loss on iwMMX and not much of an
  improvement on Loongson. I might believe that a more realistic
  benchmark such as one of the cairo traces would show more of an
  improvement, though.

- in_n_8_8: I don't understand this one. You replace a multiplication
  with a branch, but (a) you don't avoid any memory references, which is
  usually the point of these check-for-zero optimizations, and (b) you
  only do it in the edge case and not the middle of the scanlines?

- add_8_8: Here too you also only do the optimization in the edge


More information about the Pixman mailing list