[Pixman] [PATCH 0/2] A couple of ARM improvements

Søren Sandmann Pedersen sandmann at cs.au.dk
Mon Apr 4 08:33:06 PDT 2011


I got myself an ARM netbook which by default is using subpixel
antialiased text in 565 mode, so fast_composite_over_n_8888_0565_ca()
is showing up on profiles. The following is an implementation of that
operation in NEON assembly along with a tiny improvement to the
8888_8888_ca version, where a single quad-word instruction could be
used instead of two double-word instructions.

Some notes:

- The combine_mask_ca replacement could be shared between 8888_ca and
  0565_ca, except that the 565 version can ignore the mask alpha
  because it doesn't need to compute a destination alpha.

  It may make sense to split this code into a new macro that takes an
  "opaque_destination" argument. This would allow the same
  optimization to be used for x8r8g8b8 destinations.

- Further optimization is possible if we know that the solid source is
  opaque, which is very often the case. When it is, there is no need
  to multiply it onto the mask. An "opaque_source" argument could be
  added to the macro as well and then separate x888 versions of the
  function could be generated.

- It might be a win to optimize out destination access whenever the
  src * mask is 0 or 1. However, this might require some more involved
  changes to the code generation framework.
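
For reference, the per-pixel arithmetic discussed in the notes above can
be sketched in plain C. This is only an illustration, not the NEON code
from the patches: the helper names (mul_un8, add_sat_un8,
over_ca_channel, over_n_8888_0565_ca_pixel) are made up for this
example, and it assumes a premultiplied a8r8g8b8 solid source, an
a8r8g8b8 component-alpha mask and an r5g6b5 destination.

```c
#include <stdint.h>

/* Rounded (a * b) / 255 for 8-bit values. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Saturating 8-bit add. */
static uint8_t add_sat_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a + b;
    return t > 255 ? 255 : (uint8_t)t;
}

/* One colour channel of component-alpha OVER:
 *   d' = s * m + d * (1 - sa * m)
 * If the source is known to be opaque (sa == 255), sa * m is just m,
 * so the second multiplication could be skipped. */
static uint8_t over_ca_channel(uint8_t s, uint8_t sa, uint8_t m, uint8_t d)
{
    uint8_t sm = mul_un8(s, m);   /* source colour scaled by mask */
    uint8_t am = mul_un8(sa, m);  /* source alpha scaled by mask  */
    return add_sat_un8(sm, mul_un8(d, (uint8_t)(255 - am)));
}

/* Hypothetical scalar equivalent of one pixel of over_n_8888_0565_ca. */
static uint16_t over_n_8888_0565_ca_pixel(uint32_t src, uint32_t mask,
                                          uint16_t dst)
{
    uint8_t sa = (uint8_t)(src >> 24);
    uint8_t sr = (uint8_t)(src >> 16);
    uint8_t sg = (uint8_t)(src >> 8);
    uint8_t sb = (uint8_t)src;

    /* The mask alpha (bits 24-31) is never read: a 565 destination
     * has no alpha channel to update. */
    uint8_t mr = (uint8_t)(mask >> 16);
    uint8_t mg = (uint8_t)(mask >> 8);
    uint8_t mb = (uint8_t)mask;

    /* Expand 565 to 888 by replicating the high bits. */
    uint8_t dr = (dst >> 11) & 0x1f;
    uint8_t dg = (dst >> 5) & 0x3f;
    uint8_t db = dst & 0x1f;
    dr = (uint8_t)((dr << 3) | (dr >> 2));
    dg = (uint8_t)((dg << 2) | (dg >> 4));
    db = (uint8_t)((db << 3) | (db >> 2));

    uint8_t r = over_ca_channel(sr, sa, mr, dr);
    uint8_t g = over_ca_channel(sg, sa, mg, dg);
    uint8_t b = over_ca_channel(sb, sa, mb, db);

    /* Pack back to 565 by truncation. */
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

With a zero mask the destination comes back unchanged, and with an
opaque source and a full mask the result depends only on the source.
Those are the two cases where the destination access could in principle
be optimized out.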

This is my first attempt at writing NEON assembly, so review is
appreciated.


Thanks,
Soren


