[Pixman] Faster unorm_to_unorm for wide path processing

Antti S. Lankila alankila at bel.fi
Sun Jun 10 09:27:46 PDT 2012

Attached is a simple patch that improves wide path processing
throughput (Mpix/s) by around 20%, thanks to a significant optimization
of pixman_expand. On my i7 laptop, the results before the patch (first
block) and after it (second block) are:

> src_8888_2x10 =  L1:  62.08  L2:  60.73  M: 59.61
>                   (  4.30%)  HT: 46.81  VT: 42.17  R: 43.18  RT: 26.01 (
>                   325Kops/s)


>  src_8888_2x10 =  L1:  76.94  L2:  78.43  M: 75.87
>                   (  5.59%)  HT: 56.73  VT: 52.39  R: 53.00  RT: 29.29 (
>                   363Kops/s)

The key to the patch is the observation that, when the function is
applied repeatedly and its parameters are not compile-time constants,
unorm_to_unorm's work can more easily be done with a single
multiplication and shift. For instance, converting 0xfe to 0xfefe
(expanding from 8 bits to 16 bits) can be done by calculating

c = c * 0x101

However, sometimes the result is not a neat replication of all the bits. 
For instance, going from 10 bits to 16 bits can be done by calculating

c = c * 0x401UL >> 4

where the intermediate result is a 20-bit-wide repetition of the 10-bit
pattern, and the shift then drops the 4 unnecessary lowest bits.
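
To make the two cases above concrete, here is a minimal sketch of them
as standalone C helpers. The function names are made up for this
illustration and do not come from the patch; both helpers assume the
input has no bits set above its stated width.

```c
#include <stdint.h>

/* Expand an 8-bit value to 16 bits by replicating it: 0xab -> 0xabab.
 * (Hypothetical helper name, for illustration only.) */
static inline uint32_t expand_8_to_16(uint32_t c)
{
    return c * 0x101u;
}

/* Expand a 10-bit value to 16 bits: the multiplication builds a 20-bit
 * value holding two copies of the input, and the shift discards the
 * lowest 4 bits. (Hypothetical helper name, for illustration only.) */
static inline uint32_t expand_10_to_16(uint32_t c)
{
    return (c * 0x401u) >> 4;
}
```

With these, the maximum input maps to the maximum output in both cases
(0xff gives 0xffff, and 0x3ff gives 0xffff).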

The patch contains the algorithm for calculating the factor and the
shift, and converts the code to use them.
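
For readers who want to experiment without applying the patch, one way
to derive such a factor and shift is sketched below. This is my own
reconstruction under stated assumptions, not the code from the patch:
the type and function names are hypothetical, it only handles expansion
(in_bits <= out_bits <= 32), and it assumes the input value has no bits
set above in_bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameter pair, not taken from the patch. */
typedef struct {
    uint64_t factor; /* has one bit set per copy of the input pattern */
    unsigned shift;  /* number of surplus low bits to drop afterwards */
} expand_params_t;

static expand_params_t
compute_expand_params(unsigned in_bits, unsigned out_bits)
{
    expand_params_t p = { 0, 0 };
    unsigned total = 0; /* width covered by the copies added so far */

    assert(in_bits >= 1 && in_bits <= out_bits && out_bits <= 32);

    /* Add one copy of the input at a time until out_bits are covered. */
    while (total < out_bits) {
        p.factor |= (uint64_t)1 << total;
        total += in_bits;
    }

    /* The copies may overshoot the target width; drop the excess. */
    p.shift = total - out_bits;
    return p;
}

static uint32_t
expand_with_params(uint32_t c, expand_params_t p)
{
    /* 64-bit intermediate: the replicated value can be up to 63 bits. */
    return (uint32_t)(((uint64_t)c * p.factor) >> p.shift);
}
```

For 8 to 16 bits this yields factor 0x101 and shift 0, and for 10 to 16
bits it yields factor 0x401 and shift 4, matching the examples above.
The parameters only need to be computed once per conversion and can
then be reused for every pixel component.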

