[Pixman] Faster unorm_to_unorm for wide path processing
Antti S. Lankila
alankila at bel.fi
Sun Jun 10 09:27:46 PDT 2012
Attached is a simple patch that improves wide path throughput (Mpix/s)
by around 20% through a significant optimization of pixman_expand. On my
i7 laptop, we go from:
> src_8888_2x10 = L1: 62.08 L2: 60.73 M: 59.61 ( 4.30%)
>   HT: 46.81 VT: 42.17 R: 43.18 RT: 26.01 ( 325Kops/s)
to
> src_8888_2x10 = L1: 76.94 L2: 78.43 M: 75.87 ( 5.59%)
>   HT: 56.73 VT: 52.39 R: 53.00 RT: 29.29 ( 363Kops/s)
The key to the patch is the observation that unorm_to_unorm's work can
be done more simply with one multiplication and one shift when the
function is applied repeatedly and its parameters are not compile-time
constants. For instance, converting from 0xfe to 0xfefe (expanding from
8 bits to 16 bits) can be done by calculating
c = c * 0x101
However, the result is not always a whole number of copies of the input
bits. For instance, going from 10 bits to 16 bits can be done by
calculating
c = (c * 0x401UL) >> 4
where the intermediate result is a 20-bit-wide repetition of the 10-bit
pattern, and the shift then drops the 4 unnecessary lowest bits.
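To make the two cases above concrete, here is a minimal sketch in C. The
function names are made up for illustration and do not come from the
patch; the inputs are assumed to already fit in 8 and 10 bits
respectively.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch.
 * 8 -> 16 bits: 0x101 is (1 << 8) | 1, so the product is the 8-bit
 * value written twice, side by side. No shift is needed because
 * two copies are exactly 16 bits. */
static inline uint32_t
expand_8_to_16 (uint32_t c)
{
    return c * 0x101;
}

/* Hypothetical helper, not from the patch.
 * 10 -> 16 bits: 0x401 is (1 << 10) | 1, so the product is a 20-bit
 * value holding the 10-bit pattern twice. Shifting right by 4 keeps
 * the top 16 bits of that repetition. */
static inline uint32_t
expand_10_to_16 (uint32_t c)
{
    return (c * 0x401) >> 4;
}
```

With these, expand_8_to_16 (0xfe) gives 0xfefe, and expand_10_to_16 maps
the 10-bit maximum 0x3ff to the 16-bit maximum 0xffff.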
The patch adds an algorithm that calculates the factor and the shift,
and converts the code to use them.
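For readers who want to see how such a factor and shift might be
derived, the following is only a sketch of one possible approach, not
the code in the attached patch. The type and function names are
hypothetical, and it assumes both widths are between 1 and 16 bits and
that the input value has no bits set above from_bits.

```c
#include <stdint.h>

/* Hypothetical type, not from the patch: a precomputed conversion. */
typedef struct
{
    uint32_t factor;
    int      shift;
} unorm_scale_t;

/* Compute the factor and shift once per (from_bits, to_bits) pair. */
static inline unorm_scale_t
unorm_scale_init (int from_bits, int to_bits)
{
    unorm_scale_t s = { 1, 0 };
    int total;

    if (from_bits <= 0)
    {
        /* Nothing to replicate: this sketch maps every input to 0. */
        s.factor = 0;
        return s;
    }

    if (from_bits >= to_bits)
    {
        /* Narrowing or equal width: just drop the low bits. */
        s.shift = from_bits - to_bits;
        return s;
    }

    /* Widening: add one copy of the pattern at a time until the
     * copies span at least to_bits, then shift off the excess. */
    total = from_bits;
    while (total < to_bits)
    {
        s.factor |= (uint32_t) 1 << total;
        total += from_bits;
    }
    s.shift = total - to_bits;

    return s;
}

/* Apply a precomputed conversion to one channel value. */
static inline uint32_t
unorm_scale_apply (unorm_scale_t s, uint32_t c)
{
    return (c * s.factor) >> s.shift;
}
```

For 8 to 16 bits this yields factor 0x101 and shift 0, and for 10 to 16
bits it yields factor 0x401 and shift 4, matching the examples above.
Under the stated width assumption the product stays below 32 bits, so
uint32_t arithmetic does not overflow. The point of splitting it this
way is that the loop runs once per format pair, while the per-pixel work
is only the multiply and shift.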
--
Antti