Added optimizations for the over_n_8888_8888_ca and over_n_8888_0565_ca routines. Benchmark results (lowlevel-blt-bench and cairo-perf-trace) on a Malta board running at 1 GHz are included in the log message. Any comments on this patch are welcome.