Added optimizations for the over_n_8_8888 and over_n_8_0565 routines. Benchmark results (lowlevel-blt-bench and cairo-perf-trace) on a Malta board (1 GHz) are included in the log message. Any comments on this patch are welcome.