[Pixman] [PATCH 3/5] ARMv6: Rewrite of non-scaled fast paths
Siarhei Siamashka
siarhei.siamashka at gmail.com
Thu Dec 27 04:27:59 PST 2012
On Fri, 21 Dec 2012 18:51:03 -0000
"Ben Avison" <bavison at riscosopen.org> wrote:
> There is very little in common with the previous revision of this source
> file, but I present it as a patch nevertheless.
Can we have a more descriptive commit message for this and the
other patches? Preferably, benchmark results for the newly added or
improved code should also be included here.
> diff --git a/pixman/pixman-arm-simd-asm.S b/pixman/pixman-arm-simd-asm.S
> index b438001..8700da9 100644
> --- a/pixman/pixman-arm-simd-asm.S
> +++ b/pixman/pixman-arm-simd-asm.S
[...]
> +.macro over_8888_8888_1pixel src, dst, offset, next
> + /* src = destination component multiplier */
> + rsb WK&src, WK&src, #255
> + /* Split even/odd bytes of dst into SCRATCH/dst */
> + uxtb16 SCRATCH, WK&dst
> + uxtb16 WK&dst, WK&dst, ror #8
> + /* Multiply through, adding 0.5 to the upper byte of result for rounding */
> + mla SCRATCH, SCRATCH, WK&src, MASK
> + mla WK&dst, WK&dst, WK&src, MASK
> + /* Where we would have had a stall between the result of the first MLA and the shifter input,
> + * reload the complete source pixel */
> + ldr WK&src, [SRC, #offset]
> + /* Multiply by 257/256 to approximate 256/255 */
> + uxtab16 SCRATCH, SCRATCH, SCRATCH, ror #8
> + /* In this stall, start processing the next pixel */
> + .if offset < -4
> + mov WK&next, WK&next, lsr #24
> + .endif
> + uxtab16 WK&dst, WK&dst, WK&dst, ror #8
> + /* Recombine even/odd bytes of multiplied destination */
> + mov SCRATCH, SCRATCH, ror #8
> + sel WK&dst, SCRATCH, WK&dst
> + /* Saturated add of source to multiplied destination */
> + uqadd8 WK&dst, WK&dst, WK&src
> +.endm
It looks like this over_8888_8888_1pixel macro uses one more instruction
than the current code. Is that intended?
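
To make the arithmetic in the quoted macro easier to check, here is a rough
C sketch of what it appears to compute per pixel. This is only a reading of
the quoted assembly, not code from the patch: the function names are made up
for illustration, MASK is assumed to hold 0x00800080, and the GE flags are
assumed to be set up so that SEL takes the even bytes from SCRATCH and the
odd bytes from the destination register.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values packed as 0x00XX00YY by m (0..255) and divide
 * by 255 with rounding. Mirrors MLA with the assumed MASK of 0x00800080,
 * followed by UXTAB16 with "ror #8" (the 257/256 step). */
static uint32_t mul_div255_pair(uint32_t pair, uint32_t m)
{
    assert(m <= 255u);
    uint32_t t = pair * m + 0x00800080u;  /* x*m + 0.5 in each 16-bit lane */
    t += (t >> 8) & 0x00ff00ffu;          /* t * 257/256 approximates t * 256/255 */
    return (t >> 8) & 0x00ff00ffu;        /* keep the upper byte of each lane */
}

/* Per-byte saturating add, the equivalent of UQADD8. */
static uint32_t sat_add8(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((a >> shift) & 0xffu) + ((b >> shift) & 0xffu);
        if (sum > 0xffu)
            sum = 0xffu;
        result |= sum << shift;
    }
    return result;
}

/* OVER for premultiplied a8r8g8b8: result = src + dst * (255 - alpha) / 255 */
static uint32_t over_8888_8888(uint32_t src, uint32_t dst)
{
    uint32_t inv_alpha = 255u - (src >> 24);                            /* RSB */
    uint32_t even = mul_div255_pair(dst & 0x00ff00ffu, inv_alpha);      /* bytes 0, 2 */
    uint32_t odd = mul_div255_pair((dst >> 8) & 0x00ff00ffu, inv_alpha); /* bytes 1, 3 */
    return sat_add8(even | (odd << 8), src);                            /* SEL + UQADD8 */
}
```

With these assumptions, an opaque source replaces the destination and a
fully transparent source leaves it unchanged, which is what the OVER
operator should do.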
--
Best regards,
Siarhei Siamashka