[Pixman] [PATCH 1/5] ARMv6: Lay the groundwork for later patches in the series

Siarhei Siamashka siarhei.siamashka at gmail.com
Wed Jan 23 05:53:44 PST 2013


On Sat, 19 Jan 2013 16:16:49 +0000
Ben Avison <bavison at riscosopen.org> wrote:

> Move the entire contents of pixman-arm-simd-asm.S to a new file;
> ultimately this will only retain the scaled operations, so it is
> named pixman-arm-simd-asm-scaled.S. Added new header file
> pixman-arm-simd-asm.h, containing the macros which are the basis of
> all the new ARMv6 implementations, although at this point in the
> series, nothing uses them and the library should be binary-identical.

[...]

More comments describing the input arguments of the "preload_line"
macro would definitely be welcome, as would a high-level overview
of the "narrow", "medium" and "wide" cases.

> +.macro preload_line    narrow_case, bpp, bpp_shift, base
> + .if bpp > 0
> +  .if narrow_case && (bpp <= dst_w_bpp)
> +        /* In these cases, each line for each channel is in either 1 or 2 cache lines */
> +        PF  bic,    WK0, base, #31
> +        PF  pld,    [WK0]
> +        PF  add,    WK1, base, X, LSL #bpp_shift
> +        PF  sub,    WK1, WK1, #1
> +        PF  bic,    WK1, WK1, #31
> +        PF  cmp,    WK1, WK0
> +        PF  beq,    90f
> +        PF  pld,    [WK1]
> +90:
> +  .else
> +        PF  bic,    WK0, base, #31
> +        PF  pld,    [WK0]
> +        PF  add,    WK1, base, X, lsl #bpp_shift
> +        PF  sub,    WK1, WK1, #1
> +        PF  bic,    WK1, WK1, #31
> +        PF  cmp,    WK1, WK0
> +        PF  beq,    92f

> +91:     PF  add,    WK0, WK0, #32
> +        PF  cmp,    WK0, WK1
> +        PF  pld,    [WK0]
> +        PF  bne,    91b

How many iterations does this loop typically run? If this tries to
preload the whole scanline (as the name of the macro implies), then
we may have some problems after the 3rd iteration.

ARM11 can only support three outstanding cache misses at a time, and
on the 4th iteration the PLD instruction will block, because it is
treated mostly as an LDR without a destination register (and as a NOP
for TLB misses).
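
To put a rough number on it, here is a minimal C sketch of the address
arithmetic in the quoted loop. It assumes that X is the scanline width in
pixels and that bpp_shift converts pixels to bytes, which is my reading of
the macro rather than something the patch states. The function names are
made up for illustration, and the limit of three is just the figure
mentioned above.

```c
#include <stdint.h>

/* 32-byte cache line, matching the #31 mask and #32 stride in the macro. */
#define CACHE_LINE 32u
/* Outstanding-miss limit quoted in this mail; treated as an assumption. */
#define ASSUMED_MAX_OUTSTANDING_MISSES 3u

/* Hypothetical helper: number of passes through the "91:" loop for a
 * scanline of `width` pixels (width > 0) starting at address `base`,
 * where each pixel occupies (1 << bpp_shift) bytes. */
static unsigned pld_loop_iterations(uint32_t base, uint32_t width,
                                    unsigned bpp_shift)
{
    uint32_t first = base & ~(CACHE_LINE - 1);                        /* WK0 */
    uint32_t last = (base + (width << bpp_shift) - 1) & ~(CACHE_LINE - 1); /* WK1 */
    return (last - first) / CACHE_LINE;
}

/* Hypothetical helper: total PLDs issued, counting the one before the loop. */
static unsigned pld_total(uint32_t base, uint32_t width, unsigned bpp_shift)
{
    return pld_loop_iterations(base, width, bpp_shift) + 1;
}

/* Nonzero if the macro would issue more PLDs than the assumed limit. */
static int pld_may_block(uint32_t base, uint32_t width, unsigned bpp_shift)
{
    return pld_total(base, width, bpp_shift) > ASSUMED_MAX_OUTSTANDING_MISSES;
}
```

For a cache-line-aligned scanline of 64 pixels at 32 bpp (bpp_shift of 2),
pld_loop_iterations(0x1000, 64, 2) gives 7, so 8 PLDs are issued in total,
well past the third one.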

TLB misses are another potential source of performance problems. If
you try to prefetch too much and too far away, you may move into the
next page, which is not present in the TLB, and all the nice prefetches
will be wasted.
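
As an illustration, the following C sketch counts how many of the
prefetched cache lines land beyond the page that holds the start of the
scanline. It assumes 4 KiB pages and the same reading of X and bpp_shift
as pixel count and pixel-to-byte shift; the function name is hypothetical.
Whether the next page really is missing from the TLB depends on what else
is running, so this is only an upper bound on the wasted prefetches.

```c
#include <stdint.h>

/* 32-byte cache line, matching the #31 mask and #32 stride in the macro. */
#define CACHE_LINE 32u
/* Assumed page size (4 KiB small pages); other mappings are possible. */
#define ASSUMED_PAGE_SIZE 4096u

/* Hypothetical helper: number of prefetched cache lines that fall past the
 * end of the page containing `base`, for a scanline of `width` pixels
 * (width > 0) with (1 << bpp_shift) bytes per pixel. Addresses near the
 * top of the 32-bit address space are not handled. */
static unsigned lines_beyond_first_page(uint32_t base, uint32_t width,
                                        unsigned bpp_shift)
{
    uint32_t first = base & ~(CACHE_LINE - 1);
    uint32_t last = (base + (width << bpp_shift) - 1) & ~(CACHE_LINE - 1);
    /* First address of the page following the one that holds `first`. */
    uint32_t next_page = (first | (ASSUMED_PAGE_SIZE - 1)) + 1;

    if (last < next_page)
        return 0; /* the whole scanline stays inside one page */
    return (last - next_page) / CACHE_LINE + 1;
}
```

With a 256-byte scanline starting at 0x1F80,
lines_beyond_first_page(0x1F80, 64, 2) gives 4: half of the eight
prefetched lines sit in the following page.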

> +92:
> +  .endif
> + .endif
> +.endm

-- 
Best regards,
Siarhei Siamashka

