[Pixman] [PATCH 3/5] ARMv6: New blit routines
Ben Avison
bavison at riscosopen.org
Tue Jan 22 06:26:50 PST 2013
On Tue, 22 Jan 2013 13:10:54 -0000, Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> Just one thing looks a bit odd.
>
>> src_8888_8888
>>
>>          Before          After
>>       Mean  StdDev    Mean  StdDev  Confidence  Change
>> M     57.0    0.2     89.2    0.5     100.0%    +56.4%
>
> 89.2 MPix/s * 32bpp = ~357 MB/s
>
>> src_0565_0565
>>
>>          Before          After
>>       Mean  StdDev    Mean  StdDev  Confidence  Change
>> M     90.7    0.4    133.5    0.7     100.0%    +47.1%
>
> 133.5 MPix/s * 16bpp = ~267 MB/s
>
> Seems to be a much less efficient use of memory bandwidth here
> compared to src_8888_8888?

I think what you're seeing here is the speed difference between the
word-aligned and misaligned code paths, because the M test cycles over
many starting X positions for the source buffer, but always uses X=1 for
the destination buffer. For 32bpp, all pixel positions are word-aligned,
but for 16bpp this will result in half the runs being misaligned.
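
The alignment argument above can be sketched in C. This is only one reading
of the explanation, not code from the patch or from the benchmark: it assumes
every row starts on a 32-bit word boundary, and the helper names
relatively_misaligned and count_misaligned are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when source and destination start at different byte offsets
 * within a 32-bit word, so that handling a few leading pixels can never
 * bring both pointers onto a word boundary together.
 * Assumption: both rows begin on a word boundary. */
static int relatively_misaligned(unsigned src_x, unsigned dst_x,
                                 unsigned bytes_per_pixel)
{
    assert(bytes_per_pixel == 2 || bytes_per_pixel == 4);
    uint32_t src_off = (src_x * bytes_per_pixel) & 3u;
    uint32_t dst_off = (dst_x * bytes_per_pixel) & 3u;
    return src_off != dst_off;
}

/* Count the misaligned cases when the source X runs over 0..n_src_x-1
 * and the destination X is held fixed. */
static unsigned count_misaligned(unsigned n_src_x, unsigned dst_x,
                                 unsigned bytes_per_pixel)
{
    unsigned count = 0;
    for (unsigned src_x = 0; src_x < n_src_x; src_x++)
        count += relatively_misaligned(src_x, dst_x, bytes_per_pixel);
    return count;
}
```

With the destination fixed at X=1, count_misaligned(16, 1, 4) gives 0 and
count_misaligned(16, 1, 2) gives 8, i.e. half of the 16bpp source positions.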

I tried tweaking the M test to force alignment or misalignment, to get
some comparative timings for src_0565_0565:

  Aligned:   169.8 MPix/s (much closer to twice the 32bpp result)
  Unaligned: 108.6 MPix/s
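
As a rough cross-check of these figures, the arithmetic can be written out
in C. The helper names are hypothetical, and the blended figure assumes that
the aligned and misaligned paths each process the same number of pixels,
which the thread does not state.

```c
#include <assert.h>

/* Convert a rate in MPix/s to MB/s for a given pixel depth. */
static double mpix_to_mbytes(double mpix_per_s, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return mpix_per_s * (bits_per_pixel / 8);
}

/* Overall rate when half of the pixels are processed at rate_a and the
 * other half at rate_b (harmonic mean of the two rates).
 * Assumption: an equal split of pixels between the two code paths. */
static double blended_rate(double rate_a, double rate_b)
{
    assert(rate_a > 0.0 && rate_b > 0.0);
    return 2.0 / (1.0 / rate_a + 1.0 / rate_b);
}
```

mpix_to_mbytes(169.8, 16) is about 340 MB/s, close to the roughly 357 MB/s
of src_8888_8888, and blended_rate(169.8, 108.6) comes to a little over
132 MPix/s, near the 133.5 MPix/s reported for the unmodified M test.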

I'm open to suggestions as to how to improve the misaligned case. Early
in development, I compared the speed of doing LDM followed by
in-register shuffling with either ORR or PKH instructions against using
lots of unaligned LDRs, and the LDRs came out fastest by a small margin,
which is why that's what's used in my patch.
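
For readers unfamiliar with the shuffling alternative, here is a minimal C
sketch of the shift-and-OR idea: load aligned words, then build each
misaligned output word from the halves of two neighbours. It is not taken
from the patch; the function names are hypothetical, and it assumes the
first pixel of each pair sits in the low half of the word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit pixels into a word, first pixel in the low half
 * (an assumed little-endian-style convention). */
static uint32_t pack2(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}

/* Build the word that straddles two consecutive aligned words:
 * the high half of w0 followed by the low half of w1. */
static uint32_t straddle(uint32_t w0, uint32_t w1)
{
    return (w0 >> 16) | (w1 << 16);
}

/* Read n_src_words aligned words and write the n_src_words - 1 words
 * that start one pixel (half a word) later. */
static void copy_straddled(const uint32_t *src, size_t n_src_words,
                           uint32_t *dst)
{
    assert(n_src_words >= 1);
    uint32_t prev = src[0];
    for (size_t i = 1; i < n_src_words; i++) {
        uint32_t next = src[i];
        dst[i - 1] = straddle(prev, next);
        prev = next;
    }
}
```

Each output word here costs a shift and an OR on top of the aligned load,
whereas an unaligned LDR fetches the straddling word directly.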

Ben