[Pixman] [PATCH 3/5] ARMv6: New blit routines

Tue Jan 22 06:26:50 PST 2013

On Tue, 22 Jan 2013 13:10:54 -0000, Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> Just one thing looks a bit odd.
>
>> src_8888_8888
>>
>>     Before          After
>>     Mean   StdDev   Mean   StdDev  Confidence  Change
>> M   57.0   0.2      89.2   0.5     100.0%      +56.4%
>
> 89.2 MPix/s * 32bpp = ~357 MB/s
>
>> src_0565_0565
>>
>>     Before          After
>>     Mean   StdDev   Mean   StdDev  Confidence  Change
>> M   90.7   0.4      133.5  0.7     100.0%      +47.1%
>
> 133.5 MPix/s * 16bpp = ~267 MB/s
>
> Seems to be a much less efficient use of memory bandwidth here
> compared to src_8888_8888?

I think what you're seeing here is the speed difference between the
word-aligned and misaligned code paths. The M test cycles over many
starting X positions for the source buffer, but always uses X=1 for the
destination buffer. For 32bpp, every pixel position is word-aligned, but
for 16bpp half of the runs end up misaligned.
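
The alignment argument above can be sketched in a few lines of Python.
This is only an illustration. The helper names are hypothetical, rows
are assumed to start on a word boundary, and the code path is assumed
to depend on whether source and destination start at the same byte
offset within a 32-bit word:

```python
WORD_BYTES = 4  # size of an ARM word in bytes


def word_phase(x, bpp):
    """Byte offset of pixel x within its 32-bit word (row assumed word-aligned)."""
    return (x * bpp // 8) % WORD_BYTES


def relatively_aligned(src_x, dst_x, bpp):
    """True when source and destination share the same offset within a word."""
    return word_phase(src_x, bpp) == word_phase(dst_x, bpp)


def misaligned_fraction(bpp, dst_x=1, src_positions=range(64)):
    """Fraction of source start positions that are misaligned relative to dst_x.

    src_positions is a stand-in for the positions the M test cycles over;
    the real set of positions is an assumption here.
    """
    positions = list(src_positions)
    misaligned = sum(not relatively_aligned(x, dst_x, bpp) for x in positions)
    return misaligned / len(positions)
```

Under those assumptions, misaligned_fraction(32) is 0.0 and
misaligned_fraction(16) is 0.5, which matches "half the runs".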

I tried tweaking the M test to force alignment or misalignment, to get
some comparative timings for src_0565_0565:

Aligned: 169.8 Mpix/s (much closer to 2x the 32bpp result)
Unaligned: 108.6 Mpix/s
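
As a sanity check on these figures, here is a small Python sketch. It
assumes the M test pushes an equal number of pixels through each path,
in which case the overall rate is the harmonic mean of the two rates.
The function names are hypothetical:

```python
def mb_per_s(mpix_per_s, bpp):
    """Convert a pixel rate in Mpix/s to a byte rate in MB/s."""
    return mpix_per_s * bpp / 8


def combined_rate(rates):
    """Overall rate when the same amount of work is done at each given rate.

    Total time is the sum of the per-path times, so the result is the
    harmonic mean of the rates.
    """
    return len(rates) / sum(1.0 / r for r in rates)


aligned, unaligned = 169.8, 108.6  # Mpix/s, src_0565_0565, from the timings above

print(round(mb_per_s(89.2, 32), 1))       # src_8888_8888 after the patch
print(round(mb_per_s(aligned, 16), 1))    # aligned 16bpp case
print(round(combined_rate([aligned, unaligned]), 1))
```

The aligned 16bpp case works out to about 339.6 MB/s against 356.8 MB/s
for 32bpp, and the combined rate comes to roughly 132.5 Mpix/s, within
about 1% of the 133.5 Mpix/s originally measured.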

I'm open to suggestions on how to improve the misaligned case. Early in
development, I compared two approaches: LDM followed by in-register
shuffling with either ORR or PKH instructions, and lots of unaligned
LDRs. The LDRs came out fastest by a small margin, which is why that's
what's used in my patch.
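
For readers unfamiliar with the two approaches, the following Python
model shows that they produce the same data. It is not the ARM code
that was benchmarked. It assumes little-endian memory and a source that
is misaligned by one 16-bit pixel, and it models only the shift-and-ORR
variant of the shuffle. All function names are hypothetical:

```python
import struct


def load_word(mem, offset):
    """Model a little-endian 32-bit load (an unaligned LDR if offset % 4 != 0)."""
    return struct.unpack_from("<I", mem, offset)[0]


def unaligned_words(mem, offset, count):
    """Approach 1: read each output word directly with an unaligned load."""
    return [load_word(mem, offset + 4 * i) for i in range(count)]


def shuffled_words(mem, offset, count):
    """Approach 2: aligned loads, then combine neighbouring words in registers.

    Each output word takes its low half from the top of one aligned word
    and its high half from the bottom of the next (shift, then ORR).
    """
    assert offset % 4 == 2, "this model only covers halfword misalignment"
    base = offset - 2  # nearest word-aligned address below offset
    aligned = [load_word(mem, base + 4 * i) for i in range(count + 1)]
    return [
        ((aligned[i] >> 16) | (aligned[i + 1] << 16)) & 0xFFFFFFFF
        for i in range(count)
    ]
```

Note that the shuffled version needs one more aligned load than the
number of words it produces, so it reads two bytes on either side of
the requested span.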

Ben