[Pixman] [PATCH 1/1 v2] vmx: workarounds to fix powerpc little endian particularities
Oded Gabbay
oded.gabbay at gmail.com
Tue Jun 9 04:34:33 PDT 2015
On Mon, Jun 8, 2015 at 12:57 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
> On Wed, Jun 3, 2015 at 6:42 AM, Siarhei Siamashka
> <siarhei.siamashka at gmail.com> wrote:
>>> + AVV (endian_xor.c[1]),0);
>>> + perm = vec_xor (perm,(vector unsigned char) AVV (
>>> + 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,
>>> + 0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C));
>>> + return vec_perm (pix, pix, perm);
>>> }
>>
>> For this part, both the original and the patched code resulted in
>> identical instruction sequences:
>>
>> 0000000000000000 <.vmx_splat_alpha>:
>> 0: 3d 22 00 00 addis r9,r2,0
>> 4: 39 29 00 00 addi r9,r9,0
>> 8: 7c 00 48 ce lvx v0,0,r9
>> c: 10 42 10 2b vperm v2,v2,v2,v0
>> 10: 4e 80 00 20 blr
>>
>> This is actually good. I was afraid that the compiler might screw up
>> it a bit and do something stupid like adding an extra VXOR instruction
>> here (for the 'vec_xor' intrinsic).
>>
>
> Actually, I get a different disassembly:
>
> 0000000000007b10 <vmx_splat_alpha>:
> 7b10: 00 00 4c 3c addis r2,r12,0
> 7b14: 00 00 42 38 addi r2,r2,0
> 7b18: 00 00 22 3d addis r9,r2,0
> 7b1c: 0c 03 23 10 vspltisb v1,3
> 7b20: 00 00 29 39 addi r9,r9,0
> 7b24: 99 4e 00 7c lxvd2x vs32,0,r9
> 7b28: 57 02 00 f0 xxswapd vs32,vs32
> 7b2c: d7 04 01 f0 xxlxor vs32,vs33,vs32
> 7b30: 17 05 00 f0 xxlnor vs32,vs32,vs32
> 7b34: 2b 10 42 10 vperm v2,v2,v2,v0
> 7b38: 20 00 80 4e blr
>
> And without the patch, I get this:
>
> 0000000000007930 <vmx_splat_alpha>:
> 7930: 00 00 4c 3c addis r2,r12,0
> 7934: 00 00 42 38 addi r2,r2,0
> 7938: 00 00 22 3d addis r9,r2,0
> 793c: 00 00 29 39 addi r9,r9,0
> 7940: 98 4e 00 7c lxvd2x vs0,0,r9
> 7944: 50 02 00 f0 xxswapd vs0,vs0
> 7948: 11 05 00 f0 xxlnor vs32,vs0,vs0
> 794c: 2b 10 42 10 vperm v2,v2,v2,v0
> 7950: 20 00 80 4e blr
>
> So there is an added vspltisb + xxlxor command.
> I used the default configure+make.
> Maybe I need to define some special flag to the compiler ?
>
> This is my gcc version:
> gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
> I'm running RHEL 7.1 ppc64le on POWER8 machine.
>
> Oded
So I understood where my confusion came from.
The disassembly you showed is for ppc64/be, while I work on ppc64/le.
So on ppc64/le, the added instructions (compared to ppc64/be) are:
xxswapd vs0,vs0 <-- swap perm after load from memory
xxlnor vs32,vs0,vs0 <-- NOR perm before the vperm instruction
From what I understand, there is no way to eliminate these instructions
(unless writing inline assembly).
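
As a rough illustration of why a NOR of perm can stand in for a little-endian permute, here is a scalar model in plain C. It rests on my own assumption about what the compiler is doing: that it maps a permute with little-endian element numbering onto the big-endian vperm by swapping the two inputs and complementing the mask (31 - k equals ~k & 31). All sketch_* names are hypothetical, and the arrays hold register bytes left to right; nothing here is taken from pixman or from gcc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a permute with big-endian element numbering:
 * r[j] = (x || y)[m[j] & 31], index 0 being the leftmost register byte. */
static void sketch_vperm_be (const uint8_t x[16], const uint8_t y[16],
                             const uint8_t m[16], uint8_t r[16])
{
    int j;
    for (j = 0; j < 16; j++) {
        int n = m[j] & 31;
        r[j] = (n < 16) ? x[n] : y[n - 16];
    }
}

/* Reference permute with little-endian element numbering:
 * element k of a register sits at register byte 15 - k. */
static void sketch_perm_le_ref (const uint8_t a[16], const uint8_t b[16],
                                const uint8_t c[16], uint8_t r[16])
{
    int i;
    for (i = 0; i < 16; i++) {
        int k = c[15 - i] & 31;              /* LE element i of the mask */
        r[15 - i] = (k < 16) ? a[15 - k] : b[31 - k];
    }
}

/* Same result through the big-endian model: swap inputs, NOR the mask. */
static void sketch_perm_le_via_nor (const uint8_t a[16], const uint8_t b[16],
                                    const uint8_t c[16], uint8_t r[16])
{
    uint8_t nc[16];
    int i;
    for (i = 0; i < 16; i++)
        nc[i] = (uint8_t) ~c[i];             /* models xxlnor c,c */
    sketch_vperm_be (b, a, nc, r);
}

/* Returns 1 when both routes agree for an arbitrary sample mask. */
static int sketch_nor_matches_reference (void)
{
    uint8_t a[16], b[16], c[16], r1[16], r2[16];
    int i;
    for (i = 0; i < 16; i++) {
        a[i] = (uint8_t) i;
        b[i] = (uint8_t) (0x80 + i);
        c[i] = (uint8_t) (i * 7 + 3);
    }
    sketch_perm_le_ref (a, b, c, r1);
    sketch_perm_le_via_nor (a, b, c, r2);
    assert (sizeof (r1) == sizeof (r2));
    return memcmp (r1, r2, sizeof (r1)) == 0;
}
```

In vmx_splat_alpha both vperm inputs are the same register (v2), so the input swap is invisible there and only the NOR shows up in the disassembly.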
The instructions added by the patch, as I said, are vspltisb + xxlxor, so it is
definitely better to remove these, so that the overhead on ppc64/le is
just 2 instructions instead of 4.
Using the #ifdef BIG will eliminate them.
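
To make the idea concrete, here is a minimal scalar sketch in plain C (not VMX code, and not the actual patch). It assumes the vspltisb v1,3 + xxlxor pair amounts to XORing the big-endian mask with 3, which is my reading of the disassembly. The sketch_* names are hypothetical, and the GCC/Clang predefined macro __BYTE_ORDER__ is only a stand-in for whatever macro the #ifdef ends up testing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for a 16-byte permute: out[i] = in[mask[i] & 15].
 * It only models byte selection by memory offset, not the vperm
 * instruction and its register element numbering. */
static void sketch_perm16 (const uint8_t in[16], const uint8_t mask[16],
                           uint8_t out[16])
{
    int i;
    for (i = 0; i < 16; i++)
        out[i] = in[mask[i] & 0x0F];
}

/* Memory offset of the alpha byte of each 32-bit a8r8g8b8 pixel:
 * 0 when stored big endian, 3 when stored little endian. */
static const uint8_t sketch_mask_be[16] = {
    0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,
    0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C
};
static const uint8_t sketch_mask_le[16] = {
    0x03, 0x03, 0x03, 0x03, 0x07, 0x07, 0x07, 0x07,
    0x0B, 0x0B, 0x0B, 0x0B, 0x0F, 0x0F, 0x0F, 0x0F
};

/* Compile-time choice: the mask is a plain constant, no XOR at run time. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SKETCH_MASK sketch_mask_be
#else
#define SKETCH_MASK sketch_mask_le
#endif

/* Replicate the alpha byte of four pixels across each pixel. */
static void sketch_splat_alpha (const uint32_t pix[4], uint32_t out[4])
{
    uint8_t in[16], res[16];
    assert (sizeof (in) == 4 * sizeof (uint32_t));
    memcpy (in, pix, sizeof (in));
    sketch_perm16 (in, SKETCH_MASK, res);
    memcpy (out, res, sizeof (res));
}

/* Returns 1 if XORing the big-endian mask with 3 yields the
 * little-endian mask, i.e. what the run-time XOR would compute. */
static int sketch_xor3_gives_le_mask (void)
{
    int i;
    for (i = 0; i < 16; i++)
        if ((sketch_mask_be[i] ^ 0x03) != sketch_mask_le[i])
            return 0;
    return 1;
}
```

With the mask picked by the preprocessor, the run-time XOR has nothing left to do, which is the saving discussed above.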
Oded