[Pixman] [PATCH 1/1 v2] vmx: workarounds to fix powerpc little endian particularities

Oded Gabbay oded.gabbay at gmail.com
Tue Jun 9 04:34:33 PDT 2015


On Mon, Jun 8, 2015 at 12:57 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
> On Wed, Jun 3, 2015 at 6:42 AM, Siarhei Siamashka
> <siarhei.siamashka at gmail.com> wrote:
>>> +                                       AVV (endian_xor.c[1]),0);
>>> +    perm = vec_xor (perm,(vector unsigned char) AVV (
>>> +                       0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,
>>> +                       0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C));
>>> +    return vec_perm (pix, pix, perm);
>>>  }
>>
>> For this part, both the original and the patched code resulted in
>> identical instruction sequences:
>>
>> 0000000000000000 <.vmx_splat_alpha>:
>>    0:   3d 22 00 00     addis   r9,r2,0
>>    4:   39 29 00 00     addi    r9,r9,0
>>    8:   7c 00 48 ce     lvx     v0,0,r9
>>    c:   10 42 10 2b     vperm   v2,v2,v2,v0
>>   10:   4e 80 00 20     blr
>>
>> This is actually good. I was afraid that the compiler might screw up
>> it a bit and do something stupid like adding an extra VXOR instruction
>> here (for the 'vec_xor' intrinsic).
>>
>
> Actually, I get a different disassembly:
>
> 0000000000007b10 <vmx_splat_alpha>:
>     7b10:       00 00 4c 3c     addis   r2,r12,0
>     7b14:       00 00 42 38     addi    r2,r2,0
>     7b18:       00 00 22 3d     addis   r9,r2,0
>     7b1c:       0c 03 23 10     vspltisb v1,3
>     7b20:       00 00 29 39     addi    r9,r9,0
>     7b24:       99 4e 00 7c     lxvd2x  vs32,0,r9
>     7b28:       57 02 00 f0     xxswapd vs32,vs32
>     7b2c:       d7 04 01 f0     xxlxor  vs32,vs33,vs32
>     7b30:       17 05 00 f0     xxlnor  vs32,vs32,vs32
>     7b34:       2b 10 42 10     vperm   v2,v2,v2,v0
>     7b38:       20 00 80 4e     blr
>
> And without the patch, I get this:
>
> 0000000000007930 <vmx_splat_alpha>:
>     7930:       00 00 4c 3c     addis   r2,r12,0
>     7934:       00 00 42 38     addi    r2,r2,0
>     7938:       00 00 22 3d     addis   r9,r2,0
>     793c:       00 00 29 39     addi    r9,r9,0
>     7940:       98 4e 00 7c     lxvd2x  vs0,0,r9
>     7944:       50 02 00 f0     xxswapd vs0,vs0
>     7948:       11 05 00 f0     xxlnor  vs32,vs0,vs0
>     794c:       2b 10 42 10     vperm   v2,v2,v2,v0
>     7950:       20 00 80 4e     blr
>
> So there is an added vspltisb + xxlxor command.
> I used the default configure+make.
> Maybe I need to define some special flag to the compiler ?
>
> This is my gcc version:
> gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
> I'm running RHEL 7.1 ppc64le on POWER8 machine.
>
> Oded

So I now understand where my confusion came from.
The disassembly you showed is for ppc64/be, while I work on ppc64/le.

So on ppc64/le, the extra instructions (relative to ppc64/be) are:
xxswapd vs0,vs0 <-- swap perm after load from memory
xxlnor  vs32,vs0,vs0 <-- NOR perm before the vperm instruction

From what I understand, there is no way to eliminate these instructions
(short of writing inline assembly).
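
To illustrate why the NOR shows up, here is a scalar model in plain C. It
assumes, as far as I understand GCC's little-endian lowering, that
vec_perm (a, b, sel) is emitted as a hardware vperm with the two inputs
swapped and the selector complemented, because vperm numbers bytes in
big-endian order. All function names below are made up for this sketch;
nothing here is pixman or GCC code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte selection rule shared by vperm and vec_perm: the low 5 bits of
 * each selector byte pick one of the 32 bytes of the concatenation a:b. */
static void select_bytes (uint8_t out[16], const uint8_t a[16],
                          const uint8_t b[16], const uint8_t sel[16])
{
    for (int i = 0; i < 16; i++)
    {
        unsigned s = sel[i] & 0x1f;
        out[i] = (s < 16) ? a[s] : b[s - 16];
    }
}

/* Convert between little-endian element order and the big-endian byte
 * numbering used by the hardware: element i becomes byte 15 - i. */
static void reverse16 (uint8_t out[16], const uint8_t in[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = in[15 - i];
}

/* Assumed little-endian lowering: vperm (b, a, ~sel), evaluated in the
 * hardware's big-endian numbering. */
static void vec_perm_le_via_hw (uint8_t out[16], const uint8_t a[16],
                                const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t ha[16], hb[16], hs[16], ho[16];

    reverse16 (ha, a);
    reverse16 (hb, b);
    reverse16 (hs, sel);
    for (int i = 0; i < 16; i++)
        hs[i] = (uint8_t) ~hs[i];      /* this is the xxlnor */
    select_bytes (ho, hb, ha, hs);     /* note the swapped inputs */
    reverse16 (out, ho);
}

/* Returns 1 if the lowering matches the element-order semantics for a
 * range of selectors, 0 otherwise. */
int le_lowering_matches (void)
{
    uint8_t a[16], b[16], sel[16], want[16], got[16];

    for (int i = 0; i < 16; i++)
    {
        a[i] = (uint8_t) (0xA0 + i);
        b[i] = (uint8_t) (0xB0 + i);
    }
    for (int base = 0; base < 32; base++)
    {
        for (int i = 0; i < 16; i++)
            sel[i] = (uint8_t) ((base + 7 * i) & 0x1f);
        select_bytes (want, a, b, sel);      /* what the programmer expects */
        vec_perm_le_via_hw (got, a, b, sel); /* what the hardware computes */
        if (memcmp (want, got, 16) != 0)
            return 0;
    }
    return 1;
}
```

In vmx_splat_alpha both inputs are the same register (pix, pix), so the
operand swap is invisible and only the complement remains, which matches
the single xxlnor in the disassembly above.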

The instructions that the patch adds, as I said, are vspltisb + xxlxor,
so it is definitely better to remove them and bring the overhead on
ppc64/le down to 2 instructions instead of 4.
Using the #ifdef BIG will eliminate them.
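
A rough sketch of the #ifdef idea, in scalar C so it runs anywhere. It
picks the permute mask at compile time instead of XORing it at run time.
SKETCH_BIG_ENDIAN, splat_alpha_mask and splat_alpha_sketch are
hypothetical names used only here, and the little-endian mask values are
my assumption (the big-endian mask with every byte XORed by 3, which is
what the vspltisb v1,3 + xxlxor pair appears to compute). The indices are
plain memory offsets, not vperm selectors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical endianness switch for this sketch; a real build would
 * take it from its configure step. If the compiler does not define
 * __BYTE_ORDER__, the sketch assumes little endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_BIG_ENDIAN 1
#endif

#ifdef SKETCH_BIG_ENDIAN
/* Alpha is the first byte in memory of each 32-bit ARGB pixel. */
static const uint8_t splat_alpha_mask[16] = {
    0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,
    0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C
};
#else
/* Alpha is the last byte in memory of each 32-bit ARGB pixel. */
static const uint8_t splat_alpha_mask[16] = {
    0x03, 0x03, 0x03, 0x03, 0x07, 0x07, 0x07, 0x07,
    0x0B, 0x0B, 0x0B, 0x0B, 0x0F, 0x0F, 0x0F, 0x0F
};
#endif

/* Replicate the alpha byte of each of the four pixels across that
 * pixel. The mask is a compile-time constant, so no run-time XOR is
 * needed to adapt it to the host byte order. */
void splat_alpha_sketch (uint32_t out[4], const uint32_t pix[4])
{
    uint8_t src[16], dst[16];

    memcpy (src, pix, sizeof src);
    for (int i = 0; i < 16; i++)
        dst[i] = src[splat_alpha_mask[i]];
    memcpy (out, dst, sizeof dst);
}
```

For example, the pixel 0x11223344 comes out as 0x11111111 on either byte
order, since only the constant table differs between the two builds.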

Oded

