[Pixman] [PATCH] SIMD: Try without any CFLAGS before forcing -mcpu=

Siarhei Siamashka siarhei.siamashka at gmail.com
Fri Mar 19 08:02:44 PDT 2010

On Friday 19 March 2010, Martin Jansa wrote:
> On Fri, Mar 19, 2010 at 03:07:24PM +0200, Siarhei Siamashka wrote:
> [...]
> > The whole issue is very strange. This 'pixman_transform_init_identity'
> > function is defined in 'pixman-matrix.c' file. The compiler seems to
> > generate a call to 'memset' using the variant of 'blx' instruction which
> > is invalid for armv4t compatible processors.
> >
> > I have no idea how this all could be related to the presence or absence
> > of arm-simd support. For me gcc-4.4.3 generates the following thumb code
> > for 'pixman_transform_init_identity' function when targeting armv4t
> > (support for arm-simd is also enabled):

> With SIMD configure.ac patch included:
> and blx is back :/

I think I see what is happening.

Let's take the following example:

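/********** test.c *************/
#include <string.h>

/* Both functions are defined in test_asm.S */
void asm_arm(void);
void asm_thumb(void);

void f()
{
    volatile char buffer[1024];
    memset((void *)buffer, 0, 1024);
    asm_arm();
    asm_thumb();
}

int main()
{
    return 0;
}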

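/********* test_asm.S **********/
.arch armv4t
.text
.global asm_thumb
.global asm_arm

.thumb
.thumb_func             /* mark the next symbol as a thumb function */
asm_thumb:
    bx    lr

.arm
.balign 4               /* arm instructions must be word aligned */
asm_arm:
    bx    lr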

This assembly file just contains two dummy functions (they do nothing and
return): one uses arm instructions and the other uses thumb.

Compiling test.c alone into an object file:
# gcc -march=armv4t -mthumb-interwork -mthumb -O2 -c test.c
# objdump -d test.o

00000000 <f>:
   0:   b580            push    {r7, lr}
   2:   4f09            ldr     r7, [pc, #36]   (28 <f+0x28>)
   4:   2280            movs    r2, #128
   6:   44bd            add     sp, r7
   8:   00d2            lsls    r2, r2, #3
   a:   2100            movs    r1, #0
   c:   4668            mov     r0, sp
   e:   f7ff fffe       bl      0 <memset>
  12:   f7ff fffe       bl      0 <asm_arm>
  16:   f7ff fffe       bl      0 <asm_thumb>
  1a:   2380            movs    r3, #128
  1c:   00db            lsls    r3, r3, #3
  1e:   449d            add     sp, r3
  20:   bc80            pop     {r7}
  22:   bc01            pop     {r0}
  24:   4700            bx      r0
  26:   46c0            nop                     (mov r8, r8)
  28:   fffffc00        .word   0xfffffc00

The calls to all three functions show up as 'bl' instructions.

Now let's compile everything together:
# gcc -march=armv4t -mthumb-interwork -mthumb -O2 -o test test.c test_asm.S
# objdump -d test

000083fc <f>:
    83fc:       b580            push    {r7, lr}
    83fe:       4f09            ldr     r7, [pc, #36]   (8424 <f+0x28>)
    8400:       2280            movs    r2, #128
    8402:       44bd            add     sp, r7
    8404:       00d2            lsls    r2, r2, #3
    8406:       2100            movs    r1, #0
    8408:       4668            mov     r0, sp
    840a:       f7ff ef92       blx     8330 <_init+0x4c>
    840e:       f000 e816       blx     843c <asm_arm>
    8412:       f000 f811       bl      8438 <asm_thumb>
    8416:       2380            movs    r3, #128
    8418:       00db            lsls    r3, r3, #3
    841a:       449d            add     sp, r3
    841c:       bc80            pop     {r7}
    841e:       bc01            pop     {r0}
    8420:       4700            bx      r0
    8422:       46c0            nop                     (mov r8, r8)
    8424:       fffffc00        .word   0xfffffc00

Looks like the linker substituted 'bl' with 'blx' for the calls to 'memset'
and 'asm_arm' because it noticed that the switch from thumb to arm will be
required. And the call to 'asm_thumb' remained as 'bl' because it is a
thumb->thumb call.
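
The difference between the two call forms can be checked directly from the raw
halfwords in the dump above. Here is a minimal C sketch of such a decoder,
assuming the classic (pre-Thumb-2) 'bl'/'blx' pair encoding; the function and
enum names are made up for illustration and are not part of any toolchain:

```c
#include <assert.h>
#include <stdint.h>

/* Kind of call found at a given address (illustrative names). */
enum thumb_call { THUMB_NOT_A_CALL, THUMB_BL, THUMB_BLX };

/*
 * Decode a thumb BL/BLX halfword pair as printed by objdump.
 * 'addr' is the address of the first halfword; on success the call
 * destination is stored in '*target'.
 */
enum thumb_call decode_thumb_call(uint32_t addr, uint16_t hw1, uint16_t hw2,
                                  uint32_t *target)
{
    assert((addr & 1) == 0);            /* thumb code is halfword aligned */

    if ((hw1 & 0xF800) != 0xF000)       /* first half: 11110 + high offset */
        return THUMB_NOT_A_CALL;

    /* Sign-extend the 11-bit high part and combine it with the low part. */
    int32_t hi = hw1 & 0x7FF;
    if (hi & 0x400)
        hi -= 0x800;
    int32_t offset = hi * 4096 + (int32_t)(hw2 & 0x7FF) * 2;

    /* In thumb state the PC reads as the instruction address + 4. */
    uint32_t dest = addr + 4 + (uint32_t)offset;

    switch (hw2 & 0xF800) {
    case 0xF800:                        /* 11111: BL, stays in thumb state */
        *target = dest;
        return THUMB_BL;
    case 0xE800:                        /* 11101: BLX, switches to arm state */
        *target = dest & ~(uint32_t)3;  /* arm destination is word aligned */
        return THUMB_BLX;
    default:
        return THUMB_NOT_A_CALL;
    }
}
```

Feeding it the three call sites from the linked binary (0x840a: f7ff ef92,
0x840e: f000 e816, 0x8412: f000 f811) classifies the first two as 'blx' and
the last one as 'bl', with the same destinations that objdump prints.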

Compiling everything for arm changes the picture:
# gcc -march=armv4t -mthumb-interwork -marm -O2 -o test test.c test_asm.S
# objdump -d test

000083fc <f>:
    83fc:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
    8400:       e24ddb01        sub     sp, sp, #1024   ; 0x400
    8404:       e24dd004        sub     sp, sp, #4      ; 0x4
    8408:       e3a01000        mov     r1, #0  ; 0x0
    840c:       e3a02b01        mov     r2, #1024       ; 0x400
    8410:       e1a0000d        mov     r0, sp
    8414:       ebffffc5        bl      8330 <_init+0x4c>
    8418:       eb00000c        bl      8450 <asm_arm>
    841c:       fa00000a        blx     844c <asm_thumb>
    8420:       e28dd004        add     sp, sp, #4      ; 0x4
    8424:       e28ddb01        add     sp, sp, #1024   ; 0x400
    8428:       e49de004        pop     {lr}            ; (ldr lr, [sp], #4)
    842c:       e12fff1e        bx      lr

Now only the call to the 'asm_thumb' function uses 'blx'.
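
The arm-state encodings in this dump can be told apart in the same mechanical
way. A minimal C sketch, assuming the standard arm B/BL/BLX immediate
encodings; the function and enum names are hypothetical, chosen only for this
example:

```c
#include <assert.h>
#include <stdint.h>

/* Kind of call found at a given address (illustrative names). */
enum arm_call { ARM_NOT_A_CALL, ARM_BL, ARM_BLX };

/*
 * Decode a 32-bit arm instruction word as printed by objdump.
 * 'addr' is the instruction address; on success the call destination
 * is stored in '*target'.
 */
enum arm_call decode_arm_call(uint32_t addr, uint32_t insn, uint32_t *target)
{
    assert((addr & 3) == 0);                 /* arm code is word aligned */

    if ((insn & 0x0E000000) != 0x0A000000)   /* bits 27..25 must be 101 */
        return ARM_NOT_A_CALL;

    /* Sign-extend the 24-bit offset, which counts 4-byte words. */
    int32_t imm = insn & 0x00FFFFFF;
    if (imm & 0x00800000)
        imm -= 0x01000000;

    /* In arm state the PC reads as the instruction address + 8. */
    uint32_t dest = addr + 8 + (uint32_t)imm * 4;

    if ((insn >> 28) == 0xF) {               /* cond 1111: BLX <label> */
        dest += ((insn >> 24) & 1) * 2;      /* bit 24 selects the halfword */
        *target = dest;
        return ARM_BLX;
    }
    if (insn & 0x01000000) {                 /* L bit set: BL */
        *target = dest;
        return ARM_BL;
    }
    return ARM_NOT_A_CALL;                   /* plain B, not a call */
}
```

For the three call sites above (0x8414: ebffffc5, 0x8418: eb00000c,
0x841c: fa00000a) it reports 'bl', 'bl' and 'blx' respectively, with the
destinations 0x8330, 0x8450 and 0x844c shown by objdump.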

So finally the question: who is guilty and what to do now?

The aaelf.pdf document (sorry, no direct link: it keeps moving around on the
arm.com website and I got tired of tracking it down, but copies are quite
easy to find via google on third-party websites) contains the following text:
"R_ARM_PC24 is used to relocate an ARM B or BL instruction (and on ARMv5 an
ARM BLX instruction). Bits 0-23 encode a signed offset, in units of 4-byte 
instructions (thus 24 bits encode a branch offset of +/- 2 bytes). For a
BLX instruction bit 24 additionally encodes the appropriate half-word address 
of the destination and there is an implicit transition to Thumb state. A 
static linker may convert a BL to a BLX instruction (or vice-versa) if
generating an image for ARMv5 or later. If it is unable to do this (as is the
case for B, or BL<cond> or on ARMv4T) then it must generate a suitable
sequence of instructions that will perform the transition to the target. The
instruction sequence may make use of the intra-procedure scratch register (IP)
and does not need to preserve its value. The relocation must then be
recalculated using the address of the sequence instead of S. Compensation for
the PC bias (8 bytes) must be factored into the relocation expression by the
object producer.

R_ARM_THM_PC22 is used to relocate Thumb BL (and on ARMv5 Thumb BLX) 
instructions. It is thumb equivalent of R_ARM_PC24 and the same rules on
conversion apply. Bits 0-10 of the first half-word encode the most
significant bits of the branch offset, bits 0-10 of the second half-word
encode the least significant bits and the offset is in units of half-words.
Thus 22 bits encode a branch offset of +/- 2 bytes. Compensation for the PC
bias (4 bytes) must be factored into the relocation expression by the object
producer."

So when generating binaries for ARMv5, the linker is permitted to do the
'bl' -> 'blx' conversion. That is what we see here, except that we want this
code to also run on ARMv4T. To keep the code ARMv4T compatible, the linker
should have kept each 'bl' instruction and pointed it at a small thunk
function which emulates 'blx' and performs the proper switch between arm and
thumb state before jumping to the real target.

So the conclusion is: the linker currently fails to support proper arm-thumb
interworking on armv4t processors and emits the 'blx' instruction, which is
only supported on armv5 and later. Anyone trying to mix arm and thumb code on
armv4t is at risk. Linking the pixman-arm-simd.o file, which contains arm
code, provokes the linker into doing this. Unless this bug is already known,
it needs to be reported to binutils.

Best regards,
Siarhei Siamashka
