[Pixman] [PATCH] SIMD: Try without any CFLAGS before forcing -mcpu=
Siarhei Siamashka
siarhei.siamashka at gmail.com
Fri Mar 19 08:02:44 PDT 2010
On Friday 19 March 2010, Martin Jansa wrote:
> On Fri, Mar 19, 2010 at 03:07:24PM +0200, Siarhei Siamashka wrote:
> [...]
>
> > The whole issue is very strange. This 'pixman_transform_init_identity'
> > function is defined in 'pixman-matrix.c' file. The compiler seems to
> > generate a call to 'memset' using the variant of 'blx' instruction which
> > is invalid for armv4t compatible processors.
> >
> > I have no idea how this all could be related to the presence or absence
> > of arm-simd support. For me gcc-4.4.3 generates the following thumb code
> > for 'pixman_transform_init_identity' function when targeting armv4t
> > (support for arm-simd is also enabled):
> With SIMD configure.ac patch included:
[...]
> and blx is back :/
Looks like I see what is happening.
Let's take the following example:
/********** test.c *************/
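#include <string.h>

/* Both are defined in test_asm.S */
void asm_arm(void);
void asm_thumb(void);

void f(void)
{
    /* One libc call followed by one call to each assembly function */
    volatile char buffer[1024];
    memset((void *)buffer, 0, 1024);
    asm_arm();
    asm_thumb();
}

int main(void)
{
    f();
    return 0;
}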
/*******************************/
/********* test_asm.S **********/
        .arch armv4t
        .global asm_thumb
        .global asm_arm

        /* Thumb function: returns immediately */
        .thumb
        .thumb_func
asm_thumb:
        bx lr

        /* ARM function: returns immediately */
        .arm
asm_arm:
        bx lr
/*******************************/
This assembly file contains just two dummy functions that do nothing but
return: one uses arm instructions and the other uses thumb.
Compiling test.c alone into an object file:
# gcc -march=armv4t -mthumb-interwork -mthumb -O2 -c test.c
# objdump -d test.o
00000000 <f>:
0: b580 push {r7, lr}
2: 4f09 ldr r7, [pc, #36] (28 <f+0x28>)
4: 2280 movs r2, #128
6: 44bd add sp, r7
8: 00d2 lsls r2, r2, #3
a: 2100 movs r1, #0
c: 4668 mov r0, sp
e: f7ff fffe bl 0 <memset>
12: f7ff fffe bl 0 <asm_arm>
16: f7ff fffe bl 0 <asm_thumb>
1a: 2380 movs r3, #128
1c: 00db lsls r3, r3, #3
1e: 449d add sp, r3
20: bc80 pop {r7}
22: bc01 pop {r0}
24: 4700 bx r0
26: 46c0 nop (mov r8, r8)
28: fffffc00 .word 0xfffffc00
The calls to all three functions show up as 'bl' instructions.
Now let's compile everything together:
# gcc -march=armv4t -mthumb-interwork -mthumb -O2 -o test test.c test_asm.S
# objdump -d test
000083fc <f>:
83fc: b580 push {r7, lr}
83fe: 4f09 ldr r7, [pc, #36] (8424 <f+0x28>)
8400: 2280 movs r2, #128
8402: 44bd add sp, r7
8404: 00d2 lsls r2, r2, #3
8406: 2100 movs r1, #0
8408: 4668 mov r0, sp
840a: f7ff ef92 blx 8330 <_init+0x4c>
840e: f000 e816 blx 843c <asm_arm>
8412: f000 f811 bl 8438 <asm_thumb>
8416: 2380 movs r3, #128
8418: 00db lsls r3, r3, #3
841a: 449d add sp, r3
841c: bc80 pop {r7}
841e: bc01 pop {r0}
8420: 4700 bx r0
8422: 46c0 nop (mov r8, r8)
8424: fffffc00 .word 0xfffffc00
Looks like the linker substituted 'bl' with 'blx' for the calls to 'memset'
and 'asm_arm' because it noticed that the switch from thumb to arm will be
required. And the call to 'asm_thumb' remained as 'bl' because it is a
thumb->thumb call.
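To double check which of the calls above really became 'blx', the two halfwords can be decoded by hand. Here is a minimal sketch in C, assuming the classic (pre-Thumb-2) BL/BLX encoding; the struct and function names are made up for illustration and are not part of pixman or binutils:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of one two-halfword Thumb BL/BLX instruction. */
struct thumb_call {
    bool     valid;   /* false if the halfwords are not a BL/BLX pair */
    bool     is_blx;  /* true for BLX, which needs ARMv5T or later */
    uint32_t target;  /* absolute address of the branch target */
};

/* addr is the address of the first halfword (hypothetical helper). */
struct thumb_call thumb_call_decode(uint32_t addr, uint16_t hw1, uint16_t hw2)
{
    struct thumb_call c = { false, false, 0 };

    /* First halfword: 11110 followed by the upper 11 offset bits. */
    if ((hw1 & 0xF800u) != 0xF000u)
        return c;

    /* Second halfword: 11111 means BL, 11101 means BLX. */
    if ((hw2 & 0xF800u) == 0xF800u)
        c.is_blx = false;
    else if ((hw2 & 0xF800u) == 0xE800u)
        c.is_blx = true;
    else
        return c;

    int32_t hi = hw1 & 0x7FF;
    if (hi & 0x400)
        hi -= 0x800;                 /* sign-extend the 11-bit field */
    uint32_t lo = hw2 & 0x7FFu;

    /* In Thumb state the PC reads as the instruction address plus 4. */
    c.target = addr + 4u + (uint32_t)(hi * 4096) + (lo << 1);
    if (c.is_blx)
        c.target &= ~3u;             /* ARM code is word aligned */
    c.valid = true;
    return c;
}
```

Feeding it the three calls from the listing (address 0x840a with f7ff ef92, 0x840e with f000 e816, 0x8412 with f000 f811) gives blx to 0x8330, blx to 0x843c and bl to 0x8438, matching the objdump output.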
Compiling everything for arm changes the picture:
# gcc -march=armv4t -mthumb-interwork -marm -O2 -o test test.c test_asm.S
# objdump -d test
000083fc <f>:
83fc: e52de004 push {lr} ; (str lr, [sp, #-4]!)
8400: e24ddb01 sub sp, sp, #1024 ; 0x400
8404: e24dd004 sub sp, sp, #4 ; 0x4
8408: e3a01000 mov r1, #0 ; 0x0
840c: e3a02b01 mov r2, #1024 ; 0x400
8410: e1a0000d mov r0, sp
8414: ebffffc5 bl 8330 <_init+0x4c>
8418: eb00000c bl 8450 <asm_arm>
841c: fa00000a blx 844c <asm_thumb>
8420: e28dd004 add sp, sp, #4 ; 0x4
8424: e28ddb01 add sp, sp, #1024 ; 0x400
8428: e49de004 pop {lr} ; (ldr lr, [sp], #4)
842c: e12fff1e bx lr
Now only the call to the 'asm_thumb' function uses 'blx'.
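The same check can be done for the arm build. Again this is only a sketch with hypothetical names, assuming the standard ARM encodings: BL has 1011 in bits 27-24 with a normal condition field, and the immediate form of BLX uses the 1111 condition field with 101 in bits 27-25 and the H bit in bit 24.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of one ARM-state BL or BLX (immediate) instruction. */
struct arm_call {
    bool     valid;   /* false if insn is neither BL nor BLX (immediate) */
    bool     is_blx;  /* true for BLX, which needs ARMv5T or later */
    uint32_t target;  /* absolute address of the branch target */
};

/* addr is the address of the instruction word (hypothetical helper). */
struct arm_call arm_call_decode(uint32_t addr, uint32_t insn)
{
    struct arm_call c = { false, false, 0 };
    uint32_t cond = insn >> 28;

    int32_t off = (int32_t)(insn & 0x00FFFFFFu);
    if (off & 0x00800000)
        off -= 0x01000000;           /* sign-extend the 24-bit field */

    /* In ARM state the PC reads as the instruction address plus 8. */
    uint32_t base = addr + 8u + (uint32_t)(off * 4);

    if (cond == 0xFu && (insn & 0x0E000000u) == 0x0A000000u) {
        /* BLX: the H bit (bit 24) supplies the extra halfword. */
        c.is_blx = true;
        c.target = base + ((insn >> 23) & 2u);
    } else if (cond != 0xFu && (insn & 0x0F000000u) == 0x0B000000u) {
        /* BL, possibly conditional. */
        c.target = base;
    } else {
        return c;
    }
    c.valid = true;
    return c;
}
```

With the words from the listing, ebffffc5 at 0x8414 decodes as bl to 0x8330, eb00000c at 0x8418 as bl to 0x8450, and fa00000a at 0x841c as blx to 0x844c.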
So finally the question: who is guilty and what to do now?
aaelf.pdf (sorry, no direct link: it keeps moving around on the arm.com
website and I got tired of tracking it down, but copies are quite easy to
find on third-party websites via Google) contains the following text:
"R_ARM_PC24 is used to relocate an ARM B or BL instruction (and on ARMv5 an
ARM BLX instruction). Bits 0-23 encode a signed offset, in units of 4-byte
instructions (thus 24 bits encode a branch offset of +/- 2 bytes). For a
BLX instruction bit 24 additionally encodes the appropriate half-word address
of the destination and there is an implicit transition to Thumb state. A
static linker may convert a BL to a BLX instruction (or vice-versa) if
generating an image for ARMv5 or later. If it is unable to do this (as is the
case for B, or BL<cond> or on ARMv4T) then it must generate a suitable
sequence of instructions that will perform the transition to the target. The
instruction sequence may make use of the intra-procedure scratch register (IP)
and does not need to preserve its value. The relocation must then be
recalculated using the address of the sequence instead of S. Compensation for
the PC bias (8 bytes) must be factored into the relocation expression by the
object producer.
R_ARM_THM_PC22 is used to relocate Thumb BL (and on ARMv5 Thumb BLX)
instructions. It is thumb equivalent of R_ARM_PC24 and the same rules on
conversion apply. Bits 0-10 of the first half-word encode the most
significant bits of the branch offset, bits 0-10 of the second half-word
encode the least significant bits and the offset is in units of half-words.
Thus 22 bits encode a branch offset of +/- 2 bytes. Compensation for the PC
bias (4 bytes) must be factored into the relocation expression by the object
producer."
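The byte ranges in the text above can be cross-checked from the field widths alone. The helper below is only my own arithmetic sketch (the function name is made up, and the numbers are computed from the encoding rather than copied from the specification): a signed offset of N bits counted in units of U bytes reaches 2^(N-1) * U bytes backwards, and one unit less forwards.

```c
#include <stdint.h>

/*
 * Backward reach, in bytes, of a signed branch offset that is `bits` wide
 * and counted in units of `unit` bytes (hypothetical helper).
 */
int64_t branch_reach_bytes(unsigned bits, unsigned unit)
{
    return ((int64_t)1 << (bits - 1)) * (int64_t)unit;
}
```

For the ARM case, branch_reach_bytes(24, 4) is 33554432 bytes (32 MiB), and for the Thumb case, branch_reach_bytes(22, 2) is 4194304 bytes (4 MiB).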
So when generating binaries for ARMv5, the linker is permitted to do the
'bl' -> 'blx' conversion. That's what we actually see here, except that we
want this code to run on ARMv4T as well. To keep the code ARMv4T compatible,
the linker would have to leave each such 'bl' as a 'bl' and point it at a
small thunk function which emulates 'blx' and performs the proper switch
between arm and thumb state.
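My reading of the rule quoted from aaelf.pdf can be written down as a small decision function. This is a sketch of the rule only, not of how binutils is actually implemented, and every name in it is hypothetical:

```c
#include <stdbool.h>

enum isa_state  { STATE_ARM, STATE_THUMB };
enum call_fixup { KEEP_BL, CONVERT_TO_BLX, USE_THUNK };

/*
 * What a static linker may do with a call relocation.
 *   has_blx:          the output targets ARMv5 or later
 *   unconditional_bl: the instruction is a plain BL (not B, not BL<cond>);
 *                     a Thumb BL is always unconditional
 */
enum call_fixup choose_call_fixup(enum isa_state caller,
                                  enum isa_state callee,
                                  bool has_blx,
                                  bool unconditional_bl)
{
    if (caller == callee)
        return KEEP_BL;          /* no state change needed */
    if (has_blx && unconditional_bl)
        return CONVERT_TO_BLX;   /* permitted on ARMv5 or later */
    return USE_THUNK;            /* ARMv4T, B or BL<cond>: go through a thunk */
}
```

For the test program above, built for armv4t, choose_call_fixup(STATE_THUMB, STATE_ARM, false, true) returns USE_THUNK, which is what I would expect the linker to do for the calls to 'memset' and 'asm_arm' instead of emitting 'blx'.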
So the conclusion is: the linker currently fails to support proper arm-thumb
interworking on armv4t processors and emits a 'blx' instruction, which is only
supported on armv5 and later. Anyone trying to mix arm and thumb code on
armv4t is in danger.
Linking pixman-arm-simd.o file, which contains arm code, provokes the linker
to do these bad things. Unless this bug is already known, it needs to be
reported to binutils.
--
Best regards,
Siarhei Siamashka