[Spice-devel] [PATCH spice-common v2] codegen: Make the compiler work out better way to write unaligned memory

Christophe Fergeau cfergeau at redhat.com
Wed Sep 20 15:53:08 UTC 2017


On Wed, Sep 20, 2017 at 10:07:51AM -0400, Frediano Ziglio wrote:
> > > 
> > > On Mon, Sep 18, 2017 at 10:27:01AM -0400, Frediano Ziglio wrote:
> > > > > 
> > > > > Has this been tested on actual problematic hardware, and improved
> > > > > performance? I remember last time this was discussed, the outcome was
> > > > > not always what was expected.
> > > > > 
> > > > > Christophe
> > > > > 
> > > > 
> > > > I don't remember old discussions, there were actually 2 different ones
> > > > which were confusing as one was referring an old embedded ARM version
> > > > and another a more recent.
> > > > I remember a memcpy version was proposed.
> > > > I don't own an ARM to test.
> > > > Was tested with x64 with no regressions introduced.
> > > > Knowing the machine instruction, which one fault is causing the issue
> > > > and the different platforms I can try and verify the assembly output
> > > > cross compiling.
> > > 
> > > Given there were past attempts which should have been working, but did
> > > not when testing on real hardware, I'd prefer that we postpone this
> > > patch until someone could run tests on real hardware. This is in an area
> > > of the code which does not change much, so it should not need painful
> > > rebases.
> > > 
> > > Christophe
> > > 
> > 
> > I don't fully agree. I agree code should be tested. On the other side
> > we had 2 reports. One (from Anton D. Kachalov, November 2015) stating an
> > old ARMv5 have problems with unaligned access. Another stating that a more
> > recent 32 bit ARM version has problems with unaligned 64 bit integers.
> > We attempted some experiment asking to Anton to do some tests. Is true
> > we didn't succeeded but is also true that after a couple of attempts
> > Anton stopped doing testing so now we don't know if last proposal (or
> > this one) works. On the other side everybody that can read assembly
> > code and ARM manual can prove that generated code cause unaligned access
> > while with this patch the generated code does not present this issue.
> > 
> > Tried to compile the demarshal code with a cross compiler for ARMv5,
> > the code became:
> > 
> > 000080d4 <parse_msg_disconnecting>:
> >     80d4:       e280300c        add     r3, r0, #12
> >     80d8:       e1530001        cmp     r3, r1
> >     80dc:       8a000025        bhi     8178 <parse_msg_disconnecting+0xa4>
> >     80e0:       e92d41f0        push    {r4, r5, r6, r7, r8, lr}
> >     80e4:       e1a04000        mov     r4, r0
> >     80e8:       e3a00010        mov     r0, #16
> >     80ec:       e1a05002        mov     r5, r2
> >     80f0:       ebfffffe        bl      0 <malloc>
> >     80f4:       e3500000        cmp     r0, #0
> >     80f8:       0a00001c        beq     8170 <parse_msg_disconnecting+0x9c>
> >     80fc:       e3a0c010        mov     ip, #16
> >     8100:       e5d4e001        ldrb    lr, [r4, #1]
> >     8104:       e5d43005        ldrb    r3, [r4, #5]
> >     8108:       e5d42000        ldrb    r2, [r4]
> >     810c:       e5d41004        ldrb    r1, [r4, #4]
> >     8110:       e5d46002        ldrb    r6, [r4, #2]
> >     8114:       e182240e        orr     r2, r2, lr, lsl #8
> >     8118:       e1811403        orr     r1, r1, r3, lsl #8
> >     811c:       e5d4e009        ldrb    lr, [r4, #9]
> >     8120:       e5d43008        ldrb    r3, [r4, #8]
> >     8124:       e1822c16        orr     r2, r2, r6, lsl ip
> >     8128:       e183340e        orr     r3, r3, lr, lsl #8
> >     812c:       e5d46003        ldrb    r6, [r4, #3]
> >     8130:       e5d4e006        ldrb    lr, [r4, #6]
> >     8134:       e5d4700a        ldrb    r7, [r4, #10]
> >     8138:       e1822c06        orr     r2, r2, r6, lsl #24
> >     813c:       e1811c1e        orr     r1, r1, lr, lsl ip
> >     8140:       e5d46007        ldrb    r6, [r4, #7]
> >     8144:       e5d4e00b        ldrb    lr, [r4, #11]
> >     8148:       e1833c17        orr     r3, r3, r7, lsl ip
> >     814c:       e1833c0e        orr     r3, r3, lr, lsl #24
> >     8150:       e1811c06        orr     r1, r1, r6, lsl #24
> >     8154:       e5802000        str     r2, [r0]
> >     8158:       e980000a        stmib   r0, {r1, r3}
> >     815c:       e59f201c        ldr     r2, [pc, #28]   ; 8180
> > <parse_msg_disconnecting+0xac>
> >     8160:       e59d3018        ldr     r3, [sp, #24]
> >     8164:       e583c000        str     ip, [r3]
> >     8168:       e5852000        str     r2, [r5]
> >     816c:       e8bd81f0        pop     {r4, r5, r6, r7, r8, pc}
> >     8170:       e3a00000        mov     r0, #0
> >     8174:       e8bd81f0        pop     {r4, r5, r6, r7, r8, pc}
> >     8178:       e3a00000        mov     r0, #0
> >     817c:       e12fff1e        bx      lr
> >     8180:       00000000        .word   0x00000000
> > 
> > which hardly will trigger an unaligned access (all the read accesses
> > are byte one).
> > 
> > So the end results is that as we cannot test on a real ARMv5 we
> > don't support ARMv6 and ARMv7 properly either.
> > 
> > I would prefer to have the patch tested with a 32 bit ARM and
> > integrated. If somebody came with an ARMv5 (now they are pretty
> > old) we'll fix it.
> > 
> 
> Thanks to David Gilbert (virt team) I manage to test some code
> (modified the test-marshaller code). With the patch I proposed
> the time spent to demarshall a uint8 + uint32 + 2 * uint64
> (and all malloc/free in a loop) changed from 2us to 0.4us.
> 

Ok, let's add that to the commit log then, and
Acked-by: Christophe Fergeau <cfergeau at redhat.com>
Thanks a lot for going so far in the testing!

Christophe


More information about the Spice-devel mailing list