[Mesa-dev] [PATCH] gallium: reflect likely outcome in execution flow

Timothy Arceri t_arceri at yahoo.com.au
Fri May 23 00:19:47 PDT 2014


On Thu, 2014-05-22 at 09:54 -0400, Ilia Mirkin wrote:
> On Thu, May 22, 2014 at 2:56 AM, Timothy Arceri <t_arceri at yahoo.com.au> wrote:
> > On Wed, 2014-05-21 at 18:05 -0400, Ilia Mirkin wrote:
> >
> > On Wed, May 21, 2014 at 6:02 PM, Timothy Arceri <t_arceri at yahoo.com.au>
> > wrote:
> >> Unless we run out of memory the old if statement would always fail so
> >> reflect the more likely outcome. Should be be faster most of the time and
> >> slightly cleaner looking code.
> >>
> >> Signed-off-by: Timothy Arceri <t_arceri at yahoo.com.au>
> >> ---
> >>  src/gallium/auxiliary/util/u_upload_mgr.c | 7 +++----
> >>  1 file changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/src/gallium/auxiliary/util/u_upload_mgr.c
> >> b/src/gallium/auxiliary/util/u_upload_mgr.c
> >> index 744ea2e..99f9a08 100644
> >> --- a/src/gallium/auxiliary/util/u_upload_mgr.c
> >> +++ b/src/gallium/auxiliary/util/u_upload_mgr.c
> >> @@ -247,11 +247,10 @@ enum pipe_error u_upload_data( struct u_upload_mgr
> >> *upload,
> >>     enum pipe_error ret = u_upload_alloc(upload, min_out_offset, size,
> >>                                          out_offset, outbuf,
> >>                                          (void**)&ptr);
> >> -   if (ret != PIPE_OK)
> >> -      return ret;
> >> +   if (ret == PIPE_OK)
> >> +      memcpy(ptr, data, size);
> >>
> >> -   memcpy(ptr, data, size);
> >> -   return PIPE_OK;
> >> +   return ret;
> >>  }
> >
> > Have you actually compared the generated code? I find it can be fairly
> > instructive to do so... btw, there's a likely() (and unlikely() )
> > primitive (in p_compiler.h) which will indicate to gcc whether a
> > particular condition is likely or unlikely, so that it can adjust its
> > code generation accordingly.
> >
> >   -ilia
> >
> >
> > Hi Ilia,
> >
> > I've taken a look at the results (see below) and read up on likely/unlikely
> > (I was aware of these but wasn't confident in when they should be used).
> >
> > The result is my patch avoids a jump which I assume should help with branch
> > prediction in the cpu? Something like this is what I was expecting with the
> > change.
> > The interesting thing is that in this case using likely seems to have added
> > an extra 'test' instruction for no added value (please correct me if I'm
> > wrong). However there is probably other places in u_upload_alloc(), etc
> > where likely might be more helpful.
> >
> > Tim
> >
> > Common code:
> >
> >   444fce: 55                   push   %rbp
> >   444fcf: 48 89 e5             mov    %rsp,%rbp
> >   444fd2: 48 83 ec 40          sub    $0x40,%rsp
> >   444fd6: 48 89 7d e8          mov    %rdi,-0x18(%rbp)
> >   444fda: 89 75 e4             mov    %esi,-0x1c(%rbp)
> >   444fdd: 89 55 e0             mov    %edx,-0x20(%rbp)
> >   444fe0: 48 89 4d d8          mov    %rcx,-0x28(%rbp)
> >   444fe4: 4c 89 45 d0          mov    %r8,-0x30(%rbp)
> >   444fe8: 4c 89 4d c8          mov    %r9,-0x38(%rbp)
> >   444fec: 4c 8d 45 f0          lea    -0x10(%rbp),%r8
> >   444ff0: 48 8b 7d c8          mov    -0x38(%rbp),%rdi
> >   444ff4: 48 8b 4d d0          mov    -0x30(%rbp),%rcx
> >   444ff8: 8b 55 e0             mov    -0x20(%rbp),%edx
> >   444ffb: 8b 75 e4             mov    -0x1c(%rbp),%esi
> >   444ffe: 48 8b 45 e8          mov    -0x18(%rbp),%rax
> >   445002: 4d 89 c1             mov    %r8,%r9
> >   445005: 49 89 f8             mov    %rdi,%r8
> 
> This is some seriously disgusting code...
> 
> 444fe8: 4c 89 4d c8          mov    %r9,-0x38(%rbp)
> 444ff0: 48 8b 7d c8          mov    -0x38(%rbp),%rdi
> 445005: 49 89 f8             mov    %rdi,%r8
> 
> gcc should be able to do better. Do you perhaps have --enable-debug in
> your mesa build? That disables -O2... If that is the case, might want
> to rerun your analysis without --enable-debug.
> 
>   -ilia

Yes I did have --enable-debug set whoops. With -O2 enabled gcc produces
the same output from the current code as my patch does without
optimisations, so looks like there is no performance gains at all. I
also tried using unlikely() in u_upload_alloc() and while this does
produce different code it doesn't seem to have any noticeable impact.
Guess I'll keep looking.

> 
> >   445008: 48 89 c7             mov    %rax,%rdi
> >   44500b: e8 78 fd ff ff       callq  444d88 <u_upload_alloc>
> >   445010: 89 45 fc             mov    %eax,-0x4(%rbp)
> >   445013: 83 7d fc 00          cmpl   $0x0,-0x4(%rbp)
> >
> > Pre change:
> >
> > 0000000000444fce <u_upload_data>:
> >   445017: 74 05                je     44501e <u_upload_data+0x50>
> >   445019: 8b 45 fc             mov    -0x4(%rbp),%eax
> >   44501c: eb 1b                jmp    445039 <u_upload_data+0x6b>
> >   44501e: 8b 55 e0             mov    -0x20(%rbp),%edx
> >   445021: 48 8b 45 f0          mov    -0x10(%rbp),%rax
> >   445025: 48 8b 4d d8          mov    -0x28(%rbp),%rcx
> >   445029: 48 89 ce             mov    %rcx,%rsi
> >   44502c: 48 89 c7             mov    %rax,%rdi
> >   44502f: e8 2c 40 c0 ff       callq  49060 <memcpy at plt>
> >   445034: b8 00 00 00 00       mov    $0x0,%eax
> >   445039: c9                   leaveq
> >   44503a: c3                   retq
> >
> >
> > Post change:
> >
> > 0000000000444fce <u_upload_data>:
> >   445017: 75 16                jne    44502f <u_upload_data+0x61>
> >   445019: 8b 55 e0             mov    -0x20(%rbp),%edx
> >   44501c: 48 8b 45 f0          mov    -0x10(%rbp),%rax
> >   445020: 48 8b 4d d8          mov    -0x28(%rbp),%rcx
> >   445024: 48 89 ce             mov    %rcx,%rsi
> >   445027: 48 89 c7             mov    %rax,%rdi
> >   44502a: e8 31 40 c0 ff       callq  49060 <memcpy at plt>
> >   44502f: 8b 45 fc             mov    -0x4(%rbp),%eax
> >   445032: c9                   leaveq
> >   445033: c3                   retq
> >
> >
> > With likely:
> >
> > 0000000000444fce <u_upload_data>:
> >   44501d: 48 85 c0             test   %rax,%rax
> >   445020: 74 16                je     445038 <u_upload_data+0x6a>
> >   445022: 8b 55 e0             mov    -0x20(%rbp),%edx
> >   445025: 48 8b 45 f0          mov    -0x10(%rbp),%rax
> >   445029: 48 8b 4d d8          mov    -0x28(%rbp),%rcx
> >   44502d: 48 89 ce             mov    %rcx,%rsi
> >   445030: 48 89 c7             mov    %rax,%rdi
> >   445033: e8 28 40 c0 ff       callq  49060 <memcpy at plt>
> >   445038: 8b 45 fc             mov    -0x4(%rbp),%eax
> >   44503b: c9                   leaveq
> >   44503c: c3                   retq
> >




More information about the mesa-dev mailing list