[Mesa-dev] Building with -fno-builtin-memcmp for improved performance
Jose Fonseca
jfonseca at vmware.com
Wed Oct 12 12:44:34 PDT 2011
I've changed the SCons build to always use -fno-builtin-memcmp.
Jose
----- Original Message -----
> On Tue, 2011-09-20 at 16:35 +0200, Roland Scheidegger wrote:
> > Am 20.09.2011 16:15, schrieb Keith Whitwell:
> > > On Tue, 2011-09-20 at 16:02 +0200, Roland Scheidegger wrote:
> > >> Am 20.09.2011 12:35, schrieb Keith Whitwell:
> > >>> On Tue, 2011-09-20 at 10:59 +0200, Fabio wrote:
> > > >>>> There was a discussion some months ago about using
> > > >>>> -fno-builtin-memcmp to improve memcmp performance:
> > > >>>> http://lists.freedesktop.org/archives/mesa-dev/2011-June/009078.html
> > > >>>>
> > > >>>> Has this been properly addressed in Mesa since then, or is the
> > > >>>> flag still recommended? If it is, what about adding it to
> > > >>>> configure.ac?
> > >>>
> > > >>> I've been meaning to follow up on this too. I don't know the
> > > >>> answer, but pinging Roland in case he does.
> > >>
> > >> I guess it is still recommended.
> > >> Ideally this should really be fixed in gcc: the compiler knows the
> > >> alignment and size when they are fixed (and, more importantly,
> > >> whether only a binary answer is needed, which makes this much
> > >> easier), and it doesn't need to make any function call.
> > >> If you enable that flag and some platform's system library just
> > >> uses the same primitive repz cmpsb sequence, it will only get
> > >> slower (now with a function call on top), though I guess the
> > >> chances of that happening are slim (with the possible exception
> > >> of Windows).
> > >> I think in most cases it won't make much difference, so nobody
> > >> bothered to implement that change. It is most likely still a good
> > >> idea unless gcc has addressed this in the meantime...
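A minimal sketch of the kind of comparison under discussion, assuming a hypothetical fixed-size state struct (`example_state` and its fields are illustrative, not Mesa code): the size is a compile-time constant and the caller only needs an equal/not-equal answer. Whether gcc expands the call inline or emits a call to the C library's memcmp depends on its builtin handling; -fno-builtin-memcmp forces the library call. The source code is the same either way.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical state struct; the struct compared in llvmpipe is far larger. */
struct example_state {
   uint32_t blend_func;
   uint32_t depth_func;
   float    clear_color[4];
};

/* Binary comparison of a fixed-size struct: returns 1 if anything differs,
 * 0 if the two structs are byte-for-byte identical. */
static inline int example_state_changed(const struct example_state *a,
                                        const struct example_state *b)
{
   return memcmp(a, b, sizeof *a) != 0;
}
```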
> > >
> > > Hmm, it seemed like it made a big difference in the earlier discussion...
> > Yes, for llvmpipe and one app at least.
> > But the struct being compared there is most likely by far the
> > biggest one anywhere (at least among those compared regularly).
> >
> > > I should take a look at reducing the size of the struct (as
> > > mentioned before), but surely there's some way to pull in a better
> > > memcmp??
> >
> > Well, apart from using -fno-builtin-memcmp we could write our own
> > memcmpxx, though the version I did there (returning only a binary
> > result and assuming 32-bit alignment/size, which lets gcc optimize
> > it) was still slower than -fno-builtin-memcmp for large sizes. Of
> > course we could optimize it further (e.g. for 64-bit aligned/sized
> > data, or with hand-coded SSE2 versions comparing 128 bits at a
> > time), but then it gets more complicated, so I wasn't sure it was
> > worth it.
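A minimal sketch of the two further optimizations mentioned above, under stated assumptions: the helpers `cmp_struct_u64` and `cmp_struct_sse2` are hypothetical names, both return a binary result like util_cmp_struct, the 64-bit version assumes the byte count is a multiple of 8, and the SSE2 version (compiled only when `__SSE2__` is defined) assumes a multiple of 16. No performance claim is made for either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical 64-bit-at-a-time compare: 0 if equal, 1 otherwise.
 * Loads go through memcpy, so no alignment or aliasing assumptions. */
static inline int cmp_struct_u64(const void *src1, const void *src2,
                                 size_t count)
{
   const unsigned char *p1 = src1;
   const unsigned char *p2 = src2;
   size_t i;
   assert(count % 8 == 0);
   for (i = 0; i < count; i += 8) {
      uint64_t a, b;
      memcpy(&a, p1 + i, 8);
      memcpy(&b, p2 + i, 8);
      if (a != b)
         return 1;
   }
   return 0;
}

#if defined(__SSE2__)
/* Hypothetical SSE2 compare, 128 bits per iteration: 0 if equal, 1 otherwise. */
static inline int cmp_struct_sse2(const void *src1, const void *src2,
                                  size_t count)
{
   const unsigned char *p1 = src1;
   const unsigned char *p2 = src2;
   size_t i;
   assert(count % 16 == 0);
   for (i = 0; i < count; i += 16) {
      __m128i a = _mm_loadu_si128((const __m128i *)(p1 + i));
      __m128i b = _mm_loadu_si128((const __m128i *)(p2 + i));
      /* cmpeq sets equal bytes to 0xff; movemask collects each byte's top bit */
      if (_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) != 0xffff)
         return 1;
   }
   return 0;
}
#endif
```

Loading through memcpy sidesteps the pointer-casting concern noted in util_cmp_struct; gcc and clang normally compile a fixed 8-byte memcpy into a single load, and -fno-builtin-memcmp does not affect memcpy.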
> >
> > For reference, here are the earlier numbers (ipers with llvmpipe):
> > original ipers:           12.1 fps
> > optimized struct compare: 16.8 fps
> > -fno-builtin-memcmp:      18.1 fps
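The numbers above expressed as relative speedups. This is only ratio arithmetic on the reported fps; `speedup_pct` is a hypothetical helper, and it is assumed that each variant was measured against the same original baseline.

```c
/* Percentage speedup of fps over baseline_fps. */
static double speedup_pct(double baseline_fps, double fps)
{
   return (fps / baseline_fps - 1.0) * 100.0;
}

/* speedup_pct(12.1, 16.8) is about +38.8%  (optimized struct compare)
 * speedup_pct(12.1, 18.1) is about +49.6%  (-fno-builtin-memcmp)
 * speedup_pct(16.8, 18.1) is about +7.7%   (flag vs. optimized compare) */
```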
> >
> > And this was the function I used to get the numbers:
> >
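> > #include <assert.h>
> > #include <stdint.h>
> >
> > /* INLINE is gallium's portable "inline" macro (pipe/p_compiler.h).
> >  * Returns 0 if the two blocks are equal, 1 otherwise (binary result
> >  * only); count is in bytes and must be a multiple of 4. */
> > static INLINE int util_cmp_struct(const void *src1, const void *src2,
> >                                   unsigned count)
> > {
> >    /* hmm pointer casting is evil */
> >    const uint32_t *src1_ptr = (const uint32_t *)src1;
> >    const uint32_t *src2_ptr = (const uint32_t *)src2;
> >    unsigned i;
> >    assert(count % 4 == 0);
> >    for (i = 0; i < count/4; i++) {
> >       if (*src1_ptr != *src2_ptr) {
> >          return 1;
> >       }
> >       src1_ptr++;
> >       src2_ptr++;
> >    }
> >    return 0;
> > }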
>
> OK, maybe the first thing to do is fix the compared struct; then
> let's see if there's anything significant left for a better memcmp
> to extract.
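A minimal sketch of what shrinking the compared struct can look like, assuming (hypothetically) that most of its fields are small enum-like values: packing them into bitfields reduces the number of bytes a memcmp has to walk. The struct names, field names and value ranges are illustrative only, not the actual llvmpipe state.

```c
#include <string.h>

/* Hypothetical "before": each small field occupies a full word. */
struct key_unpacked {
   unsigned wrap_s;      /* values 0..7 */
   unsigned wrap_t;      /* values 0..7 */
   unsigned min_filter;  /* values 0..3 */
   unsigned mag_filter;  /* values 0..3 */
   unsigned compare;     /* values 0..1 */
};

/* Hypothetical "after": the same ranges packed into bitfields, so on
 * common ABIs the whole key fits in one 32-bit word instead of five. */
struct key_packed {
   unsigned wrap_s:3;
   unsigned wrap_t:3;
   unsigned min_filter:2;
   unsigned mag_filter:2;
   unsigned compare:1;
   unsigned pad:21;      /* explicit filler so every bit is a named member */
};

/* Zero the whole key first, so unused bits compare equal under memcmp. */
static void key_packed_init(struct key_packed *k, unsigned wrap_s,
                            unsigned wrap_t, unsigned min_filter,
                            unsigned mag_filter, unsigned compare)
{
   memset(k, 0, sizeof *k);
   k->wrap_s = wrap_s;
   k->wrap_t = wrap_t;
   k->min_filter = min_filter;
   k->mag_filter = mag_filter;
   k->compare = compare;
}
```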
>
> I can find some time to do that in the next few days.
>
> Keith
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>