[Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

Fri Sep 12 07:09:19 PDT 2014

On 09/11/2014 04:58 PM, Jason Ekstrand wrote:
>
>
> On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <Dieter at nuetzel-hh.de
> <mailto:Dieter at nuetzel-hh.de>> wrote:
>
>     On 12.09.2014 00:31, Jason Ekstrand wrote:
>
>         On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel
>         <Dieter at nuetzel-hh.de <mailto:Dieter at nuetzel-hh.de>>
>         wrote:
>
>             On 15.08.2014 04:50, Jason Ekstrand wrote:
>
>                 On Aug 14, 2014 7:13 PM, "Dieter Nützel"
>                 <Dieter at nuetzel-hh.de <mailto:Dieter at nuetzel-hh.de>>
>                 wrote:
>
>
>                     On 15.08.2014 02:36, Dave Airlie wrote:
>
>                                 On 08/02/2014 02:11 PM, Jason Ekstrand
>                                 wrote:
>
>
>
>                                     Most format conversion operations
>                                     required by GL can be performed by
>                                     converting one channel at a time,
>                                     shuffling the channels around, and
>                                     optionally filling missing channels
>                                     with zeros and ones. This adds a
>                                     function to do just that in a
>                                     general, yet efficient, way.
>
>                                     v2:
>                                     * Add better comments including full
>                                       docs for functions
>                                     * Don't use __typeof__
>                                     * Use inline helpers instead of
>                                       writing out conversions by hand
>                                     * Force full loop unrolling for
>                                       better performance
>
>
>
>                         This file seems to anger gcc a lot.
>
>                         It seems to take upwards of a minute or two to
>                         compile here.
>
>                         gcc 4.8.3 on 32-bit x86.
>
>                         Dave.
>
>
>
>                     For me (on our poor little Duron 1800/2 GB) it ran
>                     ~5 minutes...
>
>
>                     gcc 4.8.1 on 32-bit x86.
>
>
>                 If we'd like, the way the macros are set up, it would be
>                 easy to change it so that we do less unrolling in the
>                 cases where we are actually doing substantial format
>                 conversion and wouldn't notice the extra logic quite as
>                 much. I'll play with it a bit tomorrow or next week and
>                 see how much of a hit we would actually take if we
>                 unrolled a little less in places.
>                 --Jason Ekstrand
>
>
>             Ping.
>
>             On a second try it took 11+ minutes here...
>
>
>         11 minutes! What system are you running? And are you using -O3 or
>         something?  Yes, we can do something to cut it down, but it will
>         probably require a configure flag; the question is what flag.
>
>         --Jason
>
>
>     See above, the old children's system... ;-)
>     -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx
>     -mfpmath=sse,387 -pipe
>
>     Bad? - Worked for ages on AthlonMP....8-)
>     Maybe it is bad on Duron (the MP thing, much smaller cache and
>     better GCC), now.
>
>     Dieter
>
>
> Yeah, my recommendation would be hacking the macros to not unroll and
> keep the patch locally.  If you've got a better idea as to how to
> organize the code so the compiler likes it, I'm open as long as we don't
> lose performance.

It looks like a release build with MSVC is taking quite a while to 
compile this file too (actually at link time when the optimizer kicks in).

But even on my fast Linux system with gcc, the difference in compile 
time between -O0 and -O3 is pretty big (2 seconds vs. 1 minute, 3 seconds).

I'm still prototyping something but it looks like breaking the top-level 
switch cases in _mesa_swizzle_and_convert() into separate functions 
reduces the time quite a bit.  Let me pursue that a bit further and see 
how it goes...

-Brian
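
For readers following the thread, here is a minimal sketch of the kind of
restructuring described above: a top-level switch that only dispatches, with
each case's conversion loop living in its own function. It is not the code
from the patch. All names (swizzle_and_convert_to_float, SRC_UNORM8, SWZ_ONE,
and so on) are hypothetical, only two source types are handled, and the
destination is always four floats per pixel. Whether such a split actually
reduces compile time depends on the compiler and is not measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swizzle selectors: 0..3 pick a source channel,
 * the last two fill the destination channel with a constant. */
enum { SWZ_X = 0, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Hypothetical source channel types for this sketch. */
enum src_type { SRC_UNORM8, SRC_FLOAT };

/* Per-channel conversion helper: 8-bit unsigned normalized to float. */
static float
unorm8_to_float(uint8_t v)
{
   return v / 255.0f;
}

/* One function per switch case, so each loop is optimized on its own
 * instead of inside a single very large function. */
static void
swizzle_unorm8_to_float(float *dst, const uint8_t *src, int src_channels,
                        const uint8_t swizzle[4], size_t count)
{
   for (size_t i = 0; i < count; i++) {
      for (int c = 0; c < 4; c++) {
         uint8_t s = swizzle[c];
         if (s == SWZ_ZERO) {
            dst[c] = 0.0f;
         } else if (s == SWZ_ONE) {
            dst[c] = 1.0f;
         } else {
            assert(s < src_channels);
            dst[c] = unorm8_to_float(src[s]);
         }
      }
      dst += 4;
      src += src_channels;
   }
}

static void
swizzle_float_to_float(float *dst, const float *src, int src_channels,
                       const uint8_t swizzle[4], size_t count)
{
   for (size_t i = 0; i < count; i++) {
      for (int c = 0; c < 4; c++) {
         uint8_t s = swizzle[c];
         if (s == SWZ_ZERO) {
            dst[c] = 0.0f;
         } else if (s == SWZ_ONE) {
            dst[c] = 1.0f;
         } else {
            assert(s < src_channels);
            dst[c] = src[s];
         }
      }
      dst += 4;
      src += src_channels;
   }
}

/* The top-level switch only dispatches. Returns 1 on success and
 * 0 if the source type is not handled by this sketch. */
int
swizzle_and_convert_to_float(float *dst, enum src_type type, const void *src,
                             int src_channels, const uint8_t swizzle[4],
                             size_t count)
{
   switch (type) {
   case SRC_UNORM8:
      swizzle_unorm8_to_float(dst, (const uint8_t *)src, src_channels,
                              swizzle, count);
      return 1;
   case SRC_FLOAT:
      swizzle_float_to_float(dst, (const float *)src, src_channels,
                             swizzle, count);
      return 1;
   }
   return 0;
}
```

As a usage example, converting one 3-channel unorm8 pixel with the swizzle
{SWZ_Z, SWZ_Y, SWZ_X, SWZ_ONE} reverses the three source channels and fills
the fourth destination channel with 1.0.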