Forgot to reply-all. <div class="gmail_quote">On Sep 12, 2014 9:05 AM, "Jason Ekstrand" <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The teximage-colors test that I pushed to piglit a week or two ago takes a --benchmark flag that bumps the texture size, does the upload 1000 times, and gives you the average time to upload. --Jason <div class="gmail_quote">On Sep 12, 2014 9:01 AM, "Brian Paul" <<a href="mailto:brianp@vmware.com" target="_blank">brianp@vmware.com</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 09/12/2014 08:49 AM, Jason Ekstrand wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On Sep 12, 2014 7:09 AM, "Brian Paul" <<a href="mailto:brianp@vmware.com" target="_blank">brianp@vmware.com</a> <mailto:<a href="mailto:brianp@vmware.com" target="_blank">brianp@vmware.com</a>>> wrote: > > On 09/11/2014 04:58 PM, Jason Ekstrand wrote: >> >> >> >> On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a>> >> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a>>>> wrote: >> >> On 12.09.2014 00:31, Jason Ekstrand wrote: >> >> On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel >> <<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a>> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a>>>> >> >> wrote: >> >> On 
15.08.2014 04:50, Jason Ekstrand wrote: >> >> On Aug 14, 2014 7:13 PM, "Dieter Nützel" >> <<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a>> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a> <mailto:<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a>>>> >> >> wrote: >> >> >> On 15.08.2014 02:36, Dave Airlie wrote: >> >> On 08/02/2014 02:11 PM, Jason Ekstrand >> wrote: >> >> >> >> Most format conversion operations >> required by GL can be >> >> performed by >> >> converting one channel at a time, >> shuffling the channels >> >> around, and >> >> optionally filling missing channels >> with zeros and ones. >> >> This >> adds a >> >> function to do just that in a >> general, yet efficient, way. >> >> v2: >> * Add better comments including full >> docs for functions >> * Don't use __typeof__ >> * Use inline helpers instead of >> writing out conversions >> >> by >> hand, >> >> * Force full loop unrolling for >> better performance >> >> >> >> This file seems to anger gcc a lot. >> >> It seems to take upwards of a minute or two to >> compile here. >> >> gcc 4.8.3 on 32-bit x86. >> >> Dave. >> >> >> >> For me (on our poor little Duron 1800/2 GB) it ran ~5 >> >> minutes... >> >> >> gcc 4.8.1 on 32-bit x86. >> >> >> If we'd like, the way the macros are set up, it would be >> easy to >> change it so that we do less unrolling in the cases >> where we are >> actually doing substantial format conversion and >> wouldn't notice >> the >> extra logic quite as much. I'll play with it a bit >> tomorrow or >> next >> week and see how much of a hit we would actually >> take if we >> unrolled a little less in places. >> --Jason Ekstrand >> >> >> Ping. >> >> In a second it took 11+ minutes, here... >> >> >> 11 minutes! What system are you running? And are you using -O3 or >> something? 
Yes, we can do something to cut it down, but it will >> probably require a configure flag; the question is what flag. >> >> --Jason >> >> >> See above, the old children's system... ;-) >> -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx >> -mfpmath=sse,387 -pipe >> >> Bad? - Worked for ages on AthlonMP....8-) >> Maybe it is bad on Duron (the MP thing, much smaller cache and >> better GCC), now. >> >> Dieter >> >> >> Yeah, my recommendation would be hacking the macros to not unroll and >> keep the patch locally. If you've got a better idea as to how to >> organize the code so the compiler likes it, I'm open as long as we don't >> lose performance. > > > It looks like a release build with MSVC is taking quite a while to compile this file too (actually at link time when the optimizer kicks in). > > But even on my fast Linux system with gcc, the difference in compile time between -O0 and -O3 is pretty big (2 seconds vs. 1 minute, 3 seconds). The unfortunate thing is that I doubt -O3 gains you anything on this function given how thoroughly things are unrolled. :-( </blockquote> Do you have a benchmark program to test the speed of this code? Have you compared -O0 .. -O3? I'd be very interested in that. -Brian </blockquote></div> </blockquote></div>