[Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

Jason Ekstrand jason at jlekstrand.net
Fri Sep 12 09:07:06 PDT 2014


Forgot to reply-all.
On Sep 12, 2014 9:05 AM, "Jason Ekstrand" <jason at jlekstrand.net> wrote:

> The teximage-colors test that I pushed to piglit a week or two ago takes a
> --benchmark flag that bumps the texture size and does the upload 1000 times
> and gives you the average time to upload.
> --Jason
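For anyone who wants to time the conversion path outside piglit, the same "run it N times and average" idea can be sketched in plain C. This is only a rough illustration, not the piglit test's code: the names `sketch_work_fn`, `sketch_average_seconds`, and `sketch_count_call` are hypothetical, and `clock()` measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one iteration of the work being timed,
 * e.g. a single texture upload or a single format conversion. */
typedef void (*sketch_work_fn)(void *data);

/* Run `work` `iterations` times and return the average CPU time per
 * iteration in seconds, as measured by the standard clock() function. */
static double
sketch_average_seconds(sketch_work_fn work, void *data, int iterations)
{
   assert(iterations > 0);

   clock_t start = clock();
   for (int i = 0; i < iterations; i++)
      work(data);
   clock_t end = clock();

   return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}

/* Stand-in workload for illustration: counts how often it is called. */
static void
sketch_count_call(void *data)
{
   (*(int *)data)++;
}
```

Building the file under test once per optimization level and running a harness like this against each build would give the kind of -O0 .. -O3 comparison asked about in the quoted message.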
> On Sep 12, 2014 9:01 AM, "Brian Paul" <brianp at vmware.com> wrote:
>
>> On 09/12/2014 08:49 AM, Jason Ekstrand wrote:
>>
>>>
>>> On Sep 12, 2014 7:09 AM, "Brian Paul" <brianp at vmware.com> wrote:
>>>  >
>>>  > On 09/11/2014 04:58 PM, Jason Ekstrand wrote:
>>>  >>
>>>  >>
>>>  >>
>>>  >> On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <Dieter at nuetzel-hh.de>
>>>  >> wrote:
>>>  >>
>>>  >>     On 12.09.2014 00:31, Jason Ekstrand wrote:
>>>  >>
>>>  >>         On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel
>>>  >>         <Dieter at nuetzel-hh.de> wrote:
>>>  >>
>>>  >>             On 15.08.2014 04:50, Jason Ekstrand wrote:
>>>  >>
>>>  >>                 On Aug 14, 2014 7:13 PM, "Dieter Nützel"
>>>  >>                 <Dieter at nuetzel-hh.de> wrote:
>>>  >>
>>>  >>
>>>  >>                     On 15.08.2014 02:36, Dave Airlie wrote:
>>>  >>
>>>  >>                                 On 08/02/2014 02:11 PM, Jason Ekstrand
>>>  >>                                 wrote:
>>>  >>
>>>  >>
>>>  >>
>>>  >>                                     Most format conversion operations
>>>  >>                                     required by GL can be performed by
>>>  >>                                     converting one channel at a time,
>>>  >>                                     shuffling the channels around, and
>>>  >>                                     optionally filling missing channels
>>>  >>                                     with zeros and ones. This adds a
>>>  >>                                     function to do just that in a
>>>  >>                                     general, yet efficient, way.
>>>  >>
>>>  >>                                     v2:
>>>  >>                                     * Add better comments including
>>>  >>                                       full docs for functions
>>>  >>                                     * Don't use __typeof__
>>>  >>                                     * Use inline helpers instead of
>>>  >>                                       writing out conversions by hand
>>>  >>                                     * Force full loop unrolling for
>>>  >>                                       better performance
>>>  >>
>>>  >>
>>>  >>
>>>  >>                         This file seems to anger gcc a lot.
>>>  >>
>>>  >>                         It seems to take upwards of a minute or two to
>>>  >>                         compile here.
>>>  >>
>>>  >>                         gcc 4.8.3 on 32-bit x86.
>>>  >>
>>>  >>                         Dave.
>>>  >>
>>>  >>
>>>  >>
>>>  >>                     For me (on our poor little Duron 1800/2 GB) it took
>>>  >>                     ~5 minutes...
>>>  >>
>>>  >>
>>>  >>                     gcc 4.8.1 on 32-bit x86.
>>>  >>
>>>  >>
>>>  >>                 If we'd like, the way the macros are set up, it would be
>>>  >>                 easy to change it so that we do less unrolling in the
>>>  >>                 cases where we are actually doing substantial format
>>>  >>                 conversion and wouldn't notice the extra logic quite as
>>>  >>                 much. I'll play with it a bit tomorrow or next week and
>>>  >>                 see how much of a hit we would actually take if we
>>>  >>                 unrolled a little less in places.
>>>  >>                 --Jason Ekstrand
>>>  >>
>>>  >>
>>>  >>             Ping.
>>>  >>
>>>  >>             On a second run it took 11+ minutes here...
>>>  >>
>>>  >>
>>>  >>         11 minutes! What system are you running? And are you using -O3
>>>  >>         or something? Yes, we can do something to cut it down, but it
>>>  >>         will probably require a configure flag; the question is what
>>>  >>         flag.
>>>  >>
>>>  >>         --Jason
>>>  >>
>>>  >>
>>>  >>     See above, the old children's system... ;-)
>>>  >>     -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx
>>>  >>     -mfpmath=sse,387 -pipe
>>>  >>
>>>  >>     Bad? - Worked for ages on AthlonMP....8-)
>>>  >>     Maybe it is bad on Duron (the MP thing, much smaller cache and
>>>  >>     better GCC), now.
>>>  >>
>>>  >>     Dieter
>>>  >>
>>>  >>
>>>  >> Yeah, my recommendation would be hacking the macros to not unroll and
>>>  >> keep the patch locally.  If you've got a better idea as to how to
>>>  >> organize the code so the compiler likes it, I'm open as long as we
>>>  >> don't lose performance.
>>>  >
>>>  >
>>>  > It looks like a release build with MSVC is taking quite a while to
>>>  > compile this file too (actually at link time when the optimizer kicks
>>>  > in).
>>>  >
>>>  > But even on my fast Linux system with gcc, the difference in compile
>>>  > time between -O0 and -O3 is pretty big (2 seconds vs. 1 minute, 3
>>>  > seconds).
>>>
>>> The unfortunate thing is that I doubt -O3 gains you anything on this
>>> function given how thoroughly things are unrolled. :-(
>>>
>>
>> Do you have a benchmark program to test the speed of this code?  Have you
>> compared -O0 .. -O3?  I'd be very interested in that.
>>
>> -Brian
>>
>>
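
For readers joining the thread here, the approach described in the quoted commit message (convert one channel at a time, shuffle the channels, fill missing ones with zeros and ones) can be sketched roughly as follows. This is a simplified illustration for a single type pair (8-bit unsigned-normalized to float), not the code from the patch: every `sketch_`/`SKETCH_` name is hypothetical, and the loop is left rolled, whereas the v2 notes say the patch forces full unrolling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swizzle selectors: X..W pick a source channel, while
 * ZERO and ONE fill a missing destination channel with a constant. */
enum { SKETCH_X = 0, SKETCH_Y, SKETCH_Z, SKETCH_W, SKETCH_ZERO, SKETCH_ONE };

/* Convert one 8-bit unsigned-normalized channel to a float in [0, 1]. */
static inline float
sketch_unorm8_to_float(uint8_t v)
{
   return (float)v / 255.0f;
}

/* Convert `count` pixels with `src_channels` UNORM8 channels each into
 * pixels with four float channels each. swizzle[c] says where destination
 * channel c comes from: a source channel index or a constant selector. */
static void
sketch_swizzle_convert(float *dst, const uint8_t *src, int src_channels,
                       const uint8_t swizzle[4], size_t count)
{
   assert(src_channels >= 1 && src_channels <= 4);

   for (size_t i = 0; i < count; i++) {
      for (int c = 0; c < 4; c++) {
         uint8_t s = swizzle[c];

         if (s == SKETCH_ZERO) {
            dst[c] = 0.0f;
         } else if (s == SKETCH_ONE) {
            dst[c] = 1.0f;
         } else {
            assert(s < src_channels);
            dst[c] = sketch_unorm8_to_float(src[s]);
         }
      }
      src += src_channels;
      dst += 4;
   }
}
```

The per-channel branch on the swizzle is the sort of logic that unrolling removes from the hot loop. Expanding a loop like this for every source/destination type pair is presumably where the long compile times discussed above come from.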