[Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

Fri Sep 12 06:24:21 PDT 2014

Have you considered turning all inline functions into macros, so that
the compiler doesn't have to inline them?

Marek

On Fri, Sep 12, 2014 at 12:58 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>
>
> On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <Dieter at nuetzel-hh.de> wrote:
>>
>> On 12.09.2014 00:31, Jason Ekstrand wrote:
>>
>>> On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel <Dieter at nuetzel-hh.de>
>>> wrote:
>>>
>>>> On 15.08.2014 04:50, Jason Ekstrand wrote:
>>>>
>>>>> On Aug 14, 2014 7:13 PM, "Dieter Nützel" <Dieter at nuetzel-hh.de>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 15.08.2014 02:36, Dave Airlie wrote:
>>>>>>
>>>>>>>>> On 08/02/2014 02:11 PM, Jason Ekstrand wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Most format conversion operations required by GL can be performed by
>>>>>>>>>> converting one channel at a time, shuffling the channels around, and
>>>>>>>>>> optionally filling missing channels with zeros and ones. This adds a
>>>>>>>>>> function to do just that in a general, yet efficient, way.
>>>>>>>>>>
>>>>>>>>>> v2:
>>>>>>>>>> * Add better comments including full docs for functions
>>>>>>>>>> * Don't use __typeof__
>>>>>>>>>> * Use inline helpers instead of writing out conversions by hand,
>>>>>>>>>> * Force full loop unrolling for better performance
>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This file seems to anger gcc a lot.
>>>>>>>
>>>>>>> It seems to take upwards of a minute or two to compile here.
>>>>>>>
>>>>>>> gcc 4.8.3 on 32-bit x86.
>>>>>>>
>>>>>>> Dave.
>>>>>>
>>>>>>
>>>>>>
>>>>>> For me (on our poor little Duron 1800/2 GB) it ran ~5 minutes...
>>>>>>
>>>>>>
>>>>>> gcc 4.8.1 on 32-bit x86.
>>>>>
>>>>>
>>>>> If we'd like, the way the macros are set up, it would be easy to
>>>>> change it so that we do less unrolling in the cases where we are
>>>>> actually doing substantial format conversion and wouldn't notice the
>>>>> extra logic quite as much. I'll play with it a bit tomorrow or next
>>>>> week and see how much of a hit we would actually take if we
>>>>> unrolled a little less in places.
>>>>> --Jason Ekstrand
>>>>
>>>>
>>>> Ping.
>>>>
>>>> On a second run it took 11+ minutes here...
>>>
>>>
>>> 11 minutes! What system are you running? And are you using -O3 or
>>> something? Yes, we can do something to cut it down, but it will
>>> probably require a configure flag; the question is what flag.
>>>
>>> --Jason
>>
>>
>> See above, the old children's system... ;-)
>> -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx
>> -mfpmath=sse,387 -pipe
>>
>> Bad? - Worked for ages on AthlonMP....8-)
>> Maybe it is bad on Duron (the MP thing, much smaller cache and better
>> GCC), now.
>>
>> Dieter
>
>
> Yeah, my recommendation would be hacking the macros to not unroll and keep
> the patch locally.  If you've got a better idea as to how to organize the
> code so the compiler likes it, I'm open as long as we don't lose
> performance.
> --Jason
>
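
For readers following the thread, here is a rough sketch of the approach
the commit message describes: convert one channel at a time, shuffle the
channels through a swizzle, and fill missing channels with zero or one.
This is not the code from the patch. The names (SWZ_*, unorm8_to_unorm16,
swizzle_convert_u8_to_u16) are made up for illustration, only the 8-bit to
16-bit unsigned-normalized case is shown, and nothing is unrolled, so it
says nothing about the compile times discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swizzle selectors (illustrative only, not Mesa's names). */
enum {
    SWZ_X = 0, SWZ_Y = 1, SWZ_Z = 2, SWZ_W = 3,
    SWZ_ZERO = 4,  /* fill the destination channel with 0 */
    SWZ_ONE = 5    /* fill the destination channel with 1.0 */
};

/* Inline helper for a single channel: widen an 8-bit unsigned-normalized
 * value to 16 bits by replicating the byte, so 0xff maps to 0xffff. */
static inline uint16_t unorm8_to_unorm16(uint8_t v)
{
    return (uint16_t)((v << 8) | v);
}

/* Convert `count` pixels from src to dst. swizzle[c] names the source
 * channel that feeds destination channel c, or SWZ_ZERO / SWZ_ONE. */
static void
swizzle_convert_u8_to_u16(uint16_t *dst, int dst_channels,
                          const uint8_t *src, int src_channels,
                          const uint8_t swizzle[4], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (int c = 0; c < dst_channels; c++) {
            uint8_t s = swizzle[c];

            if (s == SWZ_ZERO)
                dst[c] = 0;
            else if (s == SWZ_ONE)
                dst[c] = 0xffff;
            else if (s < src_channels)
                dst[c] = unorm8_to_unorm16(src[s]);
            else
                dst[c] = 0;  /* selector names a channel the source lacks */
        }
        src += src_channels;
        dst += dst_channels;
    }
}
```

With a three-channel source pixel and the swizzle {SWZ_Z, SWZ_Y, SWZ_X,
SWZ_ONE}, the function reverses the channel order and fills the fourth
destination channel with 0xffff. A fully general version would need one
such helper per source/destination type pair, and forcing the loops to
unroll for every pair is presumably where the code size comes from.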