[Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code
eero.t.tamminen at intel.com
Wed Oct 19 13:55:35 UTC 2016
On 18.10.2016 20:12, Jan Ziak wrote:
>> Never profile with -O0 or disabled function inlining.
Nobody's going to take seriously optimization results taken from
>> Mesa uses -g -O2
>> with --enable-debug, so that's what you should use too. Don't use any
>> other -O* variants.
> What if I find a case where -O2 prevents me from easily seeing
> information necessary to optimize the source code?
I've never had that problem since ARM call unwinding was fixed in GCC
and the profiling tools a decade ago (Valgrind even has some patches
for that from the work we were doing at my previous employer).
You need to know what compiler optimizations do, to better understand
the shown data, but tools can nowadays e.g. show inlined code correctly.
In general, if you have problems with optimized builds, either your
tools or your builds are broken.
(C++ does make things a bit more difficult because there's *much* more
inlining happening with compiler optimizations on C++ code.)
(Rest of the mail is general comments on profiling, not so much aimed
at you or Marek; I assume you both already know that stuff.)
>> The only profiling tools reporting correct results are perf and
Perf uses sampling and reports averages. While perf varies the sampling
rate, sampling can still misrepresent some things (small frequently
called things), and averages aren't good for everything.
That's why one should *also* use something like Valgrind which doesn't
miss things (although it cannot accurately estimate how much time they
take), so that you can see all call chains & call counts.
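To illustrate the difference between sampling and exact counting, here is
a minimal Python sketch. It is a toy model, not how perf or Valgrind are
implemented: the workload, the function names big_work and tiny_helper, and
the fixed sampling period are all assumptions made for the example (perf
varies its sampling rate, which reduces this kind of aliasing in practice).

```python
from collections import Counter


def build_timeline(iterations):
    """Return one entry per time unit naming the function that is running.

    Hypothetical workload: each iteration runs big_work() for 9 units,
    then tiny_helper() for 1 unit.
    """
    timeline = []
    for _ in range(iterations):
        timeline.extend(["big_work"] * 9)
        timeline.append("tiny_helper")
    return timeline


def sample(timeline, period, phase=0):
    """Fixed-period sampling: note which function runs at each sample tick."""
    return Counter(timeline[phase::period])


def exact_call_counts(timeline):
    """Instrumentation-style counting: count every entry into a function."""
    counts = Counter()
    previous = None
    for name in timeline:
        if name != previous:
            counts[name] += 1
        previous = name
    return counts


if __name__ == "__main__":
    timeline = build_timeline(1000)
    # A period that lines up with the workload never sees tiny_helper...
    print("period 10, phase 0:", dict(sample(timeline, 10)))
    # ...or sees nothing else, depending on where sampling starts.
    print("period 10, phase 9:", dict(sample(timeline, 10, phase=9)))
    # A period that does not line up gives a reasonable time share,
    # but still says nothing about how many calls were made.
    print("period 7:", dict(sample(timeline, 7)))
    # Exact counting sees every call, but carries no timing information.
    print("exact calls:", dict(exact_call_counts(timeline)))
```

In this model the sampled numbers depend on how the sampling period lines
up with the workload, while the exact call counts do not, which is why the
two kinds of tools complement each other.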
This isn't about latency, but for that a good Intel PT based tool would be
the most accurate. Like the data provided by the ARM ETM interface, it's very
awkward to use though (GBs of data to process, tools not open source, etc.).
> I used perf on Metro 2033 Redux and saw do_dead_code() there. Then I
> used callgrind to see some more code.
>> (both use the same mechanism) If you don't enable dwarf in
>> perf (also sysprof can't use dwarf), you have to build Mesa with
>> -fno-omit-frame-pointer to see call trees. The only reason you would
>> want to enable dwarf-based call trees is when you want to see libc
>> calls. Otherwise, they won't be displayed or counted as part of call
>> trees. For Mesa developers who do profiling often,
>> -fno-omit-frame-pointer should be your default.
>> Callgrind counts calls (that one you can trust), but the reported time
>> is incorrect,
Callgrind reports number of instructions, not time.
Cachegrind can provide estimates for how much time is taken, but as you
mentioned, it's not very reliable (while one can specify cache sizes
similar to those of the target machine, the cache model is inaccurate, and I
don't think it counts SIMD code correctly).
Both report this data only for the user-space process, not for the work
the process requests from the kernel.