enhancing perfcheck - Proof of concept & proposals

Mon Oct 27 13:50:31 PDT 2014

Hi,

On Wed, 2014-10-22 at 12:18 +0200, Laurent Godard wrote:
> I instrumented the big file load for testing purposes but yes, in
> general, I'm also interested in perf checks of loading such files (and
> even saving them)

I've added it to loperf; it's now in
http://dev-builds.libreoffice.org/callgrind_report/

So, there is no need for the load test now, I think.

> I agree to keep the instrumentation as slow as possible, but in my
> experience, some perf problems start to appear exponentially with file
> complexity (lots of sheets/formulas/named ranges/cell notes)

heh, hopefully not exponential but 'just' quadratic or something like that :-)
Yes, to see problems, the data often have to be reasonably big,
but I think it's a bit different with callgrind - it counts every instruction, so you see everything there.
And with 100 items, you should already be able to spot quadratic complexity.
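
A minimal sketch of that reasoning, assuming we have callgrind instruction counts (Ir) for the same test run with n and 2n items: if cost(n) ~ c * n^k, then k = log(cost2 / cost1) / log(n2 / n1). The function name and the numbers are hypothetical. A fixed startup cost inflates the smaller run relatively more, so the counts should cover only the measured region.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Estimate k in cost(n) ~ c * n^k from two measurements, e.g. callgrind
// instruction counts (Ir) of the same test run with n1 and n2 items:
//     k = log(cost2 / cost1) / log(n2 / n1)
// k close to 1 suggests linear behaviour, close to 2 quadratic, and so on.
double estimateExponent(double n1, std::uint64_t cost1,
                        double n2, std::uint64_t cost2)
{
    assert(n1 > 0 && n2 > n1 && cost1 > 0 && cost2 > 0);
    const double costRatio = static_cast<double>(cost2) / static_cast<double>(cost1);
    return std::log(costRatio) / std::log(n2 / n1);
}
```

For example, going from 1,000,000 Ir at 100 items to 4,000,000 Ir at 200 items gives k = 2, i.e. quadratic behaviour, while a linear algorithm would only double the count.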

> > Well, this is good, but it's hard to parse the results quickly.
> > Do you think we could have date/commit on one line with all the numbers?
> > And descriptions somewhere at the top,
> > so that we could easily compare results in one column (and draw graphs...)
> > Something like
> > http://dev-builds.libreoffice.org/callgrind_report/history.fods
> >
> 
> this is what is intended to be done
> 
> the output is a tab-separated csv file, with all the information
> on a single line (and the description at the top)

yes, it is, but I meant something else - the current layout is not that useful.
We have

lastCommit test_name filedatetime dump_comment count
----------------------------------------------------
<commit> <test_name> <date1> <dump_comment> <number_1>
... other tests
...
<commit+1> <test_name> <date2> <dump_comment> <number_2>

and it's hard to see how the situation has improved between
commit and commit+1 for the <test_name>-<dump_comment> test

Instead, something like
commit date <test_name-1>-<dump_comment-1> ... <test_name-n>-<dump_comment-m> 
----------------------------------------------------
<commit> <date> <number_1> ... numbers for different tests
<commit+1> <date> <number_2> ...

would be better, I think - you can just compare
number_1 and number_2 in consecutive rows of the same column.
Hopefully this makes sense.

The problem is adding new tests - they would need new columns, so the script
would need to be clever enough to add them when needed.
Another possibility is a separate script which would generate a file
in the second format mentioned here from the existing csv file.
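
A rough sketch of such a converter, assuming the existing file is tab-separated with the columns lastCommit, test_name, filedatetime, dump_comment, count in that order (as in the header above). pivotHistory, splitTabs and CommitRow are hypothetical names. Every new <test_name>-<dump_comment> pair simply becomes a new column, and a commit without a value for some test gets an empty cell.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Values collected for one commit: its date and the count per test column.
struct CommitRow
{
    std::string date;
    std::map<std::string, std::string> values;
};

// Split one tab-separated line into its fields.
std::vector<std::string> splitTabs(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t'))
        fields.push_back(field);
    return fields;
}

// Turn "one row per test" records (lastCommit, test_name, filedatetime,
// dump_comment, count) into "one row per commit" with one column per
// <test_name>-<dump_comment>, so results can be compared down a column.
std::string pivotHistory(std::istream& input)
{
    std::vector<std::string> commitOrder;   // commits in first-seen order
    std::map<std::string, CommitRow> rows;  // commit -> date and values
    std::set<std::string> columns;          // all tests seen, kept sorted

    std::string line;
    while (std::getline(input, line))
    {
        const std::vector<std::string> f = splitTabs(line);
        if (f.size() < 5 || f[0] == "lastCommit")  // skip header and separator lines
            continue;
        const std::string key = f[1] + "-" + f[3];
        columns.insert(key);                       // a new test just adds a column
        std::map<std::string, CommitRow>::iterator it = rows.find(f[0]);
        if (it == rows.end())
        {
            commitOrder.push_back(f[0]);
            it = rows.insert(std::make_pair(f[0], CommitRow())).first;
            it->second.date = f[2];                // first date seen for this commit
        }
        it->second.values[key] = f[4];
    }

    std::ostringstream out;
    out << "commit\tdate";
    for (const std::string& column : columns)
        out << '\t' << column;
    out << '\n';
    for (const std::string& commit : commitOrder)
    {
        const std::map<std::string, CommitRow>::const_iterator row = rows.find(commit);
        assert(row != rows.end());
        out << commit << '\t' << row->second.date;
        for (const std::string& column : columns)
        {
            const std::map<std::string, std::string>::const_iterator v =
                row->second.values.find(column);
            out << '\t' << (v != row->second.values.end() ? v->second : std::string());
        }
        out << '\n';
    }
    return out.str();
}
```

The point of this design is that the wide table can be regenerated from the existing csv at any time, so the script that runs the tests does not have to manage columns itself.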

> > Or - even better - we could just compile in the callgrind code all the time and decide, when
> > running make, whether we want to run under valgrind --tool=callgrind or not (or both).
> > If that works. :-)
> > So, something like IS_PERFCHECK would always be true, with no duplication,
> > and we would only decide whether to run under valgrind.
> >
> > Does that make sense?
> > What do you think?
> >
> 
> I like the approach, as it will simplify the trickiest part (the nasty
> include to avoid the double-linking problem)

indeed

> IMHO, it would be clearer to keep a 'make perfcheck' command, but it
> would only
> - set IS_PERFCHECK

This would be set all the time, so that part is not needed;
'make perfcheck' would just run the tests under callgrind, where it makes sense.

> feel free to give me some code pointers (remember, I'm only a poor
> scripter, an eternal beginner in core stuff ;) )

Do you want to work on this?
It's just a matter of adding an include/test/callgrind.hxx with content similar
to yours
<https://gerrit.libreoffice.org/gitweb?p=core.git;a=blob;f=sc/qa/perf/perf_instrumentation.cxx;h=2bfea52063b8584ba9dab6110bace230d6899bcd;hb=ac1ef94b035813895a1420e7b1434eeb925dd220>
and using it instead of the macros, plus some makefile hacking, I guess.
I am happy to hack this in, or to help you with the implementation - as you choose.
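
Roughly what such a header could contain, sketched in one file. This assumes the standard valgrind client-request macros (CALLGRIND_ZERO_STATS, CALLGRIND_DUMP_STATS_AT, RUNNING_ON_VALGRIND) from <valgrind/callgrind.h>. The function names callgrindStart, callgrindDump and isRunningUnderValgrind, and the no-op fallback for missing headers, are hypothetical. Client requests do nothing when the program is not running under valgrind, which is what would let the instrumentation be compiled in all the time.

```cpp
// Hypothetical include/test/callgrind.hxx, with the implementation inlined
// here so the sketch is self-contained.
#include <cassert>

#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>   // also pulls in valgrind.h
#    define HAVE_CALLGRIND_HEADER 1
#  endif
#endif

#ifndef HAVE_CALLGRIND_HEADER
// Assumption: if the valgrind headers are not installed, compile the hooks away.
#  define CALLGRIND_ZERO_STATS ((void)0)
#  define CALLGRIND_DUMP_STATS_AT(name) ((void)(name))
#  define RUNNING_ON_VALGRIND 0
#endif

// Reset callgrind's counters right before the code we want to measure.
inline void callgrindStart()
{
    CALLGRIND_ZERO_STATS;
}

// Write a profile dump labelled 'name' (e.g. "<test_name>-<dump_comment>").
// Outside valgrind the client request does nothing.
inline void callgrindDump(const char* name)
{
    assert(name != nullptr);
    CALLGRIND_DUMP_STATS_AT(name);
}

// True only when the process runs under valgrind, e.g. via 'make perfcheck'.
inline bool isRunningUnderValgrind()
{
    return RUNNING_ON_VALGRIND != 0;
}
```

A test would then call callgrindStart() just before the interesting part and callgrindDump("<test_name>-<dump_comment>") right after it; whether anything is recorded depends only on whether the test is run under valgrind --tool=callgrind.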

Best,
Matus