[poppler] [RFC] Extend regtest framework to track performance

Albert Astals Cid aacid at kde.org
Wed Dec 30 15:21:13 PST 2015


On Wednesday 30 December 2015 at 20:00:17, Ihar Filipau wrote:
> On 12/30/15, Albert Astals Cid <aacid at kde.org> wrote:
> > On Wednesday 30 December 2015 at 17:04:42, Adam Reichold wrote:
> >> Hello again,
> >> 
> >> as discussed in the code modernization thread, if we are going to make
> >> performance-oriented changes, we need a simple way to track functional and
> >> performance regressions.
> >> 
> >> The attached patch tries to extend the existing Python-based regtest
> >> framework to measure run time and memory usage in order to spot significant
> >> performance changes, in the sense of relative deviations w.r.t. these
> >> two parameters. It also collects the sums of both, which might be used as
> >> "ballpark" numbers to compare the performance effect of changes over
> >> document collections.
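> >> 
> >> In essence, per test command the idea is something like the sketch
> >> below (a simplified illustration, not the patch itself; it assumes the
> >> test is launched as a subprocess, takes wall time from a monotonic
> >> clock and peak RSS from getrusage(); the names are illustrative):
> >> 
> >> import resource
> >> import subprocess
> >> import time
> >> 
> >> def run_measured(command):
> >>     """Run one test command; return (exit code, wall time in seconds,
> >>     peak RSS in KiB)."""
> >>     start = time.monotonic()
> >>     exit_code = subprocess.call(command)
> >>     elapsed = time.monotonic() - start
> >>     # ru_maxrss is a high-water mark over all children so far (KiB on
> >>     # Linux), so it is only exact if each command runs in a fresh
> >>     # worker process.
> >>     peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
> >>     return exit_code, elapsed, peak_rss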
> > 
> > Have you tried it? How stable are the numbers? For example, here I get,
> > for rendering the same file (discarding the first run, which loads the
> > file into memory), numbers that range from 620 ms to 676 ms, i.e. ~10%
> > variation with no change at all.
> 
> To make the timing numbers stable, the benchmark framework should
> repeat the test a few times: IME at least three, and I often do as
> many as five runs.
> 
> The final result is a pair: the average of the timings over all runs,
> and (for example) the standard deviation (or simply the distance to the
> min/max value) computed over all the timing numbers.
> 
> {I occasionally test performance on an embedded system running off
> flash (no spinning disks, no network, nothing to skew the timing), yet I
> still get variations as high as 5%. Performance testing on a PC is an
> even trickier business: some go as far as to reboot the system into
> single-user mode and shut down all unnecessary services. Pretty much
> everything running in the background - and foreground, e.g. the GUI - can
> contribute to the unreliability of the numbers.}
> 
> For a benchmark on a normal Linux system etc., I would advise performing
> the test once to "warm up" the caches, and only then starting the
> measured test runs.
> 
> Summary (see the sketch below):
> 1. A performance test framework should do a "warm up" phase whose timing
> is discarded.
> 2. A performance test framework should repeat the test 3/5/etc. times,
> collecting the timing information.
> 3. The collected timings are averaged and the deviation (or distance to
> min/max) is computed. The average is the official benchmark result;
> the deviation etc. is the indication of the reliability of the
> benchmark.
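> 
> A minimal Python sketch of that procedure (assuming the test is a
> command run via subprocess; mean and standard deviation from the
> statistics module):
> 
> import statistics
> import subprocess
> import time
> 
> def benchmark(command, runs=5):
>     """One discarded warm-up run, then `runs` timed runs.
>     Returns (mean, standard deviation) of the wall times in seconds."""
>     subprocess.call(command)      # 1. warm-up: fill the caches, discard timing
>     timings = []
>     for _ in range(runs):         # 2. repeat the measured run several times
>         start = time.monotonic()
>         subprocess.call(command)
>         timings.append(time.monotonic() - start)
>     # 3. the average is the reported result; the deviation indicates
>     #    how reliable it is
>     return statistics.mean(timings), statistics.stdev(timings)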
> 
> fyi.
> 
> P.S. Note that 600ms is an OK-ish duration for a benchmark: not too
> short, not too long.

600 ms is the rendering time of one page of one of the 1600 files in one of the 3
or 4 backends.

;)

> But generally, the shorter the duration of the
> benchmark, the less reliable the timing numbers are (the higher the
> deviation); the longer the duration, the more reliable the numbers are
> (the lower the deviation).


