[Pixman] RFC: Pixman benchmark CPU time measurement

Pekka Paalanen ppaalanen at gmail.com
Tue Jun 2 00:32:35 PDT 2015


Most pixman performance benchmarks currently rely on gettime():
- lowlevel-blt-bench
- prng-test
- radial-perf-test
- scaling-bench

Furthermore, affine-bench has its own gettimei() which is essentially
gettime() but with uint32_t instead of double.

double
gettime (void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tv;

    gettimeofday (&tv, NULL);
    return (double)((int64_t)tv.tv_sec * 1000000 + tv.tv_usec) / 1000000.;
#else /* no gettimeofday(): fall back to clock() */
    return (double)clock() / (double)CLOCKS_PER_SEC;
#endif
}

This definition of gettime() has several potential drawbacks:

1. clock() wraps around often: the manual page warns that in some
   cases (32-bit clock_t, CLOCKS_PER_SEC equal to 1000000) it wraps
   around about every 72 minutes. As the code in Pixman never expects
   a wraparound, this is subtly broken. This is a fallback path for
   systems that do not provide gettimeofday(), so it is rarely used,
   if at all.

2. gettimeofday() measures wall-clock time, which might not be the best
   way to measure code performance on a CPU, because all other load in
   the system will affect the result. It's probably not a significant
   problem on fast systems where you know to run your benchmarks
   without other load.

3. gettimeofday() is not only subject to NTP adjustments but is also
   affected by setting the system clock. IOW, this is not a guaranteed
   monotonic clock. Again, unlikely to be a problem in most cases, as
   benchmarks run long enough to even out NTP skews, but short enough to
   avoid accidentally hitting clock changes. (Or so I would assume.)

4. Using double to store an absolute timestamp is suspicious to me. In
   the end, you always compute the difference between two timestamps,
   and using a floating point type may cause catastrophic cancellation
   [1] depending on absolute values. However, [1] also explains that a
   double is enough. But, given that we read an arbitrary system clock
   whose starting point is unknown (ok, Epoch for the moment), we might
   already be getting values too large to maintain the expected
   accuracy (for floats, sure; for doubles, who knows?)
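
To put rough numbers on points 1 and 4, here is a small sketch in C. It
assumes a 32-bit clock_t counting 1000000 ticks per second for the
wraparound figure, and an arbitrary Epoch timestamp from June 2015 for
the precision check. wrap_minutes(), to_seconds(), difference_error()
and print_estimates() are illustrative helpers, not Pixman code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Minutes until a tick counter of the given bit width wraps around. */
static double
wrap_minutes (unsigned bits, double ticks_per_sec)
{
    return (double)(1ULL << bits) / ticks_per_sec / 60.0;
}

/* Absolute timestamp as a double, built the same way gettime() does. */
static double
to_seconds (int64_t sec, int64_t usec)
{
    assert (usec >= 0 && usec < 1000000);
    return (double)(sec * 1000000 + usec) / 1000000.;
}

/* Error (in seconds) of subtracting two double timestamps, compared with
 * subtracting the microsecond counts as integers first. */
static double
difference_error (int64_t sec, int64_t usec0, int64_t usec1)
{
    double via_double = to_seconds (sec, usec1) - to_seconds (sec, usec0);
    double via_int = (double)(usec1 - usec0) / 1000000.;

    return via_double - via_int;
}

void
print_estimates (void)
{
    /* 1433230355 is an arbitrary Epoch time in June 2015 (assumption). */
    printf ("32-bit clock() wraps every %.1f minutes\n",
            wrap_minutes (32, 1e6));
    printf ("error of a 1 us difference: %g s\n",
            difference_error (1433230355, 1, 2));
}
```

Under these assumptions the wrap period comes out at roughly 72
minutes, as the manual page says. The double-based difference should be
off by no more than about a quarter of a microsecond at current Epoch
values: small, but not negligible next to a microsecond clock.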

I would propose the following:

- runtime clock selection with this priority order:
	1. clock_gettime(CLOCK_PROCESS_CPUTIME_ID)
	2. getrusage(RUSAGE_SELF) -> rusage.ru_utime (user time)
	3. gettimeofday()
	4. clock()
  Naturally with build time checks, too. For 3 and 4, it would print a
  warning about inaccurate measurements. clock_gettime(CLOCK_MONOTONIC)
  is not in the list because I would assume getrusage() is more widely
  available and I'd like to use process time before wall-clock delta.

- A separate void init_gettime(void) for choosing the clock.

- void gettime(struct timespec *ts) for reading the clock.

- double elapsed(const struct timespec *begin, const struct timespec
  *end) for getting the elapsed time in seconds.
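
The proposal above could be sketched roughly as follows. This is only
an illustration under assumptions: it targets a POSIX system where all
four clocks compile, so the build time checks are left out; the enum
and the "source" variable are made-up names; and probing each clock
once at init time is just one possible way to do the runtime selection.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Clock sources in the proposed priority order (names are made up). */
enum clock_source { SRC_CPUTIME, SRC_RUSAGE, SRC_GETTIMEOFDAY, SRC_CLOCK };

static enum clock_source source = SRC_CLOCK;

/* Probe the clocks once and remember the first one that works. */
void
init_gettime (void)
{
    struct timespec ts;
    struct rusage ru;
    struct timeval tv;

    if (clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &ts) == 0)
        source = SRC_CPUTIME;
    else if (getrusage (RUSAGE_SELF, &ru) == 0)
        source = SRC_RUSAGE;
    else if (gettimeofday (&tv, NULL) == 0)
        source = SRC_GETTIMEOFDAY;
    else
        source = SRC_CLOCK;

    if (source == SRC_GETTIMEOFDAY || source == SRC_CLOCK)
        fprintf (stderr, "warning: no CPU time clock found, "
                 "measurements may be inaccurate\n");
}

/* Read the selected clock into *ts. */
void
gettime (struct timespec *ts)
{
    struct rusage ru;
    struct timeval tv;
    clock_t c;

    assert (ts != NULL);

    switch (source)
    {
    case SRC_CPUTIME:
        clock_gettime (CLOCK_PROCESS_CPUTIME_ID, ts);
        break;
    case SRC_RUSAGE:
        getrusage (RUSAGE_SELF, &ru);
        ts->tv_sec = ru.ru_utime.tv_sec;
        ts->tv_nsec = ru.ru_utime.tv_usec * 1000L;
        break;
    case SRC_GETTIMEOFDAY:
        gettimeofday (&tv, NULL);
        ts->tv_sec = tv.tv_sec;
        ts->tv_nsec = tv.tv_usec * 1000L;
        break;
    case SRC_CLOCK:
        c = clock ();
        ts->tv_sec = c / CLOCKS_PER_SEC;
        ts->tv_nsec = (c % CLOCKS_PER_SEC) * (1000000000L / CLOCKS_PER_SEC);
        break;
    }
}

/* Elapsed seconds; the subtraction is done on integers and only the
 * (small) result is converted to double. Negative results, e.g. after
 * a wraparound or a clock change, are clamped to zero. */
double
elapsed (const struct timespec *begin, const struct timespec *end)
{
    long long sec = (long long)end->tv_sec - (long long)begin->tv_sec;
    long nsec = end->tv_nsec - begin->tv_nsec;
    double d;

    if (nsec < 0)
    {
        sec -= 1;
        nsec += 1000000000L;
    }

    d = (double)sec + (double)nsec / 1e9;
    return d < 0.0 ? 0.0 : d;
}
```

A benchmark would call init_gettime() once, then gettime() before and
after the measured loop, and pass both timestamps to elapsed().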

In my experiments on the Raspberry Pi 1 [2], it seems to me that
clock_gettime(CLOCK_PROCESS_CPUTIME_ID) is the most accurate CPU time
measurement, getrusage() being possibly rounded to HZ ticks. If neither
is available, I'll just forget about accuracy and get whatever we can
get, essentially what the current gettime() does, but without taking
the difference in floating point and with the wraparound accounted for
(at least avoiding negative elapsed time, which might break algorithms).
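
One way to get a first hint about such differences is to ask the system
what resolution it claims for the CPU-time clock. A minimal sketch,
assuming a POSIX system with clock_getres(); cputime_resolution_ns()
and report_resolution() are hypothetical helpers, and the reported
value is only what the system advertises, not a measurement of actual
accuracy.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Advertised resolution of the process CPU-time clock in nanoseconds,
 * or -1 if the clock cannot be queried. */
long long
cputime_resolution_ns (void)
{
    struct timespec res;

    if (clock_getres (CLOCK_PROCESS_CPUTIME_ID, &res) != 0)
        return -1;

    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

void
report_resolution (void)
{
    long long ns = cputime_resolution_ns ();

    if (ns < 0)
        printf ("CLOCK_PROCESS_CPUTIME_ID is not available\n");
    else
        printf ("CLOCK_PROCESS_CPUTIME_ID resolution: %lld ns\n", ns);
}
```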

What would you think about this scheme?

I am not making a promise to implement this any time soon; I would just
like to hear a few opinions on whether it is worth even considering.

This is also like a public note for myself, in case I need to think
about these again. My difficulties with benchmarking Pixman on the Rpi1
are what prompted this, but this wouldn't solve most of the problems I
have there.


[1] https://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/

[2] the test program:
was running over a weekend, generating 78k samples (a big web page!):
