[Pixman] [PATCH 0/4] New fast paths and Raspberry Pi 1 benchmarking

Thu Aug 20 13:16:32 PDT 2015

On Thu, 20 Aug 2015 19:34:37 +0100, Bill Spitzak <spitzak at gmail.com> wrote:

> Could this be whether some "bad" instruction ends up next to or split
> by a cache line boundary? That would produce a random-looking plot,
> though it really is a plot of the location of the bad instructions in
> the measured function.
>
> If this really is a problem then the ideal fix is for the compiler to
> insert NOP instructions in order to move the bad instructions away from
> the locations that make them bad. Yike.

Thought of that, tried it, and I'm still baffled by the results. In other
words, merely ensuring that instructions kept the same alignment relative
to cache lines wasn't enough to ensure reproducibility - that could only
be achieved by keeping the code at the same absolute address (which isn't
an option with shared libraries in the presence of ASLR).

My best theory at the moment is that the branch predictor in the ARM11
uses a hash of both the source and destination addresses of a branch to
choose the index into the predictor cache. Because it's a direct-mapped
cache, two branches that hash to the same index evict each other, so any
collisions caused by a branch moving to a different address can have
major effects on very tight loops like src_8888_8888.
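
To make the theory concrete, here is a toy model in C. It is only a sketch:
the table size, the XOR hash and the branch offsets are all made up for
illustration, and I'm not claiming this is how the ARM11 actually computes
its index. It just shows how a hash over source and destination addresses
can make collisions depend on the absolute load address.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: table size picked for illustration, not taken from ARM11 docs. */
#define PREDICTOR_ENTRIES 128u

/* One branch, described by byte offsets from the code's load address. */
struct branch {
    uint32_t src_off;  /* offset of the branch instruction */
    uint32_t dst_off;  /* offset of the branch target */
};

/* HYPOTHETICAL example: two backward branches in a tight loop. */
static const struct branch loop_branches[2] = {
    { 0x40u, 0x00u },
    { 0xC0u, 0x80u },
};

/* HYPOTHETICAL hash: XOR the word addresses of source and destination,
 * then keep the low bits as the slot in a direct-mapped table. */
static unsigned predictor_index(uint32_t src, uint32_t dst)
{
    return ((src >> 2) ^ (dst >> 2)) & (PREDICTOR_ENTRIES - 1u);
}

/* Count pairs of branches that share a slot when the code is loaded at
 * 'base'. In a direct-mapped table each such pair evicts each other. */
static int count_collisions(const struct branch *b, size_t n, uint32_t base)
{
    int collisions = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned idx_i = predictor_index(base + b[i].src_off,
                                         base + b[i].dst_off);
        for (size_t j = i + 1; j < n; j++) {
            unsigned idx_j = predictor_index(base + b[j].src_off,
                                             base + b[j].dst_off);
            if (idx_i == idx_j)
                collisions++;
        }
    }
    return collisions;
}
```

With these made-up offsets, count_collisions(loop_branches, 2, 0x8000)
reports one collision (both branches land in slot 16), while loading the
same code at 0x8040 reports none (slots 48 and 112). Both load addresses
have the same alignment to an assumed 32-byte cache line, which would fit
the observation that only the absolute address gave reproducible results.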

Ben