[Pixman] [PATCH v3] test: Add a new benchmarker targeting affine operations

Fri Apr 24 00:25:05 PDT 2015

On Thu, 23 Apr 2015 20:28:58 +0100
"Ben Avison" <bavison at riscosopen.org> wrote:

> On Thu, 23 Apr 2015 13:10:10 +0100, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> 
> > Affine-bench differs from lowlevel-blt-bench in the following:
> > - does not test different sized operations fitting to specific caches,
> >   destination is always 1920x1080
> > - allows defining the affine transformation parameters
> > - carefully computes operation extents to hit the COVER_CLIP fast paths
> [...]
> > did I capture all the special features affine-bench has over
> > lowlevel-blt-bench? I see llbb could use a transform too, and
> > was looking at why extending that would be unwanted.
> 
> Yes, there's some support in lowlevel-blt-bench for scaled plots, but
> it's limited to a single scale factor - it's the smallest expressible
> increment larger than unity, corresponding to an oh-so-slight size
> reduction, and is applied only in the X axis. The fact that
> lowlevel-blt-bench doesn't attempt anything more than that means it can
> make some simplifications:
> 
> * the source and destination buffers can be the same size as for the
>    unscaled case
> * automatically satisfies COVER_CLIP_NEAREST (although when I was
>    analysing the flags yesterday, I was reminded that before I removed the
>    8*pixman_fixed_e fudge factor, lowlevel-blt-bench's bilinear operations
>    were incorrectly calculated to *not* satisfy COVER_CLIP_BILINEAR)
> * no need to add translation offsets into the transform matrix
> * one pixel row in = one pixel row out greatly simplifies the
>    calculations about what can fit in L1 and L2 caches - this is why I
>    deliberately only tested the memory-constrained case in affine-bench.
>    For example, in a 90-degree rotation, each new output pixel in a row
>    will require reading from a different source cacheline.
> 
> When I was writing the ARMv6 scaled fetchers, I became aware that they
> were going to take quite different code paths in the enlargement vs
> reduction cases, as well as when a vertical scaling factor was involved.
> I needed to be able to benchmark all these combinations in order to
> select the best prefetch distances, if nothing else. I also realised
> there was currently no way to benchmark other common affine transforms
> that I might want to address in future such as reflections (including
> those used by the reflect repeat type) or even simple rotations, so with
> the difficulties of ensuring COVER_CLIP_BILINEAR too I just decided it
> would be easier to write a new benchmarker.
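
To put a number on the cache point in the list above, here is a rough
sketch, not code from either benchmark. It counts how many distinct source
cache lines one destination row touches when each output pixel advances the
source address by a fixed byte step. The 4-byte pixel, the 64-byte cache
line and the 1920-pixel-wide source are assumptions chosen for illustration,
and cachelines_touched() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the distinct cache lines touched when reading
 * n_pixels source pixels whose addresses advance by step_bytes each time.
 * Assumes the first pixel is line-aligned, addresses only increase, and no
 * pixel straddles a cache line. */
static size_t cachelines_touched(size_t n_pixels, size_t step_bytes,
                                 size_t line_bytes)
{
    size_t count = 0;
    size_t prev_line = (size_t) -1;   /* sentinel: no line seen yet */
    size_t i;

    assert(line_bytes > 0);

    for (i = 0; i < n_pixels; i++) {
        size_t line = (i * step_bytes) / line_bytes;

        if (line != prev_line) {
            count++;
            prev_line = line;
        }
    }
    return count;
}
```

Under those assumptions, an unrotated 1920-pixel row steps 4 bytes per
pixel and shares each cache line between 16 pixels, whereas a 90-degree
rotation steps a whole source row (1920 * 4 bytes) per output pixel, so
every pixel lands on a new line.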

Yes, very good.
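
For readers not familiar with the fixed-point format, a small sketch of
what "the smallest expressible increment larger than unity" works out to.
It assumes the 16.16 layout that pixman_fixed_t uses; the type and macro
names below are local stand-ins rather than the pixman definitions, and
source_x_fixed() is a hypothetical helper, not how either benchmark
computes its extents.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16;                 /* stand-in for pixman_fixed_t */
#define FIXED_1 ((fixed_16_16) 1 << 16)      /* 1.0 in 16.16 */
#define FIXED_E ((fixed_16_16) 1)            /* smallest positive step */

/* The scale factor described above: unity plus one fixed-point epsilon. */
static fixed_16_16 smallest_scale_above_unity(void)
{
    return FIXED_1 + FIXED_E;
}

/* Hypothetical helper: source x coordinate (16.16) sampled for the centre
 * of destination column dst_x under a pure X scale. */
static int64_t source_x_fixed(int dst_x, fixed_16_16 scale_x)
{
    int64_t centre;

    assert(dst_x >= 0);

    centre = ((int64_t) dst_x << 16) + FIXED_1 / 2;   /* dst_x + 0.5 */
    return (centre * scale_x) >> 16;
}
```

With this factor the last column of a 1920-wide destination samples the
source only 1919/65536 of a pixel (about 0.03) further right than it would
unscaled, which is why same-sized source and destination buffers are
enough in that case.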

I also forgot to say that I didn't bother changing the CLI for
affine-bench to use llbb's pattern parser. I figured it wouldn't be
that important. We already use the same operator and format name lookup
tables anyway.

> I've reviewed your version, looks fine to me. A very minor point: I'm not
> sure it's worth making a copy of the transform struct at the start of
> bench() because we mostly only use a pointer to the struct thereafter, so
> you might as well have kept using bi->transform.

Thank you, I'll take that as R-b.

I make the copy for the sake of the API. I wanted to use const arguments
as much as possible to avoid confusion later on about what values were
actually changed or not. I assumed it wouldn't make any difference to
the benchmark results.
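
The pattern, roughly sketched: take the parameters through a const
pointer, copy the transform into a local, and modify only the copy. The
struct layouts and function names here are hypothetical stand-ins, not the
ones in the patch, and the translation step is just an example of a
modification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for pixman_transform_t: a 3x3 matrix of 16.16 values. */
typedef struct {
    int32_t matrix[3][3];
} transform_t;

/* Hypothetical benchmark parameters, passed around read-only. */
typedef struct {
    transform_t transform;
    int         width;
    int         height;
} bench_info_t;

/* Example modification: add a translation to the transform in place. */
static void add_translation(transform_t *t, int32_t tx, int32_t ty)
{
    t->matrix[0][2] += tx;
    t->matrix[1][2] += ty;
}

/* Takes the parameters as const and works on a private copy, so the
 * caller's transform is guaranteed to be unchanged afterwards. */
static transform_t bench_transform(const bench_info_t *bi,
                                   int32_t tx, int32_t ty)
{
    transform_t t;

    assert(bi != NULL);

    t = bi->transform;            /* plain struct copy */
    add_translation(&t, tx, ty);
    return t;
}
```

The copy is nine 32-bit values made once per call, outside any timed
loop in this sketch, so it should not be measurable.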

Thanks,
pq