[Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

Tue Apr 14 11:00:58 PDT 2015

On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> So the matrices are indexed as matrix[row][column] in Pixman?

That's my understanding, yes.

> A short'ish comment somewhere to say why you are doing this offsetting
> would be nice, and that the offsetting is the reason to allocate a
> margin.

Since you've been reworking the patches anyway, do you have enough
information to add the comments yourself or do you want me to do them?

>> One simple way round this is to apply a 0.5 pixel translation in the
>> transformation matrix, so that the pattern becomes:
>>
>> d[0] =                1.00 * s[0]
>> d[1] =                0.50 * s[0] + 0.50 * s[1]
>> d[2] =                              1.00 * s[1]
>> d[3] =                              0.50 * s[1] + 0.50 * s[2]
>>
>> but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't
>> see why the fudge factor is needed at all, at least not in the affine
>> case; it's described as being to compensate for rounding errors, but
>> there is no rounding involved in fixed-point affine transforms.
>
> Maybe the rounding refers to the pixman_fixed_to_int() calls? I could
> imagine it is to guarantee that an xmin=0.5 gets really rounded to 0,
> and xmax=0.5 gets rounded to 1.

There's a slightly better worked example in the later patch I was
thinking of:

http://lists.freedesktop.org/archives/pixman/2014-September/003380.html

As I say, we're getting a bit ahead of ourselves here, as there were
about 30 patches between the ones we're reworking at the moment and that
one. Søren did give some comments on it at the end of his review here:

http://lists.freedesktop.org/archives/pixman/2014-October/003457.html

which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond
the bitmap bounds if a fast path happened to over-read pixels and then
multiply them by a 0 filter coefficient. I think we can probably cater
for that better by checking whether a bitmap starts or ends at a page
boundary, and whether we're reading the first and last pixels of the
image if it does.

> To be honest, both cases you describe sound strange to me. Surely if I
> use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
> d[0] = 0.5 * s[0] + 0.5 * s[1]
> when assuming the box filter (if I got my terminology right)...

You're forgetting it's pixel-centres that are fed through the transform.
To be honest, I think it's already quite a headache for an application to
work out how to set up the transform in order to hit a "cover" scaled
fast path. Assume the reasonable case that you want to plot the whole of
an image of size x,y at a size m,n. You need to set the diagonals of the
transform to

floor(pixman_fixed_1*(x-1)/(m-1))
floor(pixman_fixed_1*(y-1)/(n-1))
1

and then solve for

     / 0.5 \   / 0.5 \
T . | 0.5 | = | 0.5 |
     \ 1   /   \ 1   /

to find the translation coefficients that will ensure the top-left source
pixel aligns exactly to the top-left destination pixel.

To then require that the caller also knows that you need to nudge things
along by 8*pixman_fixed_e as well feels like it's requiring too much
knowledge of Pixman's internals to me. I didn't really like putting it in
affine-bench to begin with for this reason, but it doesn't work without
it. So I made do with removing it as quickly as I could - the original
intention was for it to be in the same patch series, but obviously the
patch series has grown too large and unwieldy for that the be the case
any more!

When I've seen timing breakdowns of which Pixman operations are being hit
by applications, I've seen far more scaled plots with repeat type PAD
than I'd really expect. My suspicions are that the bulk of these are down
to how difficult it is for applications to set up the transform for
maximum efficiency.

>> I think my hope was that since we're processing images of size 1920 *
>> 1080 pixels, you wouldn't have any particular pixels persisting in the
>> cache from the previous iteration (unless perhaps if you're testing a
>> high-factor enlargement on a platform that has caches that don't
>> allocate on write).
>
> Right, so is there any need to flush_cache() at all?

Possibly not. I was taking my cue from lowlevel-blt-bench again, where it
flushes the cache between each type of test. And looking at the
implementation of flush_cache() again now, I note that due to the action
of allocate-on-write and physically mapped caches, it won't completely
flush most caches, only one page-size worth! Perhaps just delete it then.

>> >> +        bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, UINT32_MAX, 5000000, &n, &t1, pixman_image_composite_wrapper);
>> >> +        bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, n, UINT32_MAX, NULL, &t2, pixman_image_composite_empty);
[...]
> I think that part I understood. My comment was about having a {} block
> without any flow control statements for it. You are just using it to
> scope some variables, which suggests that the block should actually be
> a separate function.

Oh right. Well, I think it was mainly so the variable declarations and
their associated comments could be close to where they were used without
requiring C99. I don't particularly care whether it's moved to a separate
function, I expect most compilers would just inline it again anyway.

Ben