[Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

Wed Apr 15 02:39:54 PDT 2015

On Tue, 14 Apr 2015 19:00:58 +0100
"Ben Avison" <bavison at riscosopen.org> wrote:

> On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > So the matrices are indexed as matrix[row][column] in Pixman?
> 
> That's my understanding, yes.
> 
> > A short'ish comment somewhere to say why you are doing this offsetting
> > would be nice, and that the offsetting is the reason to allocate a
> > margin.
> 
> Since you've been reworking the patches anyway, do you have enough
> information to add the comments yourself or do you want me to do them?

This and the empty lines I can also add them myself I think, if there
is a need for me to revise the patch.

> >> One simple way round this is to apply a 0.5 pixel translation in the
> >> transformation matrix, so that the pattern becomes:
> >>
> >> d[0] =                1.00 * s[0]
> >> d[1] =                0.50 * s[0] + 0.50 * s[1]
> >> d[2] =                              1.00 * s[1]
> >> d[3] =                              0.50 * s[1] + 0.50 * s[2]
> >>
> >> but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't
> >> see why the fudge factor is needed at all, at least not in the affine
> >> case; it's described as being to compensate for rounding errors, but
> >> there is no rounding involved in fixed-point affine transforms.
> >
> > Maybe the rounding refers to the pixman_fixed_to_int() calls? I could
> > imagine it is to guarantee that an xmin=0.5 gets really rounded to 0,
> > and xmax=0.5 gets rounded to 1.
> 
> There's a slightly better worked example in the later patch I was
> thinking of:
> 
> http://lists.freedesktop.org/archives/pixman/2014-September/003380.html
> 
> As I say, we're getting a bit ahead of ourselves here, as there were
> about 30 patches between the ones we're reworking at the moment and that
> one. Søren did give some comments on it at the end of his review here:
> 
> http://lists.freedesktop.org/archives/pixman/2014-October/003457.html
> 
> which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond
> the bitmap bounds if a fast path happened to over-read pixels and then
> multiply them by a 0 filter coefficient. I think we can probably cater
> for that better by checking whether a bitmap starts or ends at a page
> boundary, and whether we're reading the first and last pixels of the
> image if it does.

I would hesitate to add logic along the lines of "reading beyond array
bounds is ok if it doesn't cross page boundaries". It might be true,
but it could produce hard-to-understand false-positives in Valgrind,
for example, which tracks every byte individually. It's also fairly
surprising and unexpected condition for choosing different code paths,
IMHO, since there is no hardware reason to do it.

But, I would be happy to enhance the test suite to test for bad
accesses better. The test suite is something I can well work on.

> > To be honest, both cases you describe sound strange to me. Surely if I
> > use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
> > d[0] = 0.5 * s[0] + 0.5 * s[1]
> > when assuming the box filter (if I got my terminology right)...
> 
> You're forgetting it's pixel-centres that are fed through the transform.
> To be honest, I think it's already quite a headache for an application to
> work out how to set up the transform in order to hit a "cover" scaled
> fast path. Assume the reasonable case that you want to plot the whole of
> an image of size x,y at a size m,n. You need to set the diagonals of the
> transform to
> 
> floor(pixman_fixed_1*(x-1)/(m-1))
> floor(pixman_fixed_1*(y-1)/(n-1))
> 1
> 
> and then solve for
> 
>      / 0.5 \   / 0.5 \
> T . | 0.5 | = | 0.5 |
>      \ 1   /   \ 1   /
> 
> to find the translation coefficients that will ensure the top-left source
> pixel aligns exactly to the top-left destination pixel.

Urgh, I don't think we account for *that* in e.g. Weston's Pixman
renderer. I could bet Weston follows the GL convention of defining the
extents of an image region to be the boundaries of the image, not the
pixel centers which would leave a half pixel margin outside of the
region. I mean, Weston works as if you are transforming geometric
primitives like triangles, and rasterization happens later, while it
still uses pixel centers at .5 coordinates, meaning that a single pixel
covers the X-coordinates [0.0, 1.0).

Are you saying that this idea doesn't work with Pixman?

Hmm, sure, Pixman needs to transform pixel centers internally, since it
rasterizes on the destination image... so what
compute_transformed_extents() does is that it assumes the input is
geometry, converts corner pixels into pixel centers for rasterization,
and returns the bounding box of the transformed pixel centers.
Then that is used to calculate what pixels would be accessed in the
source image.

Did I get that right?

> To then require that the caller also knows that you need to nudge things
> along by 8*pixman_fixed_e as well feels like it's requiring too much
> knowledge of Pixman's internals to me. I didn't really like putting it in
> affine-bench to begin with for this reason, but it doesn't work without
> it. So I made do with removing it as quickly as I could - the original
> intention was for it to be in the same patch series, but obviously the
> patch series has grown too large and unwieldy for that the be the case
> any more!

Well, adding the affine benchmarker is one thing. We should deal with
this coordinate problem as a separate issue. It's still good to know
why it initially looks the way it does.

Btw. are you sure there cannot be rounding in the hand-written fast
paths? If some assembly path computes an approximationof source
coordinates, could it not hit the "wrong" pixel? Zero coefficient or
not. I think I'm getting the original intent of the arbitrary 8e
enlargement, but you have found out that it is actually harmful in some
cases.

> When I've seen timing breakdowns of which Pixman operations are being hit
> by applications, I've seen far more scaled plots with repeat type PAD
> than I'd really expect. My suspicions are that the bulk of these are down
> to how difficult it is for applications to set up the transform for
> maximum efficiency.

I can believe that theory, yes. We should try and find some, and record
the full set of parameters to see if that's really the case.

You know, this might very well be the root cause for all those funny
PAD type repeats that Søren was asking why would anyone in their right
mind use such things. If we can reliably fix the coordinate / coverage
issue, maybe we wouldn't need the PAD optimizations after all?

But, that's all in the future, after the easier stuff.

> > I think that part I understood. My comment was about having a {} block
> > without any flow control statements for it. You are just using it to
> > scope some variables, which suggests that the block should actually be
> > a separate function.
> 
> Oh right. Well, I think it was mainly so the variable declarations and
> their associated comments could be close to where they were used without
> requiring C99. I don't particularly care whether it's moved to a separate
> function, I expect most compilers would just inline it again anyway.

Indeed, hence I'd suggest making it a function.

Anyway, I can do the refactorings myself, too, after we land the lookup
patches.

Thanks,
pq