[Pixman] [PATCH 0/7] lowlevel-blt-bench test pattern parser

Mon Apr 13 10:42:45 PDT 2015

On Mon, 13 Apr 2015 12:31:35 +0100, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> Apart from restructuring, there is one significant(?) difference to
> Ben's patches: for a solid mask, Ben used a8r8g8b8, but my code uses a8.

I had to remind myself how things hang together to check if this is
significant or not...

When lowlevel-blt-bench tests a solid image (source or mask), it achieves
it not by calling pixman_image_create_solid_fill() but by creating a 1x1
pixel bitmap using pixman_image_create_bits(). However, the effect is
effectively the same because compute_image_info() detects single-pixel
bitmaps and records it in image->common.extended_format_code before we
start doing fast path lookup. However, image->type remains as BITS rather
than SOLID because the non-common parts of the pixman_image union retain
their original meaning - for a native solid fill image, it's just the
colour in three different formats, but for bitmap images it's a pointer
to and sizes of the bitmap and colour LUT (if applicable), various helper
function pointers and so on.

A typical ARM assembly fast path is written to assume that any argument
corresponding to a solid colour is already in an 8888 format where the
red-blue ordering is the same as the destination image. The magic that
ensures this is hidden in the wrapper C function generated by
PIXMAN_ARM_BIND_FAST_PATH_N_DST and similar, where it calls
_pixman_image_get_solid(). You'll see similar calls more explicitly in
pixman-fast-path.c (architecture-neutral fast paths) and the equivalent
sources files for other architectures.

There are a few circumstances where you might not want to use
_pixman_image_get_solid()  - perhaps most obviously are the operations
which use a solid image in pixel format a2r10g10b10 because you'd lose 2
bits of precision per colour component. There are a few example of those
amongst those you picked out in special_patterns[].

It's worth noting that _pixman_image_get_solid() has optimised code to
extract the colour from image->type==BITS images if the source/mask image
is of format a8r8g8b8, x8r8g8b8 or a8. Other formats will still work, but
more laboriously.

Not that any of this should matter at all for the purposes of
lowlevel-blt-bench. _pixman_image_get_solid() is only called once per
call of pixman_image_composite(), so it is part of the overhead that
should be being accounted for by the code wrapped in #if EXCLUDE_OVERHEAD.

> should we also add a condition, that if a test has CA flag and a solid
> mask, then that mask needs to be a8r8g8b8?

That might be desirable, if only because lowlevel-blt-bench initialises
all its images using memset(0xCC) so an a8 solid image would be converted
by _pixman_image_get_solid() to 0xCC000000 whereas an a8r8g8b8 would be
0xCCCCCCCC. When you're not in component alpha mode, only the alpha byte
matters for the mask image, but in the case of component alpha
operations, a fast path might decide that it can save itself a lot of
multiplications if it spots that 3 constant mask components are already 0.

> None of the existing tests has a solid mask for CA tests. I'm not sure
> how much it makes sense.

When you're in component alpha mode, mathematically the source and mask
images are interchangeable. So you could implement over_8888_n_8888_ca by
calling over_n_8888_8888_ca, and we already have a number of fast paths
for operations of the latter form. (I'm not aware that Pixman is - yet -
aware that it can make such substitutions, but I'm sure it could be
added.)

> Ben, when we eventually get to the actual optimization patches, I expect
> I will reproduce the benchmarks myself while reviewing. Is that ok?

That's definitely fine by me, thanks for offering. Fingers crossed it
doesn't substantially change the results :-)

> What do you think of this series?

In general, it looks like an improvement to me. I don't think the test
programs have had this much attention in a while! One minor point in
patch 6's log message, you copied my typo:

   <operation>_<src>[_<mask]<dst>[_ca]

should be

   <operation>_<src>[_<mask]_<dst>[_ca]

Ben