[Pixman] [PATCH 4/8] Better support for NONE repeat in nearest scaling main loop template
siarhei.siamashka at gmail.com
Thu Feb 10 15:23:33 PST 2011
On Thursday 10 February 2011 12:12:41 Soeren Sandmann wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > From: Siarhei Siamashka <siarhei.siamashka at nokia.com>
> > Scaling function now gets an extra boolean argument, which is set
> > to TRUE when we are fetching padding pixels for NONE repeat. This
> > allows to make a decision whether to interpret alpha as 0xFF or 0x00
> > for such pixels when working with formats which don't have alpha
> > channel (for example x8r8g8b8 and r5g6b5).
> Another way to do this would be to clip away zeros up front for source
> images. For example, a new function pointer could be added to the
> image struct:
> clip_zeros (pixman_image_t *image, pixman_region_t *region)
> that would clip away those parts of region where the image is known to
> be zero. Then, in pixman.c, this function could be called for both
> source and mask if the operator was one where zeros had no effect.
> That would benefit all operations and avoid cluttering these
> macros. It would also cause the SAMPLES_COVER_CLIP to be set in more
It is actually not quite another way to do this. There are cases when these
zero padding pixels need to be processed in a different way than skipping.
For example 'src_x888_8888' operation with NONE repeat and sampling of the
pixels outside the source image will need to clear the destination pixels to
zero for that padding area and not just skip them. This extra argument for the
inline function is required to be able to implement such fast path function.
The overall idea of this scaling stuff is the following. We use a somewhat
complex and convoluted template of the main loop (this template is an
internal implementation detail, the developers adding new fast paths ideally
should never have any need to touch it). And in order to get some fully working
fast path function, a reasonably simple scanline processing function needs
to be implemented. This scanline processing function gets a bunch of
arguments, which should be sufficient to implement any fast path from some
practically useful subset of operations (this patch series tries to extend
such subset to cover more cases encountered in the wild).
Anyway, I decided to use this additional boolean argument for the simplicity
sake because in many cases it can be totally ignored (for example, for any
repeat types other than NONE, or when we have alpha channel in the source
image). Also it may be useful to improve performance by not rolling a
scanline processing function over the area of zeros if it can be just
skipped or replaced with memset, but this is not strictly necessary for
correctness or even performance (I would expect that fetching too many
pixels from the padding area is more like an exception than a common case).
Regarding the future plans. I don't quite like inlining too much code into
these templates. The best solution in the future could be some small
inline proxy function which calls the real non-inlined scanline function
with the unused optional arguments omitted. This is actually used for NEON
nearest fast paths already. And after this is done, we may try to actually
call scanline functions via pointers and reuse common templates with the
same 'shape' (same sizes for the source, mask and destination pixel, same
repeat type, ...). This way, for example the main loop for nearest scaled
'over_8888_8888' operation would be exactly the same as the main loop for
'add_8888_8888' operation and code duplication can be avoided (we have the
same main loop function, but just provide it with different scanline
processing function pointers). These scanline processing functions can be
potentially generated by JIT.
I think the nearest scaling is almost feature complete now (in the sense
that the majority of practically useful nearest fast paths can be easily
added for either ARM NEON or SSE2 if such need arises). And I would wait
a bit before trying any refactoring for this code and first see whether it
actually can handle everything we may need. For example, I want these
particular nearest scaling main loop template extensions specifically for
nearest scaled 'over_0565_8_0565' fast path, which is needed for some of the
use cases on ARM hardware. There is a possibility that something might be
> I'm fine with this patch as is too though.
More information about the Pixman