[Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)

Ben Avison bavison at riscosopen.org
Tue Aug 25 07:02:06 PDT 2015

On Tue, 25 Aug 2015 13:45:48 +0100, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>> [exposing general_composite_rect]
>> I can't say that any cleaner solution has occurred to me since then.
> I think the more immediate solution, as Soren has suggested on IRC,
> is for me to implement the equivalent fast-path in VMX.
> I see that it is already implemented in mmx, sse2, mips-dspr2 and
> arm-neon. From looking at the C code, I'm guessing that it is fairly
> simple to implement.

Yes, it's definitely one of the simpler fast paths, with only two
channels to worry about (source and destination) and with one of them
being a constant. I wrote an arm-simd version as well, to add to your
list - it's just that it's still waiting to be committed :)

I probably ought to get round to exposing general_composite_rect sooner
rather than later anyway - it's one of the few things from my mammoth
patch series last year that Søren commented on and which I haven't got
round to revising yet.

>> I just had a quick look at the VMX source file, and it has hardly any
>> iters defined. My guess would be that what's being used is
>> noop_init_solid_narrow() from pixman-noop.c
>> _pixman_iter_get_scanline_noop() from pixman-utils.c
>> combine_src_u() from pixman-combine32.c
> I ran perf on lowlevel-blt-bench over_n_8888 and what I got is:
> -   48.71%    48.68%  lowlevel-blt-be  lowlevel-blt-bench  [.] vmx_combine_over_u_no_mask
>    - vmx_combine_over_u_no_mask

Sorry, my mistake - for some reason I must have thought we were dealing
with src_n_8888 rather than over_n_8888. If you can beat the C version
using a solid fetcher (which fills a temporary buffer the size of the row
with a constant pixel value) and an optimised OVER combiner, then you
should be able to do better still if you cut out the temporary buffer and
keep the solid colour in registers.
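
To make the difference concrete, here is a rough scalar C sketch of the
two approaches. It is not pixman's actual code: the function and helper
names are made up for illustration, and it assumes premultiplied
a8r8g8b8 pixels. The first variant mimics the general path (solid
fetcher into a temporary row, then an OVER combiner); the second is what
a dedicated over_n_8888 fast path would do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale each 8-bit channel of x by a/255, rounded. */
static uint32_t mul_un8x4(uint32_t x, uint32_t a)
{
    uint32_t rb = (x & 0x00ff00ff) * a + 0x00800080;
    uint32_t ag = ((x >> 8) & 0x00ff00ff) * a + 0x00800080;

    rb = ((rb + ((rb >> 8) & 0x00ff00ff)) >> 8) & 0x00ff00ff;
    ag = (ag + ((ag >> 8) & 0x00ff00ff)) & 0xff00ff00;
    return rb | ag;
}

/* Hypothetical helper: per-channel saturating add of two pixels. */
static uint32_t add_un8x4_sat(uint32_t x, uint32_t y)
{
    uint32_t rb = (x & 0x00ff00ff) + (y & 0x00ff00ff);
    uint32_t ag = ((x >> 8) & 0x00ff00ff) + ((y >> 8) & 0x00ff00ff);

    /* A carry into bit 8 of a lane forces that lane's low byte to 0xff. */
    rb |= 0x01000100 - ((rb >> 8) & 0x00ff00ff);
    ag |= 0x01000100 - ((ag >> 8) & 0x00ff00ff);
    return (rb & 0x00ff00ff) | ((ag & 0x00ff00ff) << 8);
}

/* General-path shape: fill a row-sized temporary with the solid colour,
 * then run an OVER combiner that reads the source back from memory. */
static void over_n_8888_via_temp_row(uint32_t *dst, uint32_t src,
                                     uint32_t *tmp, size_t width)
{
    size_t i;

    for (i = 0; i < width; i++)        /* the "solid fetcher" */
        tmp[i] = src;

    for (i = 0; i < width; i++) {      /* the OVER combiner */
        uint32_t s = tmp[i];
        dst[i] = add_un8x4_sat(s, mul_un8x4(dst[i], 255 - (s >> 24)));
    }
}

/* Fast-path shape: the solid colour and its inverse alpha are computed
 * once and stay in registers; only the destination touches memory. */
static void over_n_8888_in_registers(uint32_t *dst, uint32_t src,
                                     size_t width)
{
    uint32_t ia = 255 - (src >> 24);
    size_t i;

    for (i = 0; i < width; i++)
        dst[i] = add_un8x4_sat(src, mul_un8x4(dst[i], ia));
}
```

Both variants produce the same pixels; the second simply skips one
write and one read per pixel, plus the per-pixel recalculation of the
inverse alpha.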

>> Presumably for patch 3 of this series (over_n_0565) you wouldn't see
>> the same effect, as that can't be achieved using memcpy().
> Where is that patch ? I didn't see it in the mailing list.

My bad again - in my mind, the patches for over_n_8888 and over_n_0565 in
C and ARMv6 assembly were a group of four and I overlooked the fact that
when Pekka split them in order to make the benchmarks more robust, he
only reposted the over_n_8888 ones. My original over_n_0565 patches are


