[Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)

Wed Aug 26 01:54:51 PDT 2015

On Wed, Aug 26, 2015 at 11:46 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> On Tue, 25 Aug 2015 15:45:48 +0300
> Oded Gabbay <oded.gabbay at gmail.com> wrote:
>
>> On Mon, Aug 24, 2015 at 7:04 PM, Ben Avison <bavison at riscosopen.org> wrote:
>> >
>> > On Mon, 24 Aug 2015 15:20:22 +0100, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>> >>
>> >> I tested the patch on POWER8, ppc64le.
>> >> make check passes, but when I benchmarked the performance using
>> >> lowlevel-blt-bench over_n_8888, I got far worse results than without
>> >> this patch (see numbers below).
>> >> Apparently, the new C fast-path takes precedence over some vmx combine
>> >> fast-paths, thus making performance worse instead of better.
>> >
>> >
>> > If that's true, it wouldn't be the first time that this issue has arisen.
>> > Pixman is designed to scan all fast paths (i.e. routines which take
>> > source, mask and destination images and perform the whole operation
>> > themselves) irrespective of how well tuned they are for a given platform,
>> > before it starts looking at iter routines (each of which reads or writes
>> > only one of source, mask or destination).
>> >
>> > Previously, in response to my attempt to work around a case where this
>> > was happening, Søren wrote:
>> >>
>> >> [...] the longstanding problem that pixman sometimes won't
>> >> pick the fastest code for some operations. In this case the general
>> >> iter based code will be faster then the C code in pixman-fast-path.c
>> >> because the iter code will use assembly fetchers.
>> >> As a result you end up with a bunch of partial reimplementations of
>> >> general_composite_rect() inside pixman-arm-simd.c.
>> >> Maybe we need to admit failure and make general_composite_rect()
>> >> available for other implementations to use. Essentially we would
>> >> officially provide a way for implementations to say: My iterators are
>> >> faster than pixman-fast-path.c for these specific operations, so just
>> >> go directly to general_composite_rect().
>> >> It's definitely not a pleasant thing to do, but given that nobody is
>> >> likely to have the time/skill combination required to do the necessary
>> >> redesign of pixman's fast path system, it might still be preferable to
>> >> to do this instead of duplicating the code like these patches do.
>> >> With a setup like that, we could also fix the same issue for the
>> >> bilinear SSSE3 fetchers and possibly other cases.
>> >
>> >
>> > (ref http://lists.freedesktop.org/archives/pixman/2014-October/003457.html)
>> >
>> > I can't say that any cleaner solution has occurred to me since then.
>>
>> I think the more immediate solution, as Soren have suggested on IRC,
>> is for me to implement the equivalent fast-path in VMX.
>> I see that it is already implemented in mmx, sse2, mips-dspr2 and arm-neon.
>> From looking at the C code, I'm guessing that it is fairly simple to implement.
>
> What if we just dropped the over_n_8888 C fast path? And all the rest
> of new C fast path patches.
>
> It's clearly controversial to add C fast paths, because it affects all
> targets that don't have an asm fast path for the same, and we cannot
> tell by just review whether it is going to be for better or (much)
> worse.
>
> I also find it hard to see what the affected targets would be.
>
> Rpi should not be affected if we drop the C fast path patch,
> there is still the asm fast path patch. Right?
>
>
> Thanks,
> pq

I don't object to that. It seems much more beneficial to add an
arch-dependent fast-path/iterator, and I think our time should be
invested there - each one in his choice of arch(s).

       Oded