[Pixman] Cleaning patchwork

Sun Jan 31 08:25:13 PST 2016

On Wed, Jan 13, 2016 at 3:16 PM, Pekka Paalanen
<pekka.paalanen at collabora.co.uk> wrote:
> On Mon, 11 Jan 2016 11:37:57 +0200
> Oded Gabbay <oded.gabbay at gmail.com> wrote:
>
>> On Mon, Jan 4, 2016 at 4:31 PM, Ben Avison <bavison at riscosopen.org> wrote:
>> > On Tue, 22 Dec 2015 13:14:21 -0000, Oded Gabbay <oded.gabbay at gmail.com>
>> > wrote:
>> >
>> >> Hi Ben,
>> >>
>> >> I would like to clean pixman's patchwork and you have about 20
>> >> outstanding patches.
>> >
>> > [...]
>> >
>> >> [4/4] pixman-fast-path: Make bilinear cover fetcher use COVER_CLIP_TIGHT
>> >> flag
>> >> [3/4] armv7/mips/sse2: Make bilinear cover fast paths use COVER_CLIP_TIGHT
>> >> flag
>> >> [2/4] Introduce FAST_PATH_SAMPLES_COVER_CLIP_TIGHT_BILINEAR flag
>> >
>> >
>> > I still think these are a worthwhile improvement and have described some
>> > conditions under which it should be measurable using affine-bench. I believe
>> > Pekka wanted to prove it using real-world traces though, and since there was
>> > no measurable change for better or worse using the normal Cairo traces, he
>> > was attempting to capture some new ones that would exercise the cases I
>> > described. Last I heard though, he had found that Cairo's tracer was broken,
>> > so he was unable to make progress. Under the circumstances, do you think
>> > affine-bench results alone would be acceptable?
>>
>> Hi Pekka,
>> I admit I didn't follow these patches when Ben sent them as you and
>> Siarhei reviewed them.
>> May I ask you to write your opinion on the matter ?
>
> Hi Oded,
>
> that's a tough question.
>
> Ben's work is based on original performance research done by me and my
> colleagues, which hinted which cases could use optimization. That was
> quite a long time ago on Raspberry Pi 1.
>
> There could be several reasons why I failed to find cases where these
> make a measurable difference. As said, recording traces was difficult,
> so perhaps I just could not record the right use cases. The software,
> e.g. Epiphany, may have changed since to not hit these cases anymore.
> Profiling on amd64 might just not make a difference, and profiling on
> Rpi takes a lot of time and I ran out.
>
> If I wasn't involved in this, and taking Pixman's developement
> mentality that I've picked up from the old maintainers into account,
> optimizations should not be landed without proof that they help
> real-world use cases. But there is indeed that room for interpretation:
> do we have sufficient proof? Does affine-bench mimic some real-world
> use cases or is it purely artificial?
>
> I am biased, because I would hate to see Ben's work go wasted and I am
> partly responsible for him doing that work. That's also why I used a lot
> of effort in trying to prove the benefits.
>
> But, I worked on the premise that we *have to* prove real-world
> benefit. At least that would have been unquestionable. Maybe I overdid
> it and affine-bench is enough after all?
>
> The same questions apply to lowlevel-blt-bench, too.

Sorry for the late response, been occupied with product release cycles, etc.
This is quite a dilemma. Honestly, I also hate seeing good work goes to waste.

>From my experience, you really can't cover all real world use-cases.
You simply don't have the time to do it nor the knowledge of all the
applications that use your library.

So, the standard way is to write a synthetic use-case, and have a
golden real-world benchmark that you usually check against, which is
cairo in our case.

If after all the time you invested you couldn't find a difference, for
better or worse, and we have a synthetic test that shows improvement,
than I support of this code. I think that keeping things static for no
reason is not the best route for open source development as it scares
good people away.

That's also why we have development versions and stable versions. This
code will be tested for a year before going into stable, and if no one
will shout during that time, than I guess we made the right choice.
And if someone will shout, we can always fix it.

>
>>
>> >
>> >> Resolve implementation-defined behaviour for division rounded to -infinity
>> >
>> >
>> > This one got bogged down in an argument about whether C89 should still be
>> > supported or not, which I'm not sure was ever resolved?
>>
>> So from reading back the thread, I saw two things:
>> 1. Pekka and Siarhei  wrote about adding test(s) to verify behavior of % and /
>> 2. C89 compatibility issue. Because we have already forked 0.34 and
>> 0.36 seems like a 2017 story, I think we can start phasing it out in
>> the 0.35 development releases and see if someone complains.
>
> The only thing I can say here is that if you rely on something but are
> not quite sure it's true, then make sure you at least catch it when
> it's false. I suppose the needed tests here would be few and simple?
>
> I cannot comment on C89.
>
Ben, do you want to add those tests ?
Regarding C89, I stand by what I wrote in the previous email (phasing
it out during 2016).

>> >
>> >> [05/37,v2] composite flags: Early detection of fully opaque 1x1
>> >> repeating source images
>> >> [10/37,v2] pixman-fast-path: Add in_8888_8 fast path
>> >> [11/37,v2] armv6: Add in_8888_8 fast path
>> >> [23/37,v2] armv6: Add optimised scanline fetchers and writeback for
>> >> r5g6b5 and a8
>> >> [24/37,v2] armv6: Add optimised scanline fetcher for a1r5g5b5
>> >> [25/37,v2] armv6: Add src_1555_8888 fast path
>> >
>> >
>> > v1 of these patches were reviewed (excluding the ARM assembly parts) by
>> > Søren on 05 Oct 2014, and v2 addressed the issue he raised. There haven't
>> > been any comments on the reposted versions.
>>
>> Could you re-post only the the ones you fixed with the v2 remarks ?
>> I'll take a look and hopefully we could merge them.
>>
>> >
>> >> armv7: Re-use existing fast paths in more cases
>> >> armv7: Re-use existing fast paths in more cases
>> >
>> >
>> > These apply the same rules that Søren suggested for my ARMv6 paths in the v2
>> > patches above to the pre-existing ARMv7 paths as well. The only review
>> > comment received so far was that the two patches needed different names.
>>
>> Same comment as above.
>>
>> >
>> >> [2/5,v2] armv7: Add in_8888_8 fast path
>> >
>> >
>> > The only difference for v2 here was again just a matter of defining the
>> > widest possible set of pixel formats to match the fast path.
>>
>> Same comment as above.
>>
>> >
>> >> [5/5] armv7: Add src_1555_8888 fast path
>> >> [4/5] armv7: Add in_reverse_8888_8888 fast path
>> >> [3/5] armv7: Add in_n_8888 fast path
>> >> [1/5] armv7: Use prefetch for small-width images too
>> >> [3/3] armv7: Use VLD-to-all-lanes
>> >> [2/3] armv7: Faster fill operations
>> >> [1/3] armv7: Coalesce scalar accesses where possible
>> >> Require destination images to be bitmaps
>> >
>> >
>> > None of these have received any comments to date. In many cases, I suspect
>> > it's the fact that they involve ARM assembly, and we don't have many (any?)
>> > active reviewers who know it. I'm not sure what we can do about that.
>>
>> Send them as a different series and I'll take a look (as much as I
>> can, I also don't know ARM assembly).
>
> Regarding ARM assembly, since we don't really have anyone to review it,
> I'd propose we rely on the test suite for verification. That leaves
> only the question "is this new path exercised by the test suite?"
>
> If you can tell "yes", trust the test suite and leave it at that. If
> there is like one hit or none, then one has to improve the test suite
> first. It is fairly easy, because Pixman has a gold-standard
> implementation to compare against, so setting up new or extending
> fuzzing tests can just assume it is correct.
>
> So, some sort of test coverity analysis? If the new function shows up
> in 'perf record' of 'make check', it's good? If it doesn't, then... add
> a printf-call?
>
>
> Thanks,
> pq
>

I fully agree with Pekka on this. As we don't have ARM people around
here to review the assembly we must trust the test suite to verify it.
Ben, I think you just need to follow Pekka's advice and then we can
have those patches merged.

Thanks,

     Oded

>> >
>> >> I also suggest that for the relevant ones (if there are any), you
>> >> would rebase them on the latest master and re-send them as a single
>> >> series in order to restart the review process.
>> >
>> >
>> > I can certainly do that, but I had previously been asked to split my
>> > postings into smaller series that were independent. I have a lot more
>> > patches waiting to post that depend on the ones that are stuck in limbo -
>> > the complete set can be seen at
>> >
>> > https://github.com/bavison/pixman/commits/arm-neon-release1
>> >
>> > if you're interested. Or I could just post/repost the whole lot (72 patches
>> > now), like I did with my 37-patch series from 2014.
>> >
>> > Ben
>>
>> Let's start with the patches you wrote above and move step by step after it.
>>
>> Thanks,
>>
>>         Oded