[Pixman] Cleaning patchwork

Wed Jan 13 05:16:38 PST 2016

On Mon, 11 Jan 2016 11:37:57 +0200
Oded Gabbay <oded.gabbay at gmail.com> wrote:

> On Mon, Jan 4, 2016 at 4:31 PM, Ben Avison <bavison at riscosopen.org> wrote:
> > On Tue, 22 Dec 2015 13:14:21 -0000, Oded Gabbay <oded.gabbay at gmail.com>
> > wrote:
> >  
> >> Hi Ben,
> >>
> >> I would like to clean pixman's patchwork and you have about 20
> >> outstanding patches.  
> >
> > [...]
> >  
> >> [4/4] pixman-fast-path: Make bilinear cover fetcher use COVER_CLIP_TIGHT
> >> flag
> >> [3/4] armv7/mips/sse2: Make bilinear cover fast paths use COVER_CLIP_TIGHT
> >> flag
> >> [2/4] Introduce FAST_PATH_SAMPLES_COVER_CLIP_TIGHT_BILINEAR flag  
> >
> >
> > I still think these are a worthwhile improvement and have described some
> > conditions under which it should be measurable using affine-bench. I believe
> > Pekka wanted to prove it using real-world traces though, and since there was
> > no measurable change for better or worse using the normal Cairo traces, he
> > was attempting to capture some new ones that would exercise the cases I
> > described. Last I heard though, he had found that Cairo's tracer was broken,
> > so he was unable to make progress. Under the circumstances, do you think
> > affine-bench results alone would be acceptable?  
> 
> Hi Pekka,
> I admit I didn't follow these patches when Ben sent them as you and
> Siarhei reviewed them.
> May I ask you to write your opinion on the matter ?

Hi Oded,

that's a tough question.

Ben's work is based on original performance research done by me and my
colleagues, which hinted which cases could use optimization. That was
quite a long time ago on Raspberry Pi 1.

There could be several reasons why I failed to find cases where these
make a measurable difference. As said, recording traces was difficult,
so perhaps I just could not record the right use cases. The software,
e.g. Epiphany, may have changed since to not hit these cases anymore.
Profiling on amd64 might just not make a difference, and profiling on
Rpi takes a lot of time and I ran out.

If I wasn't involved in this, and taking Pixman's developement
mentality that I've picked up from the old maintainers into account,
optimizations should not be landed without proof that they help
real-world use cases. But there is indeed that room for interpretation:
do we have sufficient proof? Does affine-bench mimic some real-world
use cases or is it purely artificial?

I am biased, because I would hate to see Ben's work go wasted and I am
partly responsible for him doing that work. That's also why I used a lot
of effort in trying to prove the benefits.

But, I worked on the premise that we *have to* prove real-world
benefit. At least that would have been unquestionable. Maybe I overdid
it and affine-bench is enough after all?

The same questions apply to lowlevel-blt-bench, too.

> 
> >  
> >> Resolve implementation-defined behaviour for division rounded to -infinity  
> >
> >
> > This one got bogged down in an argument about whether C89 should still be
> > supported or not, which I'm not sure was ever resolved?  
> 
> So from reading back the thread, I saw two things:
> 1. Pekka and Siarhei  wrote about adding test(s) to verify behavior of % and /
> 2. C89 compatibility issue. Because we have already forked 0.34 and
> 0.36 seems like a 2017 story, I think we can start phasing it out in
> the 0.35 development releases and see if someone complains.

The only thing I can say here is that if you rely on something but are
not quite sure it's true, then make sure you at least catch it when
it's false. I suppose the needed tests here would be few and simple?

I cannot comment on C89.

> >  
> >> [05/37,v2] composite flags: Early detection of fully opaque 1x1
> >> repeating source images
> >> [10/37,v2] pixman-fast-path: Add in_8888_8 fast path
> >> [11/37,v2] armv6: Add in_8888_8 fast path
> >> [23/37,v2] armv6: Add optimised scanline fetchers and writeback for
> >> r5g6b5 and a8
> >> [24/37,v2] armv6: Add optimised scanline fetcher for a1r5g5b5
> >> [25/37,v2] armv6: Add src_1555_8888 fast path  
> >
> >
> > v1 of these patches were reviewed (excluding the ARM assembly parts) by
> > Søren on 05 Oct 2014, and v2 addressed the issue he raised. There haven't
> > been any comments on the reposted versions.  
> 
> Could you re-post only the the ones you fixed with the v2 remarks ?
> I'll take a look and hopefully we could merge them.
> 
> >  
> >> armv7: Re-use existing fast paths in more cases
> >> armv7: Re-use existing fast paths in more cases  
> >
> >
> > These apply the same rules that Søren suggested for my ARMv6 paths in the v2
> > patches above to the pre-existing ARMv7 paths as well. The only review
> > comment received so far was that the two patches needed different names.  
> 
> Same comment as above.
> 
> >  
> >> [2/5,v2] armv7: Add in_8888_8 fast path  
> >
> >
> > The only difference for v2 here was again just a matter of defining the
> > widest possible set of pixel formats to match the fast path.  
> 
> Same comment as above.
> 
> >  
> >> [5/5] armv7: Add src_1555_8888 fast path
> >> [4/5] armv7: Add in_reverse_8888_8888 fast path
> >> [3/5] armv7: Add in_n_8888 fast path
> >> [1/5] armv7: Use prefetch for small-width images too
> >> [3/3] armv7: Use VLD-to-all-lanes
> >> [2/3] armv7: Faster fill operations
> >> [1/3] armv7: Coalesce scalar accesses where possible
> >> Require destination images to be bitmaps  
> >
> >
> > None of these have received any comments to date. In many cases, I suspect
> > it's the fact that they involve ARM assembly, and we don't have many (any?)
> > active reviewers who know it. I'm not sure what we can do about that.  
> 
> Send them as a different series and I'll take a look (as much as I
> can, I also don't know ARM assembly).

Regarding ARM assembly, since we don't really have anyone to review it,
I'd propose we rely on the test suite for verification. That leaves
only the question "is this new path exercised by the test suite?"

If you can tell "yes", trust the test suite and leave it at that. If
there is like one hit or none, then one has to improve the test suite
first. It is fairly easy, because Pixman has a gold-standard
implementation to compare against, so setting up new or extending
fuzzing tests can just assume it is correct.

So, some sort of test coverity analysis? If the new function shows up
in 'perf record' of 'make check', it's good? If it doesn't, then... add
a printf-call?

Thanks,
pq

> >  
> >> I also suggest that for the relevant ones (if there are any), you
> >> would rebase them on the latest master and re-send them as a single
> >> series in order to restart the review process.  
> >
> >
> > I can certainly do that, but I had previously been asked to split my
> > postings into smaller series that were independent. I have a lot more
> > patches waiting to post that depend on the ones that are stuck in limbo -
> > the complete set can be seen at
> >
> > https://github.com/bavison/pixman/commits/arm-neon-release1
> >
> > if you're interested. Or I could just post/repost the whole lot (72 patches
> > now), like I did with my 37-patch series from 2014.
> >
> > Ben  
> 
> Let's start with the patches you wrote above and move step by step after it.
> 
> Thanks,
> 
>         Oded