[Pixman] [PATCH 06/12] vmx: implement fast path vmx_composite_over_n_8888_8888_ca

Wed Jul 15 05:32:18 PDT 2015

On Wed, 15 Jul 2015 15:05:21 +0300
Oded Gabbay <oded.gabbay at gmail.com> wrote:

> On Wed, Jul 15, 2015 at 2:59 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > On Tue, 14 Jul 2015 11:41:25 +0300
> > Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> >
> >> On Thu,  2 Jul 2015 13:04:11 +0300
> >> Oded Gabbay <oded.gabbay at gmail.com> wrote:
> >>
> >> > POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.
> >> >
> >> > reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)
> >
> > *Very* stable memcpy speed, compared to patch 7. Impressive.
> >
> >> >
> >> >                 Before           After           Change
> >> >               ---------------------------------------------
> >> > L1              61.92            244.91          +295.53%
> >> > L2              62.74            243.3           +287.79%
> >> > M               63.03            241.94          +283.85%
> >> > HT              59.91            144.22          +140.73%
> >> > VT              59.4             174.39          +193.59%
> >> > R               53.6             111.37          +107.78%
> >> > RT              37.99            46.38           +22.08%
> >> > Kops/s          436              506             +16.06%
> >> >
> >> > cairo trimmed benchmarks :
> >> >
> >> > Speedups
> >> > ========
> >> > t-xfce4-terminal-a1  1540.37 -> 1226.14 :  1.26x
> >> > t-firefox-talos-gfx  1488.59 -> 1209.19 :  1.23x
> >> >
> >> > Slowdowns
> >> > =========
> >> >         t-evolution  553.88  -> 581.63  :  1.05x
> >> >           t-poppler  364.99  -> 383.79  :  1.05x
> >> > t-firefox-scrolling  1223.65 -> 1304.34 :  1.07x
> >> >
> >> > Signed-off-by: Oded Gabbay <oded.gabbay at gmail.com>
> >>
> >> Acked-by: Siarhei Siamashka <siarhei.siamashka at gmail.com>
> >>
> >
> > Hi,
> >
> > why are there slowdowns up to 7%?
> I don't know, I didn't investigate it.
> 
> > Can the cost of adding more entries to the fast path table be that
> > much, or is something else going on?
> I would rule out this reason, because when I performed the benchmarks,
> I did it for each patch separately.
> i.e. for each new fast-path, I removed all the previous fast-paths I
> added to make sure I'm measuring the current patch. Therefore, the
> amount of entries in the fast-path table is the same among all the
> patches in this patch-set.

Oh. That's totally not what I expected. I was going to ask what
versions these benchmark results are comparing, because you didn't
mention it, but I never expected *that*.

This is not what the users will be seeing, as they won't be using just
one of these patches, but the cumulative effect.

Unless documented otherwise, I'd expect the benchmark results to be
before vs. after this patch, and since this patch is part of a series,
with all the earlier patches in the series already applied.

I'd really prefer an explicit mention of what was benchmarked.

> > Or if we don't care about that, why?
> I think that the speedups in this specific patch are more substantial
> than the slowdowns. If it was the other way around, then I would have
> removed this patch, like I did with another patch, which Siarhei
> rejected because of it.

But in theory, you should not get any slowdowns, right? Or did you
actually expect that some things will slow down?
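
To make the table-cost hypothesis concrete, here is a minimal sketch of
a linear fast path lookup. It is a simplified illustration, not
pixman's actual code: fast_path_t, OP_NONE and lookup_fast_path() are
hypothetical names, and the real lookup also involves a cache and
flag matching that are left out here. It only shows that every entry
placed ahead of a match, or any entry at all when nothing matches,
costs one more comparison per lookup.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one fast path table entry. */
typedef struct
{
    int op;           /* operator id; OP_NONE terminates the table */
    int src_format;
    int mask_format;
    int dest_format;
    int id;           /* stands in for the composite function pointer */
} fast_path_t;

enum { OP_NONE = -1 };

/*
 * Scan the table front to back and return the first matching entry,
 * or NULL if nothing matches. If n_checked is non-NULL, it receives
 * the number of entries that were compared during this lookup.
 */
static const fast_path_t *
lookup_fast_path (const fast_path_t *table,
                  int op, int src, int mask, int dest,
                  size_t *n_checked)
{
    const fast_path_t *hit = NULL;
    const fast_path_t *p;
    size_t n = 0;

    for (p = table; p->op != OP_NONE; p++)
    {
        n++;
        if (p->op == op &&
            p->src_format == src &&
            p->mask_format == mask &&
            p->dest_format == dest)
        {
            hit = p;
            break;
        }
    }

    if (n_checked)
        *n_checked = n;

    return hit;
}
```

With a table like this, a lookup that misses walks every entry, so
never-matching entries would show up as extra comparisons on exactly
the operations that have no fast path.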

Thanks,
pq

> >
> > If you have no idea, maybe check the "all" set of lowlevel-blt-bench to
> > see if you can find unrelated operations slowing down for some obscure
> > reason.
> Good thinking. I'll try that offline.
> >
> > I suppose you could also see if adding the same amount of fast path table
> > entries that will never match would cause the same slowdowns as this
> > patch.
> >
> As I said above, because of the way I tested it, I don't think this is
> the reason.
> 
>         Oded
> >
> > Thanks,
> > pq