etnaviv: Beware of the PULSE_EATER

Wladimir J. van der Laan laanwj at gmail.com
Sun Dec 11 17:41:08 UTC 2016


On Sun, Dec 11, 2016 at 08:58:59AM -0800, Chris Healy wrote:
> Wow, that's a pretty big difference!
> Did you by any chance test with the i.MX6 with GC2000 too?  It would be
> interesting to see if there was a similar performance degradation as well
> as what the absolute difference is between the GC2000 and GC3000 platforms.

I could try. This code currently only works on GC3000, but there's no reason it
could not be ported to GC2000 (basically just a matter of copy/pasting shader
code and command buffers).

> Also, with regard to the OCRAM, can you specify which OCRAM was used?  (Did
> you stick to the 2 128KB OCRAM hanging off MX6FAST2 or was the 1 256KB
> OCRAM hanging off MX6FAST3 also used?  The 2 128KB OCRAM hanging off
> MX6FAST2 should have the highest OCRAM performance.)

I used the first two, at 0x900000 and 0x920000, reading/writing the full 256 KiB
in each pass.
When I wrote that code I didn't realize they were different. I'll
benchmark them individually and see if there's a difference.
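
For reference, a minimal sketch of how the two windows could be described so
that each can be benchmarked on its own. The struct and function names are
made up for illustration and are not taken from the benchmark code. The
128 KiB size per bank is an assumption based on Chris's description, and
"MB" is assumed to mean 10^6 bytes (the benchmark may well use 2^20).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one OCRAM window; not from the benchmark. */
struct ocram_region {
    uint32_t base;  /* physical base address */
    uint32_t size;  /* size in bytes */
};

/* Bases as used in the benchmark; 128 KiB per bank is an assumption. */
static const struct ocram_region ocram_regions[] = {
    { 0x00900000u, 128u * 1024u },
    { 0x00920000u, 128u * 1024u },
};

#define OCRAM_REGION_COUNT (sizeof(ocram_regions) / sizeof(ocram_regions[0]))

/* Total bytes touched when every region is covered once per pass. */
uint32_t ocram_total_bytes(void)
{
    uint32_t total = 0;
    for (size_t i = 0; i < OCRAM_REGION_COUNT; i++)
        total += ocram_regions[i].size;
    return total;
}

/* Returns 1 if each region starts exactly where the previous one ends. */
int ocram_regions_contiguous(void)
{
    for (size_t i = 1; i < OCRAM_REGION_COUNT; i++) {
        const struct ocram_region *prev = &ocram_regions[i - 1];
        if (prev->base + prev->size != ocram_regions[i].base)
            return 0;
    }
    return 1;
}

/* Throughput in MB/s, assuming 1 MB = 10^6 bytes. */
double throughput_mb_s(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / 1e6 / seconds;
}
```

With these assumptions the two windows are back to back and add up to the
256 KiB touched per pass. Timing one ocram_regions[] entry at a time and
feeding its size to throughput_mb_s() would give the per-bank figures.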

Wladimir

> On Sun, Dec 11, 2016 at 8:27 AM, Wladimir J. van der Laan <laanwj at gmail.com>
> wrote:
> 
> >
> > Entirely deserving of its B-horror name, it does unspeakable
> > things to performance if not set up correctly.
> >
> > I was doing memory benchmarks and noticed a serious discrepancy
> > between running the same CL program with the Vivante driver
> > and replaying it on etnaviv.
> >
> > After carefully comparing the registers written at initialization
> > with both kernel drivers, it became apparent that the etnaviv
> > DRM driver is not setting up register 0x0010C. Adding a single line
> > at the end of etnaviv_gpu_init sped up pure-read performance from shaders
> > by 7.5x and write performance by 1.5x:
> >
> >         gpu_write(gpu, VIVS_PM_PULSE_EATER, 0x015d0880);
> >
> > (Instead of this line, we likely want to replicate viv_gpu's per-model
> > logic.)
> >
> > Apart from silly CL benchmarks, I expect this to make a large impact with
> > texturing and alpha blending as well. Exact results for i.MX6qp (GC3000)
> > below.
> >
> > Wladimir
> >
> > ("readacc" does read-and-accumulate into a register, "read" does only reads,
> > "write" does only writes, "ram" benchmarks DDR3 RAM, "ocram" the i.MX6
> > on-chip RAM)
> >
> > kernel 4.8.4 + etnaviv
> > ----------------------
> >
> > Version: 1.0.0
> >   Name: etnaviv
> >   Date: 20151214
> >   Description: etnaviv DRM
> > [read]    [ram]   Speed:  15.395 MB/s
> > [readacc] [ram]   Speed:  14.680 MB/s (compare to readacc4x_alt.cl below :( )
> > [write]   [ram]   Speed: 148.453 MB/s (compare to write4x_alt.cl below :( )
> > [read]    [ocram] Speed:  15.097 MB/s
> > [readacc] [ocram] Speed:  14.367 MB/s
> > [write]   [ocram] Speed: 148.419 MB/s
> >
> > kernel 4.1.5 + viv_gpu
> > ----------------------
> >
> > [ram]
> > (no bench_read.cl: pure reads are optimized away by CL compiler)
> > Program: bench_readacc4x_alt.cl    Speed:  83.720 MB/s
> > Program: bench_write4x_alt.cl      Speed: 234.708 MB/s
> >
> > etnaviv 4.8.4 after PULSE_EATER neutered
> > ------------------------------------------
> >
> > Version: 1.0.0
> >   Name: etnaviv
> >   Date: 20151214
> >   Description: etnaviv DRM
> > [read]    [ram]   Speed: 113.414 MB/s
> > [readacc] [ram]   Speed:  82.395 MB/s (comparable to readacc4x_alt.cl above)
> > [write]   [ram]   Speed: 234.955 MB/s (comparable to write4x_alt.cl above)
> > [read]    [ocram] Speed:  87.912 MB/s
> > [readacc] [ocram] Speed:  70.329 MB/s
> > [write]   [ocram] Speed: 234.956 MB/s
> >
> >
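
To put the before/after numbers side by side, here is a small sketch that
computes the speedup per row. The figures are copied from the two etnaviv
4.8.4 runs quoted above; the struct and function names are invented for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One benchmark row: MB/s before and after writing VIVS_PM_PULSE_EATER. */
struct bench_row {
    const char *test;
    const char *memory;
    double before_mb_s;
    double after_mb_s;
};

/* Figures copied from the etnaviv 4.8.4 results quoted above. */
static const struct bench_row rows[] = {
    { "read",    "ram",    15.395, 113.414 },
    { "readacc", "ram",    14.680,  82.395 },
    { "write",   "ram",   148.453, 234.955 },
    { "read",    "ocram",  15.097,  87.912 },
    { "readacc", "ocram",  14.367,  70.329 },
    { "write",   "ocram", 148.419, 234.956 },
};

#define ROW_COUNT (sizeof(rows) / sizeof(rows[0]))

/* Ratio of the "after" throughput to the "before" throughput. */
double speedup(const struct bench_row *row)
{
    assert(row->before_mb_s > 0.0);
    return row->after_mb_s / row->before_mb_s;
}

/* Print one line per row with both throughputs and the resulting ratio. */
void print_speedups(void)
{
    for (size_t i = 0; i < ROW_COUNT; i++)
        printf("[%-7s] [%-5s] %7.3f -> %7.3f MB/s (%.2fx)\n",
               rows[i].test, rows[i].memory,
               rows[i].before_mb_s, rows[i].after_mb_s,
               speedup(&rows[i]));
}
```

Calling print_speedups() from a main() lists all six rows. For RAM this
works out to roughly 7.4x for pure reads and 1.6x for writes, in line with
the rounded figures given in the original message.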

