[PATCH] drm/vc4: improve throughput by pipelining binning and rendering jobs
Varad Gautam
varadgautam at gmail.com
Sun Mar 6 10:20:23 UTC 2016
Hi Eric,
On Sat, Mar 5, 2016 at 7:17 AM, Eric Anholt <eric at anholt.net> wrote:
> Varad Gautam <varadgautam at gmail.com> writes:
>
>> The hardware provides us with separate threads for binning and
>> rendering, and the existing model waits for them both to complete
>> before submitting the next job.
>>
>> Splitting the binning and rendering submissions reduces idle time
>> and gives us approx 20-30% speedup with several x11perf tests.
>
> This patch is:
>
> Reviewed-by: Eric Anholt <eric at anholt.net>
>
> Which tests did you find improved, specifically? I'm seeing openarena
> improved by 1.01897% +/- 0.247857% (n=16). x11perf -aa24text and
> -copypixwin looked like they had about the same level of improvement.

Here's a sample of the speedups I've noticed with x11perf:

without queue    with queue   % delta   test
 -(reps/sec)-  -(reps/sec)-   -------   ----
      1840000       2360000    28.26%   10x10 tiled rectangle (17x15 tile)
      1920000       2440000    27.08%   10x10 tiled rectangle (4x4 tile)
      1340000       1620000    20.90%   10x10 tiled rectangle (216x208 tile)
      9900000      11900000    20.20%   10-pixel line
      1310000       1570000    19.85%   10x10 tiled rectangle (161x145 tile)
      2800000       3270000    16.79%   10x10 rectangle
      2720000       3140000    15.44%   100-pixel vertical line segment
       876000       1010000    15.30%   100-pixel line segment (2 kids)
       199000        229000    15.08%   Circulate Unmapped window (200 kids)
      1190000       1350000    13.45%   100-pixel line segment (1 kid)
       176000        199000    13.07%   500-pixel line segment
       172000        194000    12.79%   500-pixel line
       116000        129000    11.21%   Destroy window via parent (100 kids)
      2030000       2250000    10.84%   100-pixel horizontal line segment
       635000        697000     9.76%   100-pixel line segment (3 kids)
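
For reference, the % delta column is the relative change in reps/sec from the run without the queue to the run with it. A minimal sketch of that arithmetic follows; the helper name pct_delta is only illustrative and is not part of x11perf or the patch.

```c
#include <assert.h>

/*
 * Relative change in throughput, in percent, going from `before`
 * (reps/sec without the queue) to `after` (reps/sec with the queue).
 * Illustrative helper only; the name is an assumption.
 */
static double pct_delta(double before, double after)
{
    assert(before > 0.0); /* avoid dividing by zero */
    return (after - before) / before * 100.0;
}
```

For the first row, pct_delta(1840000.0, 2360000.0) comes out at roughly 28.26.
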
>
> This conflicts with a change in -fixes. I think this means that it
> should go in -next once -fixes gets pulled in that.
>
> Peter Brown had suggested to me at one point that we could queue up
> multiple jobs at once by patching the last few bytes of the current job
> to jump to the next one. I haven't fully thought through how you'd
> interlock to make sure that the CL wasn't going to execute the old
> contents before you go to sleep, but it has the promise of being able to
> mask out the flush/frame done interrupts.

A rough idea is to keep track of our current job's start address (which
may be the previous job's jump destination) and resubmit from there if we
come back from sleep. I'll see if I can build on this.
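
Something along these lines, as a very rough sketch of the bookkeeping only. All of the names here (sketch_job, sketch_queue and the helpers) are hypothetical and are not the driver's real structures. It models the control lists as plain addresses, does not touch any hardware registers, and does not attempt to solve the interlock problem you mention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_JOBS 8

/* Hypothetical bookkeeping, not the vc4 driver's real structures. */
struct sketch_job {
    uint32_t start;   /* address of the job's first control-list byte */
    uint32_t jump_to; /* 0 if the job ends normally, else the chained target */
};

struct sketch_queue {
    struct sketch_job jobs[SKETCH_MAX_JOBS];
    size_t count;     /* jobs queued so far */
    size_t current;   /* index of the job the hardware is executing */
};

/* Queue a job and record that the previous tail now jumps to it. */
static int sketch_queue_job(struct sketch_queue *q, uint32_t start)
{
    if (q->count == SKETCH_MAX_JOBS)
        return -1;
    if (q->count > 0)
        q->jobs[q->count - 1].jump_to = start;
    q->jobs[q->count].start = start;
    q->jobs[q->count].jump_to = 0;
    q->count++;
    return 0;
}

/* Called when the current job completes: advance to the next one. */
static void sketch_job_done(struct sketch_queue *q)
{
    assert(q->current < q->count);
    q->current++;
}

/* Address to resubmit from after resume; 0 means nothing is pending. */
static uint32_t sketch_resume_addr(const struct sketch_queue *q)
{
    if (q->current >= q->count)
        return 0;
    return q->jobs[q->current].start;
}
```

For every job after the first, the tracked start address is exactly the previous job's jump destination, so after a resume we would restart the chain at sketch_resume_addr() rather than at the head of the queue.
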
Thanks,
Varad