[Intel-gfx] [RFC PATCH] drm/i915: Prefault the entire object on first page fault
Chris Wilson
chris at chris-wilson.co.uk
Tue Feb 4 15:33:48 CET 2014
On Tue, Feb 04, 2014 at 03:15:26PM +0100, Daniel Vetter wrote:
> On Tue, Feb 04, 2014 at 03:12:49PM +0100, Daniel Vetter wrote:
> > On Tue, Feb 04, 2014 at 01:30:19PM +0000, Chris Wilson wrote:
> > > Inserting additional PTEs has no side-effect for us as the pfn are fixed
> > > for the entire time the object is resident in the global GTT. The
> > > downside is that we pay the entire cost of faulting the object upon the
> > > first hit, for which we in return receive the benefit of removing the
> > > per-page faulting overhead.
> > >
> > > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
> > > Upload rate for 2 linear surfaces: 8127MiB/s -> 8134MiB/s
> > > Upload rate for 2 tiled surfaces: 8607MiB/s -> 8625MiB/s
> > > Upload rate for 4 linear surfaces: 8127MiB/s -> 8127MiB/s
> > > Upload rate for 4 tiled surfaces: 8611MiB/s -> 8602MiB/s
> > > Upload rate for 8 linear surfaces: 8114MiB/s -> 8124MiB/s
> > > Upload rate for 8 tiled surfaces: 8601MiB/s -> 8603MiB/s
> > > Upload rate for 16 linear surfaces: 8110MiB/s -> 8123MiB/s
> > > Upload rate for 16 tiled surfaces: 8595MiB/s -> 8606MiB/s
> > > Upload rate for 32 linear surfaces: 8104MiB/s -> 8121MiB/s
> > > Upload rate for 32 tiled surfaces: 8589MiB/s -> 8605MiB/s
> > > Upload rate for 64 linear surfaces: 8107MiB/s -> 8121MiB/s
> > > Upload rate for 64 tiled surfaces: 2013MiB/s -> 3017MiB/s
> > >
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: "Goel, Akash" <akash.goel at intel.com>
> > > ---
> > >
> > > It survived light testing without noticeable performance degradation. Can
> > > anyone think of how this will impact us negatively?
> >
> > piglit does an awful lot of single-pixel readbacks iirc, that's about the
> > only thing I could think of. Maybe we should wait until we have the
> > vm_insert_pfn_frm_io_mapping to not adversely affect this. Or if the
> > overhead is negligible we could move ahead right away. Nothing else really
> > crosses my mind which would qualify as real-world usage.
>
> On that topic: What's the improvement of the optimized insert_pfn_pgprot
> with the prefault patch applied when doing just single dword writes? I.e.
> just to measure the insert_pfn performance so that we have some
> impressive microbenchmark numbers justifying things? I'm thinking of a 2nd
> mode in your test to measure pagefaults/s.
Not pagefault/s yet, but varying object/write sizes is interesting.
IGT-Version: 1.5-g906b862 (x86_64) (Linux: 3.13.0+ x86_64)
4/4096: Upload rate for 2 linear surfaces: 651.042MiB/s
4/4096: Upload rate for 2 tiled surfaces: 1302.083MiB/s
4/4096: Upload rate for 4 linear surfaces: 1116.071MiB/s
4/4096: Upload rate for 4 tiled surfaces: 1736.111MiB/s
4/4096: Upload rate for 8 linear surfaces: 892.857MiB/s
4/4096: Upload rate for 8 tiled surfaces: 1420.455MiB/s
4/4096: Upload rate for 16 linear surfaces: 57.710MiB/s
4/4096: Upload rate for 16 tiled surfaces: 58.685MiB/s
4/4096: Upload rate for 32 linear surfaces: 59.018MiB/s
4/4096: Upload rate for 32 tiled surfaces: 59.780MiB/s
4/4096: Upload rate for 64 linear surfaces: 59.060MiB/s
4/4096: Upload rate for 64 tiled surfaces: 2.021MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:108:
Last errno: 0, Success
Failed assertion: linear[1] > 0.75 * linear[0]
Subtest 4KiB (single dword): FAIL
4096/4096: Upload rate for 2 linear surfaces: 9259.259MiB/s
4096/4096: Upload rate for 2 tiled surfaces: 9153.318MiB/s
4096/4096: Upload rate for 4 linear surfaces: 9237.875MiB/s
4096/4096: Upload rate for 4 tiled surfaces: 9190.121MiB/s
4096/4096: Upload rate for 8 linear surfaces: 9235.209MiB/s
4096/4096: Upload rate for 8 tiled surfaces: 9280.742MiB/s
4096/4096: Upload rate for 16 linear surfaces: 9300.974MiB/s
4096/4096: Upload rate for 16 tiled surfaces: 9284.782MiB/s
4096/4096: Upload rate for 32 linear surfaces: 9311.122MiB/s
4096/4096: Upload rate for 32 tiled surfaces: 9311.122MiB/s
4096/4096: Upload rate for 64 linear surfaces: 9291.184MiB/s
4096/4096: Upload rate for 64 tiled surfaces: 1685.708MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:109:
Last errno: 0, Success
Failed assertion: tiled[1] > 0.75 * tiled[0]
Subtest 4KiB: FAIL
4/1048576: Upload rate for 2 linear surfaces: 21.945MiB/s
4/1048576: Upload rate for 2 tiled surfaces: 411.184MiB/s
4/1048576: Upload rate for 4 linear surfaces: 24.529MiB/s
4/1048576: Upload rate for 4 tiled surfaces: 434.028MiB/s
4/1048576: Upload rate for 8 linear surfaces: 21.448MiB/s
4/1048576: Upload rate for 8 tiled surfaces: 195.313MiB/s
4/1048576: Upload rate for 16 linear surfaces: 16.644MiB/s
4/1048576: Upload rate for 16 tiled surfaces: 53.373MiB/s
4/1048576: Upload rate for 32 linear surfaces: 16.563MiB/s
4/1048576: Upload rate for 32 tiled surfaces: 55.285MiB/s
4/1048576: Upload rate for 64 linear surfaces: 15.486MiB/s
4/1048576: Upload rate for 64 tiled surfaces: 0.107MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:108:
Last errno: 0, Success
Failed assertion: linear[1] > 0.75 * linear[0]
Subtest 1MiB (single dword): FAIL
1048576/1048576: Upload rate for 2 linear surfaces: 8136.153MiB/s
1048576/1048576: Upload rate for 2 tiled surfaces: 8633.445MiB/s
1048576/1048576: Upload rate for 4 linear surfaces: 8128.936MiB/s
1048576/1048576: Upload rate for 4 tiled surfaces: 8614.996MiB/s
1048576/1048576: Upload rate for 8 linear surfaces: 8126.130MiB/s
1048576/1048576: Upload rate for 8 tiled surfaces: 8615.187MiB/s
1048576/1048576: Upload rate for 16 linear surfaces: 8127.811MiB/s
1048576/1048576: Upload rate for 16 tiled surfaces: 8617.108MiB/s
1048576/1048576: Upload rate for 32 linear surfaces: 8125.888MiB/s
1048576/1048576: Upload rate for 32 tiled surfaces: 8612.528MiB/s
1048576/1048576: Upload rate for 64 linear surfaces: 8128.412MiB/s
1048576/1048576: Upload rate for 64 tiled surfaces: 4522.448MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:109:
Last errno: 0, Success
Failed assertion: tiled[1] > 0.75 * tiled[0]
Subtest 1MiB: FAIL
There's still the obvious cliff above 32 fences, but also the interesting
transition at 8 objects, and the odd effect of tiled vs linear.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre