[Intel-gfx] [RFC PATCH] drm/i915: Prefault the entire object on first page fault

Chris Wilson chris at chris-wilson.co.uk
Tue Feb 4 15:33:48 CET 2014


On Tue, Feb 04, 2014 at 03:15:26PM +0100, Daniel Vetter wrote:
> On Tue, Feb 04, 2014 at 03:12:49PM +0100, Daniel Vetter wrote:
> > On Tue, Feb 04, 2014 at 01:30:19PM +0000, Chris Wilson wrote:
> > > Inserting additional PTEs has no side-effect for us, as the PFNs are fixed
> > > for the entire time the object is resident in the global GTT. The
> > > downside is that we pay the entire cost of faulting the object upon the
> > > first hit, for which we in return receive the benefit of removing the
> > > per-page faulting overhead.
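Below is a minimal userspace model of the trade-off just described. It assumes, as the patch description states, that the object's pfns are contiguous and fixed while it is bound. This is not the i915 code: `fake_vma`, `handle_fault` and `touch_pages` are hypothetical stand-ins. In the kernel, each `pte[]` store would correspond roughly to a `vm_insert_pfn()` call. The model counts both faults taken and PTEs written, which shows the upside (fewer faults) and the downside Daniel raises for single-pixel readbacks (one touch pays for the whole object).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAGES 1024

/* Hypothetical stand-in for a GTT mmap of a bound object: the pfns are
 * contiguous and fixed while the object is resident, so the pte for any
 * page can be computed up front. */
struct fake_vma {
	size_t npages;
	unsigned long base_pfn;
	unsigned long pte[MAX_PAGES]; /* 0 means "not present" */
	unsigned faults;              /* traps into the fault handler */
	unsigned inserts;             /* ptes written */
};

struct cost {
	unsigned faults;
	unsigned inserts;
};

static void insert_pte(struct fake_vma *vma, size_t page)
{
	vma->pte[page] = vma->base_pfn + page;
	vma->inserts++;
}

/* Fault handler: insert only the faulting page, or prefault the whole object. */
static void handle_fault(struct fake_vma *vma, size_t page, bool prefault)
{
	vma->faults++;
	if (!prefault) {
		insert_pte(vma, page);
		return;
	}
	for (size_t i = 0; i < vma->npages; i++)
		if (vma->pte[i] == 0)
			insert_pte(vma, i);
}

/* Touch the first `touched` pages of an npages-page object once each. */
static struct cost touch_pages(size_t npages, size_t touched, bool prefault)
{
	struct fake_vma vma;

	assert(npages <= MAX_PAGES && touched <= npages);
	memset(&vma, 0, sizeof(vma));
	vma.npages = npages;
	vma.base_pfn = 0x1000; /* nonzero, so a valid pte is never 0 */

	for (size_t i = 0; i < touched; i++)
		if (vma.pte[i] == 0) /* only a missing pte traps */
			handle_fault(&vma, i, prefault);

	return (struct cost){ vma.faults, vma.inserts };
}
```

In this model, touching all 256 pages of a 1MiB object takes one fault instead of 256 when prefaulting. Touching a single page still writes all 256 PTEs.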
> > > 
> > > On an Ivybridge i7-3720QM with 1600MHz DDR3 and 32 fences:
> > > Upload rate for 2 linear surfaces:	8127MiB/s -> 8134MiB/s
> > > Upload rate for 2 tiled surfaces:	8607MiB/s -> 8625MiB/s
> > > Upload rate for 4 linear surfaces:	8127MiB/s -> 8127MiB/s
> > > Upload rate for 4 tiled surfaces:	8611MiB/s -> 8602MiB/s
> > > Upload rate for 8 linear surfaces:	8114MiB/s -> 8124MiB/s
> > > Upload rate for 8 tiled surfaces:	8601MiB/s -> 8603MiB/s
> > > Upload rate for 16 linear surfaces:	8110MiB/s -> 8123MiB/s
> > > Upload rate for 16 tiled surfaces:	8595MiB/s -> 8606MiB/s
> > > Upload rate for 32 linear surfaces:	8104MiB/s -> 8121MiB/s
> > > Upload rate for 32 tiled surfaces:	8589MiB/s -> 8605MiB/s
> > > Upload rate for 64 linear surfaces:	8107MiB/s -> 8121MiB/s
> > > Upload rate for 64 tiled surfaces:	2013MiB/s -> 3017MiB/s
> > > 
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: "Goel, Akash" <akash.goel at intel.com>
> > > ---
> > > 
> > > It survived light testing without noticeable performance degradation. Can
> > > anyone think of how this will impact us negatively?
> > 
> > piglit does an awful lot of single-pixel readbacks iirc, that's about the
> > only thing I could think of. Maybe we should wait until we have the
> > vm_insert_pfn_frm_io_mapping to not adversely affect this. Or if the
> > overhead is negligible we could move ahead right away. Nothing else really
> > crosses my mind which would qualify as real-world usage.
> 
> On that topic: What's the improvement of the optimized insert_pfn_pgprot
> with the prefault patch applied when doing just single dword writes? I.e.
> just to measure the insert_pfn performance so that we have some
> impressive microbenchmark numbers justifying things? I'm thinking of a 2nd
> mode in your test to measure pagefaults/s.
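The following sketches one way to get a pagefaults/s figure with single-dword writes. It assumes plain anonymous memory is an acceptable stand-in: it maps fresh pages, writes one dword to each, and divides the page count by the elapsed time. It does not exercise `i915_gem_fault` at all. A real mode in the test would instead map a GEM object through the GTT (for example via `DRM_IOCTL_I915_GEM_MMAP_GTT`). The function name `single_dword_faults_per_sec` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static double elapsed_s(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) * 1e-9;
}

/* Hypothetical helper: map npages fresh anonymous pages, write one dword per
 * page so that each write takes a minor fault, and return faults per second
 * (or -1.0 on failure). */
static double single_dword_faults_per_sec(size_t npages)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec start, end;
	size_t len;
	char *ptr;
	double dt;

	assert(npages > 0 && page > 0);
	len = npages * (size_t)page;
	ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ptr == MAP_FAILED)
		return -1.0;
#ifdef MADV_NOHUGEPAGE
	/* Avoid transparent huge pages, which would fault 2MiB at a time. */
	madvise(ptr, len, MADV_NOHUGEPAGE);
#endif

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (size_t i = 0; i < npages; i++)
		*(volatile uint32_t *)(ptr + i * (size_t)page) = (uint32_t)i;
	clock_gettime(CLOCK_MONOTONIC, &end);

	munmap(ptr, len);
	dt = elapsed_s(&start, &end);
	return dt > 0.0 ? (double)npages / dt : -1.0;
}
```

Calling it with a few thousand pages and printing the result gives a baseline fault rate. The GTT-backed version could then be compared against that baseline with and without prefaulting.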

Not pagefaults/s yet, but varying the object and write sizes is interesting
(each line below is prefixed with write size/object size, in bytes).

IGT-Version: 1.5-g906b862 (x86_64) (Linux: 3.13.0+ x86_64)
4/4096: Upload rate for 2 linear surfaces:	651.042MiB/s
4/4096: Upload rate for 2 tiled surfaces:	1302.083MiB/s
4/4096: Upload rate for 4 linear surfaces:	1116.071MiB/s
4/4096: Upload rate for 4 tiled surfaces:	1736.111MiB/s
4/4096: Upload rate for 8 linear surfaces:	892.857MiB/s
4/4096: Upload rate for 8 tiled surfaces:	1420.455MiB/s
4/4096: Upload rate for 16 linear surfaces:	 57.710MiB/s
4/4096: Upload rate for 16 tiled surfaces:	 58.685MiB/s
4/4096: Upload rate for 32 linear surfaces:	 59.018MiB/s
4/4096: Upload rate for 32 tiled surfaces:	 59.780MiB/s
4/4096: Upload rate for 64 linear surfaces:	 59.060MiB/s
4/4096: Upload rate for 64 tiled surfaces:	  2.021MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:108:
Last errno: 0, Success
Failed assertion: linear[1] > 0.75 * linear[0]
Subtest 4KiB (single dword): FAIL
4096/4096: Upload rate for 2 linear surfaces:	9259.259MiB/s
4096/4096: Upload rate for 2 tiled surfaces:	9153.318MiB/s
4096/4096: Upload rate for 4 linear surfaces:	9237.875MiB/s
4096/4096: Upload rate for 4 tiled surfaces:	9190.121MiB/s
4096/4096: Upload rate for 8 linear surfaces:	9235.209MiB/s
4096/4096: Upload rate for 8 tiled surfaces:	9280.742MiB/s
4096/4096: Upload rate for 16 linear surfaces:	9300.974MiB/s
4096/4096: Upload rate for 16 tiled surfaces:	9284.782MiB/s
4096/4096: Upload rate for 32 linear surfaces:	9311.122MiB/s
4096/4096: Upload rate for 32 tiled surfaces:	9311.122MiB/s
4096/4096: Upload rate for 64 linear surfaces:	9291.184MiB/s
4096/4096: Upload rate for 64 tiled surfaces:	1685.708MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:109:
Last errno: 0, Success
Failed assertion: tiled[1] > 0.75 * tiled[0]
Subtest 4KiB: FAIL
4/1048576: Upload rate for 2 linear surfaces:	 21.945MiB/s
4/1048576: Upload rate for 2 tiled surfaces:	411.184MiB/s
4/1048576: Upload rate for 4 linear surfaces:	 24.529MiB/s
4/1048576: Upload rate for 4 tiled surfaces:	434.028MiB/s
4/1048576: Upload rate for 8 linear surfaces:	 21.448MiB/s
4/1048576: Upload rate for 8 tiled surfaces:	195.313MiB/s
4/1048576: Upload rate for 16 linear surfaces:	 16.644MiB/s
4/1048576: Upload rate for 16 tiled surfaces:	 53.373MiB/s
4/1048576: Upload rate for 32 linear surfaces:	 16.563MiB/s
4/1048576: Upload rate for 32 tiled surfaces:	 55.285MiB/s
4/1048576: Upload rate for 64 linear surfaces:	 15.486MiB/s
4/1048576: Upload rate for 64 tiled surfaces:	  0.107MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:108:
Last errno: 0, Success
Failed assertion: linear[1] > 0.75 * linear[0]
Subtest 1MiB (single dword): FAIL
1048576/1048576: Upload rate for 2 linear surfaces:	8136.153MiB/s
1048576/1048576: Upload rate for 2 tiled surfaces:	8633.445MiB/s
1048576/1048576: Upload rate for 4 linear surfaces:	8128.936MiB/s
1048576/1048576: Upload rate for 4 tiled surfaces:	8614.996MiB/s
1048576/1048576: Upload rate for 8 linear surfaces:	8126.130MiB/s
1048576/1048576: Upload rate for 8 tiled surfaces:	8615.187MiB/s
1048576/1048576: Upload rate for 16 linear surfaces:	8127.811MiB/s
1048576/1048576: Upload rate for 16 tiled surfaces:	8617.108MiB/s
1048576/1048576: Upload rate for 32 linear surfaces:	8125.888MiB/s
1048576/1048576: Upload rate for 32 tiled surfaces:	8612.528MiB/s
1048576/1048576: Upload rate for 64 linear surfaces:	8128.412MiB/s
1048576/1048576: Upload rate for 64 tiled surfaces:	4522.448MiB/s
Test assertion failure function performance, file
gem_fence_upload.c:109:
Last errno: 0, Success
Failed assertion: tiled[1] > 0.75 * tiled[0]
Subtest 1MiB: FAIL
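The four failures above are consistent with a check of the following shape. This is a hypothetical restatement, not the gem_fence_upload.c source. It assumes index 0 holds the 2-surface rate and index 1 the rate at the largest surface count. It also assumes the linear check runs before the tiled one, which matches the source lines 108 and 109 reported in the log. Fed the logged numbers, it reproduces which assertion tripped in each subtest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the scaling check: rate[0] is assumed to be the
 * 2-surface upload rate and rate[1] the rate at the largest surface count. */
static bool scales_ok(const double rate[2])
{
	assert(rate[0] > 0.0);
	return rate[1] > 0.75 * rate[0];
}

enum scaling_result { SCALING_OK, LINEAR_FAILED, TILED_FAILED };

/* The linear check is assumed to run first, so its failure is reported first. */
static enum scaling_result check_subtest(const double linear[2],
					 const double tiled[2])
{
	if (!scales_ok(linear))
		return LINEAR_FAILED;
	if (!scales_ok(tiled))
		return TILED_FAILED;
	return SCALING_OK;
}
```

For example, the 1MiB subtest passes the linear check (8128.412 > 0.75 * 8136.153) but fails the tiled one (4522.448 < 0.75 * 8633.445).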

There's still the obvious cliff beyond 32 fences, but also the interesting
transition at 8 objects, and the odd difference between tiled and linear.
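The following toy model may explain why tiled uploads collapse once the surface count exceeds the fence count. It assumes each tiled surface needs a fence register while it is accessed through the GTT, and that the driver steals the least-recently-used fence when none is free. Under LRU, cycling round-robin through more objects than fences means every access misses. With up to 32 objects, each object misses only once. `fence_pool` and `count_misses` are hypothetical. The model does not account for the transition at 8 objects or the single-dword behaviour.

```c
#include <assert.h>

#define MAX_FENCES 64

/* Hypothetical fence pool: each tiled object needs a fence while accessed,
 * and when none is free the least-recently-used one is stolen. */
struct fence_pool {
	int nfences;
	int owner[MAX_FENCES];               /* object holding the fence, -1 if free */
	unsigned long last_used[MAX_FENCES]; /* 0 for a never-used fence */
	unsigned long clock;
};

static void pool_init(struct fence_pool *p, int nfences)
{
	assert(nfences > 0 && nfences <= MAX_FENCES);
	p->nfences = nfences;
	p->clock = 0;
	for (int i = 0; i < nfences; i++) {
		p->owner[i] = -1;
		p->last_used[i] = 0;
	}
}

/* Returns 1 if obj had to be (re)assigned a fence, 0 if it already had one. */
static int pool_access(struct fence_pool *p, int obj)
{
	int victim = 0;

	p->clock++;
	for (int i = 0; i < p->nfences; i++) {
		if (p->owner[i] == obj) {
			p->last_used[i] = p->clock;
			return 0;
		}
		if (p->last_used[i] < p->last_used[victim])
			victim = i;
	}
	/* Free fences have last_used == 0, so they are taken before stealing. */
	p->owner[victim] = obj;
	p->last_used[victim] = p->clock;
	return 1;
}

/* Access nobjects round-robin for `rounds` passes and count fence misses. */
static int count_misses(int nobjects, int nfences, int rounds)
{
	struct fence_pool p;
	int misses = 0;

	pool_init(&p, nfences);
	for (int r = 0; r < rounds; r++)
		for (int obj = 0; obj < nobjects; obj++)
			misses += pool_access(&p, obj);
	return misses;
}
```

With 32 fences and 10 passes, 32 objects miss 32 times in total, while 64 objects miss on all 640 accesses.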
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


