[Intel-gfx] [RFC PATCH] drm/i915: Prefault the entire object on first page fault

Daniel Vetter daniel at ffwll.ch
Tue Feb 4 15:15:26 CET 2014


On Tue, Feb 04, 2014 at 03:12:49PM +0100, Daniel Vetter wrote:
> On Tue, Feb 04, 2014 at 01:30:19PM +0000, Chris Wilson wrote:
> > Inserting additional PTEs has no side-effect for us as the pfn are fixed
> > for the entire time the object is resident in the global GTT. The
> > downside is that we pay the entire cost of faulting the object upon the
> > first hit, for which we in return receive the benefit of removing the
> > per-page faulting overhead.
> > 
> > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
> > Upload rate for 2 linear surfaces:	8127MiB/s -> 8134MiB/s
> > Upload rate for 2 tiled surfaces:	8607MiB/s -> 8625MiB/s
> > Upload rate for 4 linear surfaces:	8127MiB/s -> 8127MiB/s
> > Upload rate for 4 tiled surfaces:	8611MiB/s -> 8602MiB/s
> > Upload rate for 8 linear surfaces:	8114MiB/s -> 8124MiB/s
> > Upload rate for 8 tiled surfaces:	8601MiB/s -> 8603MiB/s
> > Upload rate for 16 linear surfaces:	8110MiB/s -> 8123MiB/s
> > Upload rate for 16 tiled surfaces:	8595MiB/s -> 8606MiB/s
> > Upload rate for 32 linear surfaces:	8104MiB/s -> 8121MiB/s
> > Upload rate for 32 tiled surfaces:	8589MiB/s -> 8605MiB/s
> > Upload rate for 64 linear surfaces:	8107MiB/s -> 8121MiB/s
> > Upload rate for 64 tiled surfaces:	2013MiB/s -> 3017MiB/s
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: "Goel, Akash" <akash.goel at intel.com>
> > ---
> > 
> > It survived light testing without noticeable performance degradation. Can
> > anyone think of how this will impact us negatively?
> 
> piglit does an awful lot of single-pixel readbacks iirc, that's about the
> only thing I could think of. Maybe we should wait until we have the
> vm_insert_pfn_frm_io_mapping to not adversely affect this. Or if the
> overhead is negligible we could move ahead right away. Nothing else really
> crosses my mind which would qualify as real-world usage.

On that topic: What's the improvement of the optimized insert_pfn_pgprot
with the prefault patch applied when doing just single dword writes? I.e.
just to measure the insert_pfn performance so that we have some
impressive microbenchmark numbers justifying things? I'm thinking of a 2nd
mode in your test to measure pagefaults/s.
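
A rough userspace sketch of what such a pagefaults/s mode could look like
is below. It is not taken from the actual test: measure_faults_per_sec() is
a hypothetical helper, and an anonymous mmap stands in for a GTT mmap of a
GEM object, so the absolute numbers say nothing about insert_pfn itself. It
only shows the shape of the measurement: map a fresh range, write a single
dword into each page, and divide the page count by the elapsed time.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any existing test): write one dword
 * into each page of a fresh mapping and return first-touch writes per
 * second, or -1.0 on failure.
 *
 * Assumption: anonymous memory stands in for a GTT mmap of a GEM object.
 */
static double measure_faults_per_sec(size_t npages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	struct timespec start, end;
	double elapsed;
	size_t len, i;
	char *map;

	assert(npages > 0 && page_size > 0);
	len = npages * (size_t)page_size;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1.0;

#ifdef MADV_NOHUGEPAGE
	/* Ask for small pages so that each first touch takes its own fault. */
	(void)madvise(map, len, MADV_NOHUGEPAGE);
#endif

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < npages; i++) {
		/* A single dword write per page, nothing else. */
		volatile uint32_t *dword =
			(volatile uint32_t *)(map + i * (size_t)page_size);
		*dword = (uint32_t)i;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	munmap(map, len);

	elapsed = (double)(end.tv_sec - start.tv_sec) +
		  (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	return elapsed > 0.0 ? (double)npages / elapsed : -1.0;
}
```

Calling measure_faults_per_sec(4096) once with and once without the
optimized insert path (on a real GTT mapping rather than this stand-in)
would give the two numbers to compare. With the prefault patch applied,
only the first write to each object should fault, so a real test would
presumably need a fresh object or mapping per measured fault.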
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


