[Intel-gfx] [libdrm PATCH] intel: Make unsynchronized GTT mappings work on systems with snooping.

Chris Wilson chris at chris-wilson.co.uk
Mon Mar 13 13:43:21 UTC 2017


On Sun, Mar 12, 2017 at 06:19:17PM +0100, David Weinehall wrote:
> On Sun, Mar 12, 2017 at 01:21:12PM +0000, Chris Wilson wrote:
> > On Fri, Mar 10, 2017 at 05:14:32PM -0800, Kenneth Graunke wrote:
> > > On systems without LLC, drm_intel_gem_bo_map_unsynchronized() has
> > > had the surprising behavior of doing a synchronized GTT mapping.
> > > This is obviously not what the user of the API wanted.
> > > 
> > > Eric left a comment indicating a valid concern: if the CPU and GPU
> > > caches are incoherent, we don't keep track of where the user last
> > > mapped the buffer, and what caches might contain relevant data.
> > 
> > Note this is an issue in libdrm_intel not tracking the cache domain
> > transitions. Even just a switch between cpu and coherent would solve the
> > majority of that - the caveat being shared bo where the tracking is
> > incomplete.
> >  
> > > Modern Atom systems still don't have LLC, but they do offer snooping,
> > > which effectively makes the caches coherent.  The kernel appears to
> > > set up the PTE/PPAT to enable snooping for everything where the cache
> > > level is not I915_CACHE_NONE.  As far as I know, only scanout buffers
> > > are marked as uncached.
> > 
> > Byt, bsw beg to differ. I don't have a bxt to know the results of the
> > igt/kernel tests.
> 
> Just give me a list of the tests to run (and, if any, what patches
> to apply and the debugging level you want enabled) and I'll provide
> the necessary results.

The most important result is igt/gem_mmap_gtt/coherency. That tests if a
write through the GTT is immediately visible in the backing storage. (It
should fail...)
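
For anyone who has not read the test, the pattern being checked can be
sketched roughly as below. This is only a stand-in under stated
assumptions: both views are ordinary shared mappings of a temporary file,
not GTT and CPU mmaps of a GEM object, so it always passes on a normal
system. The function name and OBJ_SIZE are hypothetical and are not taken
from igt.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define OBJ_SIZE 4096

/* Write through one mapping and immediately read back through a second
 * mapping of the same backing storage.
 * Returns 0 if every write was visible, 1 on a mismatch, -1 on setup failure.
 * Assumption: a temp file stands in for the GEM object's backing storage. */
static int check_two_mapping_coherency(void)
{
	char path[] = "/tmp/coherency-XXXXXX";
	volatile uint32_t *a, *b;
	int fd, result = 0;

	assert(OBJ_SIZE % sizeof(uint32_t) == 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, OBJ_SIZE) < 0) {
		close(fd);
		return -1;
	}

	/* Two independent views of the same pages. */
	a = mmap(NULL, OBJ_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	b = mmap(NULL, OBJ_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (a == MAP_FAILED || b == MAP_FAILED) {
		if (a != MAP_FAILED)
			munmap((void *)a, OBJ_SIZE);
		if (b != MAP_FAILED)
			munmap((void *)b, OBJ_SIZE);
		return -1;
	}

	for (uint32_t i = 0; i < OBJ_SIZE / sizeof(uint32_t); i++) {
		a[i] = ~i;		/* write through the first view */
		if (b[i] != ~i) {	/* must be visible through the second */
			result = 1;
			break;
		}
	}

	munmap((void *)a, OBJ_SIZE);
	munmap((void *)b, OBJ_SIZE);
	return result;
}
```

In the real test the two views would be a GTT mmap and a CPU mmap of the
same object, which is where a mismatch can actually show up.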

To test the proposed use here, that GTT + snooping is ok, first disable
the test in i915_gem_fault that forbids GTT + snooping. Then tests
similar to gem_exec_flush, or taken directly from kselftests/coherency,
can be used to spot if we need any flushes.
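
On the userspace side, the cache domain tracking mentioned earlier (just a
switch between cpu and coherent) could look something like the sketch
below. All names here (bo_track, bo_domain, bo_track_map) are hypothetical
and are not libdrm_intel API. It assumes a fresh bo starts in the cpu
domain, and it treats shared bo conservatively since the tracking for
those is incomplete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-state domain tracking, not libdrm_intel API. */
enum bo_domain {
	BO_DOMAIN_CPU,		/* last accessed through a cacheable CPU mmap */
	BO_DOMAIN_COHERENT,	/* last accessed through a coherent mapping */
};

struct bo_track {
	enum bo_domain domain;	/* domain of the most recent mapping */
	bool shared;		/* exported (flink/prime): tracking incomplete */
};

/* Record a new mapping and report whether the caller must do a
 * synchronizing domain transition before touching the pages.
 * Shared bo always report true, as another process may have moved them. */
static bool bo_track_map(struct bo_track *bo, enum bo_domain want)
{
	bool need_sync;

	assert(bo);
	need_sync = bo->shared || bo->domain != want;
	bo->domain = want;
	return need_sync;
}
```

Only the calls that return true would need the synchronizing path; repeat
mappings in the same domain of an unshared bo could skip it.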

> > > Any buffers used by scanout should be flagged as non-reusable with
> > > drm_intel_bo_disable_reuse(), prime export, or flink.  So, we can
> > > assume that any reusable buffer should be snooped.
> > 
> > Not really, there is no reason why scanout buffers can't be reused.
> >  
> > > This patch enables unsynchronized mappings for reusable buffers
> > > on all Gen6+ hardware (which have either LLC or snooping).
> > > 
> > > On Broxton, this improves the performance of Unigine Valley 1.0
> > > on Low settings at 1280x720 by about 45%, and Unigine Heaven 4.0
> > > (same settings) by about 53%.
> > 
> > Does anyone have figures for gtt performance on bxt - does it cover over
> > the same performance penalty from earlier atoms? Basically why bother to
> > enable this over wc mapping (no stalls for a contended, limited
> > resource) + detiling. (Just note that for detiling Y to WC you need to
> > use a temporary cacheable page, or rearrange the code to make sure the
> > reads/writes are in 64 byte chunks.) 
> 
> Again, I can run any tests you'd like to get numbers from,
> just give me a list.

gem_gtt_speed $obj_size will tell us the relative performance of
untiled/tiled GTT access vs WC/WB.
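
For comparison, the WC alternative with a temporary cacheable page could
be sketched roughly as follows. This is an illustration under assumptions,
not code from any driver: it assumes a Y tile is 128 bytes x 32 rows laid
out as 16-byte wide columns, ignores any address swizzling, and uses plain
memory as a stand-in for the WC mapping. The names ytile_offset and
upload_ytile are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE_SIZE	4096
#define TILE_WIDTH	128	/* bytes per tile row (assumed Y layout) */
#define TILE_HEIGHT	32	/* rows per tile */
#define OWORD		16	/* column width in bytes */
#define CHUNK		64	/* write granularity towards the WC mapping */

/* Byte offset of (x, y) inside one Y tile, assuming 16-byte wide columns
 * of 32 rows stored one after another, with no swizzling. */
static size_t ytile_offset(unsigned int x, unsigned int y)
{
	assert(x < TILE_WIDTH && y < TILE_HEIGHT);
	return (size_t)(x / OWORD) * (OWORD * TILE_HEIGHT) +
	       (size_t)y * OWORD + x % OWORD;
}

/* Tile one 128x32 byte block from linear src into dst.
 * The scattered 16-byte writes land in a cacheable temporary page; dst
 * only ever sees sequential 64-byte chunks. */
static void upload_ytile(uint8_t *dst, const uint8_t *src, size_t src_stride)
{
	uint8_t tmp[TILE_SIZE];	/* temporary cacheable page */
	unsigned int x, y;
	size_t off;

	for (y = 0; y < TILE_HEIGHT; y++)
		for (x = 0; x < TILE_WIDTH; x += OWORD)
			memcpy(tmp + ytile_offset(x, y),
			       src + y * src_stride + x, OWORD);

	for (off = 0; off < TILE_SIZE; off += CHUNK)
		memcpy(dst + off, tmp + off, CHUNK);
}
```

memcpy does not guarantee that each chunk goes out as a single 64-byte
write, so a real version would probably want aligned vector stores for the
final loop.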
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre