[Intel-gfx] [libdrm PATCH] intel: Make unsynchronized GTT mappings work on systems with snooping.

Chris Wilson chris at chris-wilson.co.uk
Sun Mar 12 13:21:12 UTC 2017


On Fri, Mar 10, 2017 at 05:14:32PM -0800, Kenneth Graunke wrote:
> On systems without LLC, drm_intel_gem_bo_map_unsynchronized() has
> had the surprising behavior of doing a synchronized GTT mapping.
> This is obviously not what the user of the API wanted.
> 
> Eric left a comment indicating a valid concern: if the CPU and GPU
> caches are incoherent, we don't keep track of where the user last
> mapped the buffer, and what caches might contain relevant data.

Note this is an issue in libdrm_intel not tracking the cache domain
transitions. Even just a switch between cpu and coherent would solve the
majority of that - the caveat being shared bo where the tracking is
incomplete.
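
A minimal sketch of what such a cpu/coherent switch could look like,
assuming a hypothetical per-bo struct (tracked_bo, bo_needs_sync and the
enum names are made up for illustration and are not part of
libdrm_intel). It only tracks which kind of mapping was last handed out,
ignores GPU activity entirely, and falls back to always synchronising
for shared bo, where the tracking is incomplete:

```c
#include <stdbool.h>

/* Hypothetical names throughout; nothing here exists in libdrm_intel. */
enum bo_domain { BO_DOMAIN_CPU, BO_DOMAIN_COHERENT };

struct tracked_bo {
	enum bo_domain domain;    /* domain of the last mapping handed out */
	unsigned int transitions; /* number of times a sync was requested */
	bool shared;              /* flink/prime export: tracking incomplete */
};

/* Returns true if the caller should synchronise (e.g. via a set-domain
 * call) before touching a mapping in the requested domain. */
static bool bo_needs_sync(struct tracked_bo *bo, enum bo_domain want)
{
	if (bo->shared) {
		/* Another client may have moved the bo; assume the worst. */
		bo->domain = want;
		bo->transitions++;
		return true;
	}

	if (bo->domain == want)
		return false;

	/* Switching between cpu and coherent access. */
	bo->domain = want;
	bo->transitions++;
	return true;
}
```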
 
> Modern Atom systems still don't have LLC, but they do offer snooping,
> which effectively makes the caches coherent.  The kernel appears to
> set up the PTE/PPAT to enable snooping for everything where the cache
> level is not I915_CACHE_NONE.  As far as I know, only scanout buffers
> are marked as uncached.

Byt, bsw beg to differ. I don't have a bxt to know the results of the
igt/kernel tests.
 
> Any buffers used by scanout should be flagged as non-reusable with
> drm_intel_bo_disable_reuse(), prime export, or flink.  So, we can
> assume that any reusable buffer should be snooped.

Not really, there is no reason why scanout buffers can't be reused.
 
> This patch enables unsynchronized mappings for reusable buffers
> on all Gen6+ hardware (which have either LLC or snooping).
> 
> On Broxton, this improves the performance of Unigine Valley 1.0
> on Low settings at 1280x720 by about 45%, and Unigine Heaven 4.0
> (same settings) by about 53%.

Does anyone have figures for gtt performance on bxt - does it carry over
the same performance penalty from earlier atoms? Basically, why bother to
enable this over wc mapping (no stalls for a contended, limited
resource) + detiling? (Just note that for detiling Y to WC you need to
use a temporary cacheable page, or rearrange the code to make sure the
reads/writes are in 64 byte chunks.)
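
As a rough illustration of the temporary cacheable page approach, the
sketch below gathers scattered bytes into an ordinary (cacheable) staging
buffer and then writes to the destination only in whole 64 byte chunks.
Everything here is an assumption made for the example: copy_via_staging,
src_offset_fn and toy_layout are hypothetical names, toy_layout is a
stand-in permutation and not the real Y-tile layout, dst is assumed to be
the 64 byte aligned WC mapping, and memcpy of 64 bytes is only a proxy
for a chunk-sized write, not a guarantee of how the hardware combines it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK        64   /* chunk size for accesses to the WC mapping */
#define STAGING_SIZE 4096 /* one temporary cacheable page */

/* Hypothetical layout callback: source byte offset for linear byte i. */
typedef size_t (*src_offset_fn)(size_t i);

/* Stand-in layout that swaps adjacent bytes; NOT real Y-tiling. */
static size_t toy_layout(size_t i)
{
	return i ^ 1;
}

static void copy_via_staging(uint8_t *dst, const uint8_t *src,
			     size_t len, src_offset_fn map)
{
	uint8_t staging[STAGING_SIZE];
	size_t done = 0;

	while (done < len) {
		size_t n = len - done;
		size_t off = 0;
		size_t i;

		if (n > STAGING_SIZE)
			n = STAGING_SIZE;

		/* Byte-granular, scattered accesses hit the cacheable page. */
		for (i = 0; i < n; i++)
			staging[i] = src[map(done + i)];

		/* The destination only sees whole 64 byte chunks... */
		for (; off + CHUNK <= n; off += CHUNK)
			memcpy(dst + done + off, staging + off, CHUNK);

		/* ...apart from a final partial chunk, if len is not a
		 * multiple of CHUNK. */
		if (off < n)
			memcpy(dst + done + off, staging + off, n - off);

		done += n;
	}
}
```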

> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: mesa-dev at lists.freedesktop.org
> ---
>  intel/intel_bufmgr_gem.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> It looks like Mesa and Beignet are the only callers of this function
> (SNA and Anvil don't use libdrm, UXA and vaapi don't use this function.)
> 
> This passed our full barrage of Piglit/dEQP/GLCTS/GLESCTS testing.
> gnome-shell still works, as does Unigine, and GLBenchmark.
> 
> I haven't tested any OpenCL workloads.
> 
> diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
> index e260f2dc..f53f1fcc 100644
> --- a/intel/intel_bufmgr_gem.c
> +++ b/intel/intel_bufmgr_gem.c
> @@ -1630,9 +1630,7 @@ int
>  drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>  {
>  	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> -#ifdef HAVE_VALGRIND
>  	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> -#endif
>  	int ret;
>  
>  	/* If the CPU cache isn't coherent with the GTT, then use a
> @@ -1641,8 +1639,12 @@ drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>  	 * terms of drm_intel_bo_map vs drm_intel_gem_bo_map_gtt, so
>  	 * we would potentially corrupt the buffer even when the user
>  	 * does reasonable things.
> +	 *
> +	 * The caches are coherent on LLC platforms or snooping is enabled
> +	 * for the BO.  The kernel enables snooping for non-scanout (reusable)
> +	 * buffers on modern non-LLC systems.

gen >= 9; the snoop was reinvented

Not enabled by default on !llc. By default object and PTE are set to
uncached, and mocs default is to follow PTE.

I would just do the unsync map and leave it to the caller to ensure it
is used correctly. Or just use the raw interfaces that leave the domain
tracking to the caller.

>  	 */
> -	if (!bufmgr_gem->has_llc)
> +	if (bufmgr_gem->gen < 6 || !bo_gem->reusable)
>  		return drm_intel_gem_bo_map_gtt(bo);
>  
>  	pthread_mutex_lock(&bufmgr_gem->lock);
> -- 
> 2.12.0
> 

-- 
Chris Wilson, Intel Open Source Technology Centre

