[Intel-gfx] [Mesa-dev] [libdrm PATCH] intel: Make unsynchronized GTT mappings work on systems with snooping.

Eero Tamminen eero.t.tamminen at intel.com
Tue Mar 14 10:48:13 UTC 2017


Hi,

On 11.03.2017 03:14, Kenneth Graunke wrote:
> On systems without LLC, drm_intel_gem_bo_map_unsynchronized() has
> had the surprising behavior of doing a synchronized GTT mapping.
> This is obviously not what the user of the API wanted.
>
> Eric left a comment indicating a valid concern: if the CPU and GPU
> caches are incoherent, we don't keep track of where the user last
> mapped the buffer, and what caches might contain relevant data.
>
> Modern Atom systems still don't have LLC, but they do offer snooping,
> which effectively makes the caches coherent.  The kernel appears to
> set up the PTE/PPAT to enable snooping for everything where the cache
> level is not I915_CACHE_NONE.  As far as I know, only scanout buffers
> are marked as uncached.
>
> Any buffers used by scanout should be flagged as non-reusable with
> drm_intel_bo_disable_reuse(), prime export, or flink.  So, we can
> assume that any reusable buffer should be snooped.
>
> This patch enables unsynchronized mappings for reusable buffers
> on all Gen6+ hardware (which have either LLC or snooping).
>
> On Broxton, this improves the performance of Unigine Valley 1.0
> on Low settings at 1280x720 by about 45%, and Unigine Heaven 4.0
> (same settings) by about 53%.

I tested it with our normal set of benchmarks.

Using FullHD resolution and "high" quality settings, on Broxton, Valley 
improved by ~11% and Heaven (with tessellation enabled) by 2-3%.

CarChase also seemed to improve by several percent, but everything else
was within normal variation.

I'll check BYT & BSW too.


	- Eero


> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: mesa-dev at lists.freedesktop.org
> ---
>  intel/intel_bufmgr_gem.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> It looks like Mesa and Beignet are the only callers of this function
> (SNA and Anvil don't use libdrm, UXA and vaapi don't use this function.)
>
> This passed our full barrage of Piglit/dEQP/GLCTS/GLESCTS testing.
> gnome-shell still works, as does Unigine, and GLBenchmark.
>
> I haven't tested any OpenCL workloads.
>
> diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
> index e260f2dc..f53f1fcc 100644
> --- a/intel/intel_bufmgr_gem.c
> +++ b/intel/intel_bufmgr_gem.c
> @@ -1630,9 +1630,7 @@ int
>  drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>  {
>  	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> -#ifdef HAVE_VALGRIND
>  	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> -#endif
>  	int ret;
>
>  	/* If the CPU cache isn't coherent with the GTT, then use a
> @@ -1641,8 +1639,12 @@ drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>  	 * terms of drm_intel_bo_map vs drm_intel_gem_bo_map_gtt, so
>  	 * we would potentially corrupt the buffer even when the user
>  	 * does reasonable things.
> +	 *
> +	 * The caches are coherent on LLC platforms or snooping is enabled
> +	 * for the BO.  The kernel enables snooping for non-scanout (reusable)
> +	 * buffers on modern non-LLC systems.
>  	 */
> -	if (!bufmgr_gem->has_llc)
> +	if (bufmgr_gem->gen < 6 || !bo_gem->reusable)
>  		return drm_intel_gem_bo_map_gtt(bo);
>
>  	pthread_mutex_lock(&bufmgr_gem->lock);
>
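For readers skimming the diff, the change in the fallback condition can be sketched in isolation. This is only an illustration of the two predicates, not libdrm code: the struct and function names below (sketch_bufmgr, sketch_bo, old_allows_unsync, new_allows_unsync) are hypothetical stand-ins for the few fields the patch consults.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fields the patch looks at; the real
 * drm_intel_bufmgr_gem and drm_intel_bo_gem structs hold much more. */
struct sketch_bufmgr {
	int gen;       /* hardware generation */
	bool has_llc;  /* CPU and GPU share a last-level cache */
};

struct sketch_bo {
	bool reusable; /* cleared by disable_reuse, prime export, or flink */
};

/* Before the patch: unsynchronized mapping only on LLC systems.
 * Everything else fell back to a synchronized GTT mapping. */
static bool old_allows_unsync(const struct sketch_bufmgr *mgr,
			      const struct sketch_bo *bo)
{
	(void) bo; /* the old check did not look at the buffer at all */
	return mgr->has_llc;
}

/* After the patch: the fallback is taken when gen < 6 or the buffer is
 * not reusable, so unsynchronized mapping needs the negation of that. */
static bool new_allows_unsync(const struct sketch_bufmgr *mgr,
			      const struct sketch_bo *bo)
{
	return mgr->gen >= 6 && bo->reusable;
}
```

Read this way, a reusable buffer on a Gen6+ system without LLC now gets the unsynchronized path, while a non-reusable buffer takes the synchronized path even on an LLC system, which the old has_llc check did not do.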


