[Mesa-dev] [libdrm PATCH] intel: Make unsynchronized GTT mappings work on systems with snooping.
Eero Tamminen
eero.t.tamminen at intel.com
Tue Mar 14 10:48:13 UTC 2017
Hi,
On 11.03.2017 03:14, Kenneth Graunke wrote:
> On systems without LLC, drm_intel_gem_bo_map_unsynchronized() has
> had the surprising behavior of doing a synchronized GTT mapping.
> This is obviously not what the user of the API wanted.
>
> Eric left a comment indicating a valid concern: if the CPU and GPU
> caches are incoherent, we don't keep track of where the user last
> mapped the buffer, and what caches might contain relevant data.
>
> Modern Atom systems still don't have LLC, but they do offer snooping,
> which effectively makes the caches coherent. The kernel appears to
> set up the PTE/PPAT to enable snooping for everything where the cache
> level is not I915_CACHE_NONE. As far as I know, only scanout buffers
> are marked as uncached.
>
> Any buffers used by scanout should be flagged as non-reusable with
> drm_intel_bo_disable_reuse(), prime export, or flink. So, we can
> assume that any reusable buffer should be snooped.
>
> This patch enables unsynchronized mappings for reusable buffers
> on all Gen6+ hardware (which have either LLC or snooping).
>
> On Broxton, this improves the performance of Unigine Valley 1.0
> on Low settings at 1280x720 by about 45%, and Unigine Heaven 4.0
> (same settings) by about 53%.
I tested it with our normal set of benchmarks.
Using FullHD resolution and "high" quality settings, on Broxton, Valley
improved by ~11% and Heaven (with tessellation enabled) by 2-3%.
CarChase also seemed to improve by several percent, but everything else
was within normal variation.
I'll check BYT & BSW too.
- Eero
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: mesa-dev at lists.freedesktop.org
> ---
> intel/intel_bufmgr_gem.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> It looks like Mesa and Beignet are the only callers of this function
> (SNA and Anvil don't use libdrm, UXA and vaapi don't use this function.)
>
> This passed our full barrage of Piglit/dEQP/GLCTS/GLESCTS testing.
> gnome-shell still works, as does Unigine, and GLBenchmark.
>
> I haven't tested any OpenCL workloads.
>
> diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
> index e260f2dc..f53f1fcc 100644
> --- a/intel/intel_bufmgr_gem.c
> +++ b/intel/intel_bufmgr_gem.c
> @@ -1630,9 +1630,7 @@ int
>  drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>  {
>  	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> -#ifdef HAVE_VALGRIND
>  	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> -#endif
>  	int ret;
>
>  	/* If the CPU cache isn't coherent with the GTT, then use a
> @@ -1641,8 +1639,12 @@ drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
>  	 * terms of drm_intel_bo_map vs drm_intel_gem_bo_map_gtt, so
>  	 * we would potentially corrupt the buffer even when the user
>  	 * does reasonable things.
> +	 *
> +	 * The caches are coherent on LLC platforms or snooping is enabled
> +	 * for the BO. The kernel enables snooping for non-scanout (reusable)
> +	 * buffers on modern non-LLC systems.
>  	 */
> -	if (!bufmgr_gem->has_llc)
> +	if (bufmgr_gem->gen < 6 || !bo_gem->reusable)
>  		return drm_intel_gem_bo_map_gtt(bo);
>
>  	pthread_mutex_lock(&bufmgr_gem->lock);
>
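For readers skimming the diff, the change in the fallback condition can be sketched as standalone C. This is only an illustration, not libdrm code: the struct and function names below (sketch_bufmgr, sketch_bo, old_falls_back_to_gtt, new_falls_back_to_gtt) are hypothetical stand-ins that model just the fields the check reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libdrm-internal structs; only the
 * fields read by the check are modelled. */
struct sketch_bufmgr {
    int gen;       /* hardware generation */
    bool has_llc;  /* consulted only by the old check */
};

struct sketch_bo {
    bool reusable; /* false after disable_reuse, prime export, or flink */
};

/* Old condition: every non-LLC system falls back to a synchronized
 * GTT mapping, regardless of the buffer. */
static bool old_falls_back_to_gtt(const struct sketch_bufmgr *mgr)
{
    assert(mgr != NULL);
    return !mgr->has_llc;
}

/* New condition: fall back only on pre-Gen6 hardware or for
 * non-reusable buffers (which may be uncached scanout buffers). */
static bool new_falls_back_to_gtt(const struct sketch_bufmgr *mgr,
                                  const struct sketch_bo *bo)
{
    assert(mgr != NULL && bo != NULL);
    return mgr->gen < 6 || !bo->reusable;
}
```

Under this model, a Gen6+ system without LLC no longer falls back for a reusable buffer, while a buffer that has had reuse disabled still takes the synchronized path.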