[Intel-gfx] [PATCH libdrm] intel: Use CPU mmap for unsynchronized map with linear buffers

Thu Sep 17 07:19:02 PDT 2015

From: Ville Syrjälä <ville.syrjala at linux.intel.com>

On LLC platforms there's no need to use GTT mmap for unsynchronized
maps if the object isn't tiled. So switch to using CPU mmap for linear
objects. This avoids having to use the GTT for GL buffer objects
entirely, and thus we can ignore the GTT mappable size limitation.
For tiled objects we still want the hardware to do the (de)tiling so
keep using GTT for such objects.
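
The selection described above can be sketched as a standalone function.
This is only an illustration, not the libdrm code: the enum names and
pick_unsynchronized_map() are hypothetical, and the non-LLC fallback to
a regular synchronized mapping is inferred from the existing comment in
drm_intel_gem_bo_map_unsynchronized().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the I915_TILING_* modes. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };

/* Hypothetical labels for the three mapping paths. */
enum map_path {
	MAP_PATH_GTT_SYNC,   /* no LLC: regular synchronized GTT map */
	MAP_PATH_CPU_UNSYNC, /* LLC + linear: CPU mmap, GTT not used */
	MAP_PATH_GTT_UNSYNC, /* LLC + tiled: GTT mmap, hardware (de)tiles */
};

/* Pick the mapping path for an unsynchronized map request. */
static enum map_path
pick_unsynchronized_map(bool has_llc, enum tiling tiling)
{
	/* Without LLC the CPU cache isn't coherent with the GTT. */
	if (!has_llc)
		return MAP_PATH_GTT_SYNC;

	/* Linear objects need no detiling, so the CPU mmap is enough. */
	if (tiling == TILING_NONE)
		return MAP_PATH_CPU_UNSYNC;

	return MAP_PATH_GTT_UNSYNC;
}
```

Only the LLC + linear case changes with this patch; the other two paths
behave as before.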

The display engine is not coherent even on LLC platforms, so this won't
work too well if we mix scanout and unsynchronized maps of linear bos.
Actually, that would only be a problem for an already uncached object;
otherwise it will get clflushed anyway when being made UC/WC prior to
scanout. The already UC object case could be handled by either
clflushing straight from userspace, or we could add a new ioctl to
clflush or mark the object as cache_dirty so that it will get
clflushed in the future just prior to scanout. I started to think
that a small nop pwrite would have the desired effect, but in fact it
would only flush the cachelines it touches, so it wouldn't actually
work. I doubt we want to pwrite the entire object just to get it
clflushed.
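
For reference, clflushing straight from userspace could look roughly
like the sketch below. It is not part of this patch. clflush_range()
and CACHELINE_SIZE are hypothetical names, the 64-byte cacheline size
is an assumption, and on builds without SSE2 the loop only counts
lines so that the sketch still compiles.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush(), _mm_mfence() */
#endif

#define CACHELINE_SIZE 64 /* assumption: 64-byte cachelines */

/*
 * Flush every cacheline overlapping [addr, addr + len) and return
 * the number of cachelines visited.
 */
static size_t clflush_range(const void *addr, size_t len)
{
	uintptr_t p = (uintptr_t) addr & ~(uintptr_t) (CACHELINE_SIZE - 1);
	uintptr_t end = (uintptr_t) addr + len;
	size_t lines = 0;

	if (len == 0)
		return 0;

#if defined(__SSE2__)
	_mm_mfence(); /* order earlier stores before the flushes */
#endif
	for (; p < end; p += CACHELINE_SIZE) {
#if defined(__SSE2__)
		_mm_clflush((const void *) p);
#endif
		lines++;
	}
#if defined(__SSE2__)
	_mm_mfence(); /* make sure the flushes have completed */
#endif

	return lines;
}
```

A caller would pass bo->virtual and bo->size to cover the whole object,
which is exactly what a small nop pwrite would not do.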

This fixes Ilia's arb_texture_buffer_object-max-size piglit test
on LLC platforms.

Cc: Ilia Mirkin <imirkin at alum.mit.edu>
Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
---
 intel/intel_bufmgr_gem.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 63122d0..5e8335a 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -1337,11 +1337,10 @@ static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
-static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
+static int map_cpu(drm_intel_bo *bo)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
 	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
-	struct drm_i915_gem_set_domain set_domain;
 	int ret;
 
 	if (bo_gem->is_userptr) {
@@ -1350,8 +1349,6 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 		return 0;
 	}
 
-	pthread_mutex_lock(&bufmgr_gem->lock);
-
 	if (bo_gem->map_count++ == 0)
 		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
 
@@ -1384,6 +1381,24 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 	    bo_gem->mem_virtual);
 	bo->virtual = bo_gem->mem_virtual;
 
+	return 0;
+}
+
+static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_cpu(bo);
+	if (ret || bo_gem->is_userptr) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
 	memclear(set_domain);
 	set_domain.handle = bo_gem->gem_handle;
 	set_domain.read_domains = I915_GEM_DOMAIN_CPU;
@@ -1536,9 +1551,7 @@ int
 drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
-#ifdef HAVE_VALGRIND
 	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
-#endif
 	int ret;
 
 	/* If the CPU cache isn't coherent with the GTT, then use a
@@ -1553,7 +1566,10 @@ drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
 
 	pthread_mutex_lock(&bufmgr_gem->lock);
 
-	ret = map_gtt(bo);
+	if (bo_gem->tiling_mode == I915_TILING_NONE)
+		ret = map_cpu(bo);
+	else
+		ret = map_gtt(bo);
 	if (ret == 0) {
 		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
 		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->gtt_virtual, bo->size));
-- 
2.4.6