[Intel-gfx] [PATCH] drm/i915/gem: Don't try to map and fence large scanout buffers

Ville Syrjälä ville.syrjala at linux.intel.com
Thu Oct 28 12:53:11 UTC 2021


On Thu, Oct 28, 2021 at 01:04:23AM -0700, Vivek Kasireddy wrote:
> On platforms capable of allowing 8K (7680 x 4320) modes, pinning 2 or
> more framebuffers/scanout buffers results in only one that is mappable/
> fenceable. Therefore, pageflipping between these 2 FBs where only one
> is mappable/fenceable creates latencies large enough to miss alternate
> vblanks thereby producing less optimal framerate.
> 
> This mainly happens because when i915_gem_object_pin_to_display_plane()
> is called to pin one of the FB objs, the associated vma is identified
> as misplaced and therefore i915_vma_unbind() is called which unbinds and
> evicts it. This misplaced vma gets subsequently pinned only when
> i915_gem_object_ggtt_pin_ww() is called without the mappable flag. This
> results in a latency of ~10ms and happens every other vblank/repaint cycle.
> 
> Testcase:
> Running Weston and weston-simple-egl on an Alderlake_S (ADLS) platform
> with an 8K@60 mode results in only ~40 FPS. Since upstream Weston submits
> a frame ~7ms before the next vblank, the latencies seen between atomic
> commit and flip event are 7, 24 (7 + 16.66), 7, 24..... suggesting that
> it misses the vblank every other frame.
> 
> Here is the ftrace snippet that shows the source of the ~10ms latency:
>               i915_gem_object_pin_to_display_plane() {
> 0.102 us   |    i915_gem_object_set_cache_level();
>                 i915_gem_object_ggtt_pin_ww() {
> 0.390 us   |      i915_vma_instance();
> 0.178 us   |      i915_vma_misplaced();
>                   i915_vma_unbind() {
>                   __i915_active_wait() {
> 0.082 us   |        i915_active_acquire_if_busy();
> 0.475 us   |      }
>                   intel_runtime_pm_get() {
> 0.087 us   |        intel_runtime_pm_acquire();
> 0.259 us   |      }
>                   __i915_active_wait() {
> 0.085 us   |        i915_active_acquire_if_busy();
> 0.240 us   |      }
>                   __i915_vma_evict() {
>                     ggtt_unbind_vma() {
>                       gen8_ggtt_clear_range() {
> 10507.255 us |        }
> 10507.689 us |      }
> 10508.516 us |   }
> 
> v2: Instead of using bigjoiner checks, determine whether a scanout
>     buffer is too big by checking to see if it is possible to map
>     two of them into the ggtt.
> 
> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: Manasi Navare <manasi.d.navare at intel.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 48 ++++++++++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_vma.c |  2 +-
>  2 files changed, 40 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 981e383d1a5d..0050c7e4bb51 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -866,6 +866,44 @@ static void discard_ggtt_vma(struct i915_vma *vma)
>  	spin_unlock(&obj->vma.lock);
>  }
>  
> +static bool i915_gem_obj_too_big(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct i915_ggtt *ggtt = &i915->ggtt;
> +	struct drm_mm_node *hole;
> +	u64 hole_start, hole_end;
> +	u64 fence_size;
> +
> +	/*
> +	 * If the required space is larger than the available
> +	 * aperture, we will not able to find a slot for the
> +	 * object and unbinding the object now will be in
> +	 * vain. Worse, doing so may cause us to ping-pong
> +	 * the object in and out of the Global GTT and
> +	 * waste a lot of cycles under the mutex.
> +	 */
> +	if (obj->base.size > ggtt->mappable_end)
> +		return true;
> +
> +	fence_size = i915_gem_fence_size(i915, obj->base.size,
> +					 i915_gem_object_get_tiling(obj),
> +					 i915_gem_object_get_stride(obj));
> +
> +	/*
> +	 * Assuming this object is a large scanout buffer, we try to find
> +	 * out if there is room to map at-least two of them. There could
> +	 * be space available to map one but to be consistent, we try to
> +	 * avoid mapping/fencing any of them.
> +	 */
> +	drm_mm_for_each_hole(hole, &ggtt->vm.mm, hole_start, hole_end) {
> +		if (hole_end - hole_start > 2 * fence_size &&
> +		    hole_start + 2 * fence_size < ggtt->mappable_end)
> +			return false;
> +	}

Looking for a single hole twice the size seems a bit weird. It would make
more sense to check how many of these vmas we could fit into the available
holes. This also doesn't seem to handle any alignment constraints.
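
To make the suggestion concrete, here is a minimal standalone sketch (plain
C, not kernel code) of counting how many aligned objects fit across all
holes below the mappable limit. struct hole, align_up() and count_fits()
are hypothetical names made up for illustration; a real version would walk
the ggtt with drm_mm_for_each_hole() and take the size and alignment from
the fence requirements, which this sketch simply assumes are passed in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a free range; this is not the drm_mm API. */
struct hole {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Round x up to a multiple of align (assumed to be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Count how many objects of 'size' bytes, each placed at an
 * 'align'-aligned offset, fit into the given holes without
 * crossing 'mappable_end'.
 */
static uint64_t count_fits(const struct hole *holes, size_t n,
			   uint64_t size, uint64_t align,
			   uint64_t mappable_end)
{
	uint64_t stride, count = 0;
	size_t i;

	if (!size || !align)
		return 0;

	/* Distance between the starts of two consecutive aligned slots. */
	stride = align_up(size, align);

	for (i = 0; i < n; i++) {
		uint64_t start = align_up(holes[i].start, align);
		uint64_t end = holes[i].end < mappable_end ?
			       holes[i].end : mappable_end;

		/* Skip holes that cannot hold even one aligned object. */
		if (start >= end || end - start < size)
			continue;

		/* One object fits; each further one needs a full stride. */
		count += 1 + (end - start - size) / stride;
	}

	return count;
}
```

With this shape the "too big" test becomes count_fits(...) < 2. Unlike the
check for one hole of at least 2 * fence_size, it also accepts two separate
holes that each hold one object, and it rejects a hole that is large enough
only before alignment is applied.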

> +
> +	return true;
> +}
> +
>  struct i915_vma *
>  i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj,
>  			    struct i915_gem_ww_ctx *ww,
> @@ -879,15 +917,7 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj,
>  
>  	if (flags & PIN_MAPPABLE &&
>  	    (!view || view->type == I915_GGTT_VIEW_NORMAL)) {
> -		/*
> -		 * If the required space is larger than the available
> -		 * aperture, we will not able to find a slot for the
> -		 * object and unbinding the object now will be in
> -		 * vain. Worse, doing so may cause us to ping-pong
> -		 * the object in and out of the Global GTT and
> -		 * waste a lot of cycles under the mutex.
> -		 */
> -		if (obj->base.size > ggtt->mappable_end)
> +		if (i915_gem_obj_too_big(obj))

Doing this unconditionally seems wrong. Some platforms always use
PIN_MAPPABLE.

>  			return ERR_PTR(-E2BIG);
>  
>  		/*

There is already code around these parts that tries to limit this
ping-pong:
		/*
		 * If NONBLOCK is set the caller is optimistically
		 * trying to cache the full object within the mappable
		 * aperture, and *must* have a fallback in place for
		 * situations where we cannot bind the object. We
		 * can be a little more lax here and use the fallback
		 * more often to avoid costly migrations of ourselves
		 * and other objects within the aperture.
		 *
		 * Half-the-aperture is used as a simple heuristic.
		 * More interesting would to do search for a free
		 * block prior to making the commitment to unbind.
		 * That caters for the self-harm case, and with a
		 * little more heuristics (e.g. NOFAULT, NOEVICT)
		 * we could try to minimise harm to others.
		 */
		if (flags & PIN_NONBLOCK &&
		    obj->base.size > ggtt->mappable_end / 2)
			return ERR_PTR(-ENOSPC);

Looks like you're pretty much trying to implement what that comment says.

> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 90546fa58fc1..551644dbfa8a 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -977,7 +977,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
>  		if (err)
>  			goto err_active;
>  
> -		if (i915_is_ggtt(vma->vm))
> +		if (i915_is_ggtt(vma->vm) && flags & PIN_MAPPABLE)
>  			__i915_vma_set_map_and_fenceable(vma);
>  	}
>  
> -- 
> 2.31.1

-- 
Ville Syrjälä
Intel

