[Intel-xe] [RFC PATCH v2 22/23] drm/i915: Handle dma fences in dirtyfb callback

Ville Syrjälä ville.syrjala at linux.intel.com
Thu Jul 13 20:08:04 UTC 2023


On Wed, May 10, 2023 at 03:11:51PM +0300, Jouni Högander wrote:
> Take into account dma fences in dirtyfb callback. If there is no
> unsignaled dma fences perform flush immediately. If there are
> unsignaled dma fences perform invalidate and add callback which will
> queue flush when the fence gets signaled.
> 
> Signed-off-by: Jouni Högander <jouni.hogander at intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_fb.c | 55 +++++++++++++++++++++++--
>  1 file changed, 52 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
> index fa4464d433b7..fc325f2299a4 100644
> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> @@ -8,6 +8,9 @@
>  #include <drm/drm_framebuffer.h>
>  #include <drm/drm_modeset_helper.h>
>  
> +#include <linux/dma-fence.h>
> +#include <linux/dma-resv.h>
> +
>  #include "i915_drv.h"
>  #include "intel_display.h"
>  #include "intel_display_types.h"
> @@ -1888,6 +1891,20 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb,
>  }
>  
>  #ifdef I915
> +struct frontbuffer_fence_cb {
> +	struct dma_fence_cb base;
> +	struct intel_frontbuffer *front;
> +};
> +
> +static void intel_user_framebuffer_fence_wake(struct dma_fence *dma,
> +					      struct dma_fence_cb *data)
> +{
> +	struct frontbuffer_fence_cb *cb = container_of(data, typeof(*cb), base);
> +
> +	intel_frontbuffer_queue_flush(cb->front);
> +	kfree(cb);
> +}
> +
>  static int intel_user_framebuffer_dirty(struct drm_framebuffer *fb,
>  					struct drm_file *file,
>  					unsigned int flags, unsigned int color,
> @@ -1895,11 +1912,43 @@ static int intel_user_framebuffer_dirty(struct drm_framebuffer *fb,
>  					unsigned int num_clips)
>  {
>  	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
> +	struct intel_frontbuffer *front = to_intel_frontbuffer(fb);
> +	struct dma_resv_iter cursor;
> +	struct dma_fence *fence;
> +	int ret;
> +
> +	if (dma_resv_test_signaled(intel_bo_to_drm_bo(obj)->resv, dma_resv_usage_rw(false))) {
> +		intel_bo_flush_if_display(obj);
> +		intel_frontbuffer_flush(front, ORIGIN_DIRTYFB);
> +		return 0;
> +	}
>  
> -	intel_bo_flush_if_display(obj);
> -	intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_DIRTYFB);
> +	intel_frontbuffer_invalidate(front, ORIGIN_DIRTYFB);
>  
> -	return 0;
> +	dma_resv_iter_begin(&cursor, intel_bo_to_drm_bo(obj)->resv,
> +			    dma_resv_usage_rw(false));
> +	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> +		struct frontbuffer_fence_cb *cb =
> +			kmalloc(sizeof(struct frontbuffer_fence_cb), GFP_KERNEL);
> +		if (!cb) {
> +			ret = -ENOMEM;
> +			break;
> +		}
> +		cb->front = front;
> +
> +		ret = dma_fence_add_callback(fence, &cb->base,
> +					     intel_user_framebuffer_fence_wake);
> +		if (ret) {
> +			intel_user_framebuffer_fence_wake(fence, &cb->base);
> +			if (ret == -ENOENT)
> +				ret = 0;
> +			else
> +				break;
> +		}
> +	}
> +	dma_resv_iter_end(&cursor);

AFAICS we could use dma_resv_get_singleton() here to get just a
single callback once all the included fences have signalled. It
might also reduce the number of kmalloc() calls a bit. That said,
dma_resv_get_singleton() does seem to end up doing multiple
allocations as well, but perhaps it could be optimized further.

The other thing dma_resv_get_singleton() does is reference
counting of the fences. But I'm not sure that's needed here,
i.e. I'm not sure what the lifetime rules are.


I was also pondering what kind of scenarios we might hit here that might
be a bit problematic. This is what I came up with:

* scenario 1:

 flip(PLANE A):
  -> FB A.bits=PLANE A
 set fence(FB A):
  -> FB A.fence = fence 1
 dirtyfb(FB A):
  -> fence 1 !signalled -> invalidate FB A.bits==PLANE A
  -> fence 1 queue cb
 flip(PLANE A):
  -> FB A.bits = 0
  -> FB B.bits = PLANE A
 fence 1 cb -> flush FB A.bits=0

 In the end tracking is left in invalidated state, at least for
 FBC AFAICS. Possible fix would be to clear FBC busy_bits on flip [1]?
 DRRS is fine I think since every flip already clears busy_bits.
 Not sure what PSR does.


[1]
@@ -1299,11 +1299,9 @@ static void __intel_fbc_post_update(struct intel_fbc *fbc)
        lockdep_assert_held(&fbc->lock);

        fbc->flip_pending = false;
+       fbc->busy_bits = 0;

-       if (!fbc->busy_bits)
-               intel_fbc_activate(fbc);
-       else
-               intel_fbc_deactivate(fbc, "frontbuffer write");
+       intel_fbc_activate(fbc);
 }
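
To make scenario 1 a bit more concrete, here is a small userspace toy
model of the bookkeeping. It is only a sketch: the toy_* names are made
up for illustration and are not i915 structures or functions, and it
assumes invalidate ORs the FB's current bits into busy_bits while flush
clears them using whatever bits the FB has at flush time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PLANE_A (1u << 0)

/* Toy stand-ins, hypothetical names only (not the real i915 types). */
struct toy_fb { unsigned int bits; };       /* planes scanning out this FB */
struct toy_fbc { unsigned int busy_bits; }; /* planes with a pending invalidate */

static void toy_invalidate(struct toy_fbc *fbc, const struct toy_fb *fb)
{
	fbc->busy_bits |= fb->bits;
}

static void toy_flush(struct toy_fbc *fbc, const struct toy_fb *fb)
{
	/* Uses the FB's bits at flush time, which may already be 0. */
	fbc->busy_bits &= ~fb->bits;
}

/* Move a plane from old_fb (may be NULL) to new_fb. */
static void toy_flip(struct toy_fbc *fbc, struct toy_fb *old_fb,
		     struct toy_fb *new_fb, unsigned int plane,
		     bool clear_busy_on_flip)
{
	assert(new_fb != NULL);

	if (old_fb)
		old_fb->bits &= ~plane;
	new_fb->bits |= plane;

	/* Models the idea in [1]: drop stale busy_bits on every flip. */
	if (clear_busy_on_flip)
		fbc->busy_bits = 0;
}

/* Replays scenario 1 and returns the busy_bits left over at the end. */
static unsigned int toy_scenario_1(bool clear_busy_on_flip)
{
	struct toy_fb fb_a = { 0 }, fb_b = { 0 };
	struct toy_fbc fbc = { 0 };

	toy_flip(&fbc, NULL, &fb_a, PLANE_A, clear_busy_on_flip);  /* flip(PLANE A) */
	toy_invalidate(&fbc, &fb_a);  /* dirtyfb, fence 1 not signalled */
	toy_flip(&fbc, &fb_a, &fb_b, PLANE_A, clear_busy_on_flip); /* flip to FB B */
	toy_flush(&fbc, &fb_a);       /* fence 1 cb, FB A.bits is already 0 */

	return fbc.busy_bits;
}
```

In this model toy_scenario_1(false) leaves PLANE_A stuck in busy_bits,
while toy_scenario_1(true) ends with busy_bits == 0. Whether the real
FBC code behaves the same way would need checking against the driver.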


* scenario 2:

 flip(PLANE A):
  -> FB A.bits=PLANE A
 set fence(FB A):
  -> FB A.fence = fence 1
 dirtyfb(FB A):
  -> fence 1 !signalled -> invalidate FB A.bits==PLANE A
  -> fence 1 queue cb
 set fence(FB A):
  -> FB A.fence = fence 2
 dirtyfb(FB A):
  -> fence 2 !signalled -> invalidate FB A.bits==PLANE A
  -> fence 2 queue cb
 fence 1 cb -> flush FB A.bits==PLANE A
  -> frontbuffer tracking flushed before fence 2 has signalled
 ...
 fence 2 cb -> flush FB A.bits==PLANE A

 Perhaps we should keep track of how many fences are actually pending,
 and only do the frontbuffer flush when the count drops to zero?
 OTOH the final flush should still guarantee some kind of correctness
 in the end, so not sure this is really a big problem.

> +
> +	return ret;
>  }
>  #endif
>  
> -- 
> 2.34.1

-- 
Ville Syrjälä
Intel

