[Intel-gfx] [PATCH v7 6/8] drm/i915: Interrupt driven fences
Maarten Lankhorst
maarten.lankhorst at linux.intel.com
Thu Apr 21 07:38:06 UTC 2016
On 20-04-16 at 19:09, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
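For anyone following along, the on-demand model described above can be illustrated with a small userspace sketch. All names here are made up for illustration; this is not the real struct fence / i915 API. The key point is that the signaled state is pushed by an armed callback, never pulled by polling, and that checking immediately after arming closes the enable-vs-already-completed race mentioned further down:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only: 'signaled' is set by an armed callback,
 * never by polling from the query side. */
struct model_fence {
	bool signaled;
	bool irq_enabled;	/* has the "interrupt" been armed? */
	unsigned int seqno;	/* value this fence completes at */
};

static unsigned int hw_seqno;	/* stands in for the HW seqno write */

/* What the seqno interrupt would do: update fence state directly. */
static void fake_irq_handler(struct model_fence *f)
{
	if (f->irq_enabled && (int)(hw_seqno - f->seqno) >= 0)
		f->signaled = true;
}

/* Armed only when someone actually waits. Running the handler once
 * immediately covers a request that completed before arming. */
static void model_enable_signaling(struct model_fence *f)
{
	f->irq_enabled = true;
	fake_irq_handler(f);
}
```

Until model_enable_signaling() is called, nothing touches the fence state at all, which is the whole point of the on-demand scheme.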
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
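The 'please poke me' list handling reduces to a scan-signal-unlink loop. A userspace sketch of the shape (names invented; the real code naturally holds fence_lock and uses i915_seqno_passed() for wraparound-safe comparison, which the (int) cast below mimics):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_req {
	unsigned int seqno;
	bool signaled;
	struct model_req *next;
};

static struct model_req *signal_list;	/* the 'please poke me' list */

/* i915_add_request() analogue: queued before the seqno-interrupt
 * commands hit the ring, so the handler cannot miss it. */
static void model_add_request(struct model_req *r)
{
	r->next = signal_list;
	signal_list = r;
}

/* Interrupt handler analogue: signal and unlink every completed
 * request so later interrupts do not rescan the whole stack. */
static void model_notify(unsigned int hw_seqno)
{
	struct model_req **p = &signal_list;

	while (*p) {
		if ((int)(hw_seqno - (*p)->seqno) >= 0) {
			(*p)->signaled = true;
			*p = (*p)->next;	/* drop from list */
		} else {
			p = &(*p)->next;
		}
	}
}
```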
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the completion interrupt
> actually arriving.
>
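So the cancellation path amounts to: mark the request cancelled, record a failure status, then push it through the one regular signal path so the list unlink happens in exactly one place. A minimal sketch of that design choice (illustrative names; -EIO spelled as a local constant):

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EIO 5

struct model_creq {
	bool cancelled;
	bool signaled;
	int status;		/* 0 = ok, -MODEL_EIO = failed */
	bool on_signal_list;
};

/* Regular completion path: the single place list state is cleaned up. */
static void model_signal(struct model_creq *r)
{
	r->signaled = true;
	r->on_signal_list = false;
}

/* TDR/reset analogue: the seqno interrupt will never arrive, so fail
 * the fence and reuse the regular path rather than duplicating the
 * list manipulation here. */
static void model_cancel(struct model_creq *r)
{
	r->cancelled = true;
	r->status = -MODEL_EIO;
	model_signal(r);
}
```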
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signaling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Updated list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit() static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
>
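The pre-wait busy poll with the explicit seqno check could be sketched like this (illustrative; the real code bounds the spin by time rather than iterations and issues cpu_relax() between checks):

```c
#include <assert.h>

static unsigned int model_hw_seqno;	/* stands in for the HW write */

/* Spin for at most 'budget' iterations checking the seqno directly,
 * hiding the latency between the HW seqno write and the ISR running.
 * Returns 0 on completion, -1 to fall back to the interrupt wait. */
static int model_spin_request(unsigned int target, int budget)
{
	while (budget--) {
		if ((int)(model_hw_seqno - target) >= 0)
			return 0;
		/* real code: cpu_relax() while the HW makes progress */
	}
	return -1;
}
```

Only when the spin budget is exhausted does the wait path pay for enabling the interrupt and sleeping.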
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 8 +
> drivers/gpu/drm/i915/i915_gem.c | 249 +++++++++++++++++++++++++++++---
> drivers/gpu/drm/i915/i915_irq.c | 2 +
> drivers/gpu/drm/i915/intel_lrc.c | 2 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +
> 6 files changed, 241 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 81324ba..9519b11 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2255,7 +2255,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
> struct drm_i915_gem_request {
> /** Underlying object for implementing the signal/wait stuff. */
> struct fence fence;
> + struct list_head signal_link;
> + struct list_head unsignal_link;
> struct list_head delayed_free_link;
> + bool cancelled;
> + bool irq_enabled;
> + bool signal_requested;
>
> /** On Which ring this request was generated */
> struct drm_i915_private *i915;
> @@ -2341,6 +2346,9 @@ struct drm_i915_gem_request * __must_check
> i915_gem_request_alloc(struct intel_engine_cs *engine,
> struct intel_context *ctx);
> void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> + bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>
> int i915_create_fence_timeline(struct drm_device *dev,
> struct intel_context *ctx,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9071058..9b1de5f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -40,6 +40,8 @@
>
> #define RQ_BUG_ON(expr)
>
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
> static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
> static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
> static void
> @@ -1253,9 +1255,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
> struct drm_device *dev = engine->dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> - const bool irq_test_in_progress =
> - ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
> int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> + uint32_t seqno;
> DEFINE_WAIT(wait);
> unsigned long timeout_expire;
> s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1263,9 +1264,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
> WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>
> - if (list_empty(&req->list))
> - return 0;
> -
> if (i915_gem_request_completed(req))
> return 0;
>
> @@ -1291,15 +1289,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> trace_i915_gem_request_wait_begin(req);
>
> /* Optimistic spin for the next jiffie before touching IRQs */
> - ret = __i915_spin_request(req, state);
> - if (ret == 0)
> - goto out;
> -
> - if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> - ret = -ENODEV;
> - goto out;
> + if (req->seqno) {
> + ret = __i915_spin_request(req, state);
> + if (ret == 0)
> + goto out;
> }
>
> + /*
> + * Enable interrupt completion of the request.
> + */
> + fence_enable_sw_signaling(&req->fence);
> +
> for (;;) {
> struct timer_list timer;
>
> @@ -1321,6 +1321,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> break;
> }
>
> + if (req->seqno) {
> + /*
> + * There is quite a lot of latency in the user interrupt
> + * path. So do an explicit seqno check and potentially
> + * remove all that delay.
> + */
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = engine->get_seqno(engine);
You're using this barrier + get_seqno pattern multiple times, maybe write a helper function for this?
> + if (i915_seqno_passed(seqno, req->seqno)) {
> + ret = 0;
> + break;
> + }
> + }
> +
> if (signal_pending_state(state, current)) {
> ret = -ERESTARTSYS;
> break;
> @@ -1347,14 +1362,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> destroy_timer_on_stack(&timer);
> }
> }
> - if (!irq_test_in_progress)
> - engine->irq_put(engine);
>
> finish_wait(&engine->irq_queue, &wait);
>
> out:
> trace_i915_gem_request_wait_end(req);
>
> + if ((ret == 0) && (req->seqno)) {
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = engine->get_seqno(engine);
> + if (i915_seqno_passed(seqno, req->seqno) &&
> + !i915_gem_request_completed(req)) {
> + /*
> + * Make sure the request is marked as completed before
> + * returning. NB: The spinlock must be held around the
> + * whole call to avoid a race where the interrupt
> + * handler runs concurrently and causes this invocation
> + * to exit early even though the request has not
> + * actually been fully processed yet.
> + */
> + spin_lock_irq(&req->engine->fence_lock);
> + i915_gem_request_notify(req->engine, true);
> + spin_unlock_irq(&req->engine->fence_lock);
> + }
> + }
> +
> if (timeout) {
> s64 tres = *timeout - (ktime_get_raw_ns() - before);
>
> @@ -1432,6 +1465,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> list_del_init(&request->list);
> i915_gem_request_remove_from_client(request);
>
> + /*
> + * In case the request is still in the signal pending list,
> + * e.g. due to being cancelled by TDR, preemption, etc.
> + */
> + if (!list_empty(&request->signal_link)) {
> + /*
> + * The request must be marked as cancelled and the underlying
> + * fence as failed. NB: There is no explicit fence fail API,
> + * there is only a manual poke and signal.
> + */
> + request->cancelled = true;
> + /* How to propagate to any associated sync_fence??? */
> + request->fence.status = -EIO;
> + fence_signal_locked(&request->fence);
> + }
>
Can this still happen with
commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date: Wed Apr 13 17:35:15 2016 +0100
drm/i915: Late request cancellations are harmful
?