[Intel-gfx] [PATCH v8 4/6] drm/i915: Interrupt driven fences
Maarten Lankhorst
maarten.lankhorst at linux.intel.com
Thu May 19 09:24:17 UTC 2016
On 12-05-16 at 23:06, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
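For reference, the struct fence contract being relied on here looks roughly
like this; a simplified, hypothetical example rather than anything from i915:

#include <linux/fence.h>

/* Hypothetical callbacks, only here to illustrate the contract. */
static const char *example_get_driver_name(struct fence *f)
{
	return "example";
}

static const char *example_get_timeline_name(struct fence *f)
{
	return "example-ring";
}

/*
 * Called once by the fence core (with fence->lock held) the first time
 * somebody actually waits, e.g. via fence_enable_sw_signaling(). From
 * that point on the driver is responsible for calling fence_signal() /
 * fence_signal_locked() when the hardware completes.
 */
static bool example_enable_signaling(struct fence *f)
{
	/* Unmask the relevant completion interrupt here. */
	return true;	/* not signalled yet, keep the fence pending */
}

static const struct fence_ops example_fence_ops = {
	.get_driver_name	= example_get_driver_name,
	.get_timeline_name	= example_get_timeline_name,
	.enable_signaling	= example_enable_signaling,
	/* no .signaled hook: completion is pushed from the interrupt handler */
	.wait			= fence_default_wait,
	.release		= fence_free,
};
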
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the arrival of the
> completion interrupt.
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signaling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Update list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit a static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> v8: Updated to newer nightly - changes to request clean up code mean
> none of the deferred free mess is needed any more.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 7 +
> drivers/gpu/drm/i915/i915_gem.c | 235 +++++++++++++++++++++++++++++---
> drivers/gpu/drm/i915/i915_irq.c | 2 +
> drivers/gpu/drm/i915/intel_lrc.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
> drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
> 6 files changed, 225 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 433d99a..3adac59 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2278,6 +2278,10 @@ struct drm_i915_gem_request {
> */
> struct fence fence;
> struct rcu_head rcu_head;
> + struct list_head signal_link;
> + bool cancelled;
> + bool irq_enabled;
> + bool signal_requested;
>
> /** On Which ring this request was generated */
> struct drm_i915_private *i915;
> @@ -2379,6 +2383,9 @@ struct drm_i915_gem_request {
> struct drm_i915_gem_request * __must_check
> i915_gem_request_alloc(struct intel_engine_cs *engine,
> struct intel_context *ctx);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> + bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>
> static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
> {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 958edcb..6669508 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -39,6 +39,8 @@
> #include <linux/pci.h>
> #include <linux/dma-buf.h>
>
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
> static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
> static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
> static void
> @@ -1238,9 +1240,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
> struct drm_device *dev = engine->dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> - const bool irq_test_in_progress =
> - ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
> int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> + uint32_t seqno;
> DEFINE_WAIT(wait);
> unsigned long timeout_expire;
> s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1248,9 +1249,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
> WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>
> - if (list_empty(&req->list))
> - return 0;
> -
> if (i915_gem_request_completed(req))
> return 0;
>
> @@ -1276,15 +1274,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> trace_i915_gem_request_wait_begin(req);
>
> /* Optimistic spin for the next jiffie before touching IRQs */
> - ret = __i915_spin_request(req, state);
> - if (ret == 0)
> - goto out;
> -
> - if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> - ret = -ENODEV;
> - goto out;
> + if (req->seqno) {
> + ret = __i915_spin_request(req, state);
> + if (ret == 0)
> + goto out;
> }
>
> + /*
> + * Enable interrupt completion of the request.
> + */
> + fence_enable_sw_signaling(&req->fence);
> +
> for (;;) {
> struct timer_list timer;
>
> @@ -1307,6 +1307,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> break;
> }
>
> + if (req->seqno) {
> + /*
> + * There is quite a lot of latency in the user interrupt
> + * path. So do an explicit seqno check and potentially
> + * remove all that delay.
> + */
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = engine->get_seqno(engine);
> + if (i915_seqno_passed(seqno, req->seqno)) {
> + ret = 0;
> + break;
> + }
> + }
> +
> if (signal_pending_state(state, current)) {
> ret = -ERESTARTSYS;
> break;
> @@ -1333,14 +1348,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> destroy_timer_on_stack(&timer);
> }
> }
> - if (!irq_test_in_progress)
> - engine->irq_put(engine);
>
> finish_wait(&engine->irq_queue, &wait);
>
> out:
> trace_i915_gem_request_wait_end(req);
>
> + if ((ret == 0) && (req->seqno)) {
> + if (req->engine->irq_seqno_barrier)
> + req->engine->irq_seqno_barrier(req->engine);
> + seqno = engine->get_seqno(engine);
> + if (i915_seqno_passed(seqno, req->seqno) &&
> + !i915_gem_request_completed(req)) {
> + /*
> + * Make sure the request is marked as completed before
> + * returning. NB: Need to acquire the spinlock around
> + * the whole call to avoid a race condition where the
> + * interrupt handler, running concurrently, could
> + * cause this invocation to exit early even though the
> + * request has not actually been fully processed yet.
> + */
> + spin_lock_irq(&req->engine->fence_lock);
> + i915_gem_request_notify(req->engine, true);
> + spin_unlock_irq(&req->engine->fence_lock);
> + }
> + }
> +
> if (timeout) {
> s64 tres = *timeout - (ktime_get_raw_ns() - before);
>
> @@ -1406,6 +1439,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> {
> trace_i915_gem_request_retire(request);
>
> + if (request->irq_enabled) {
> + request->engine->irq_put(request->engine);
> + request->irq_enabled = false;
> + }
> +
> /* We know the GPU must have read the request to have
> * sent us the seqno + interrupt, so use the position
> * of tail of the request to update the last known position
> @@ -1419,6 +1457,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> list_del_init(&request->list);
> i915_gem_request_remove_from_client(request);
>
> + /*
> + * In case the request is still in the signal pending list,
> + * e.g. due to being cancelled by TDR, preemption, etc.
> + */
> + if (!list_empty(&request->signal_link)) {
> + /*
> + * The request must be marked as cancelled and the underlying
> + * fence as failed. NB: There is no explicit fence fail API,
> + * there is only a manual poke and signal.
> + */
> + request->cancelled = true;
> + /* How to propagate to any associated sync_fence??? */
> + request->fence.status = -EIO;
> + fence_signal_locked(&request->fence);
> + }
> +
> if (request->previous_context) {
> if (i915.enable_execlists)
> intel_lr_context_unpin(request->previous_context,
> @@ -2650,6 +2704,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
> */
> request->postfix = intel_ring_get_tail(ringbuf);
>
> + /*
> + * Add the fence to the pending list before emitting the commands to
> + * generate a seqno notification interrupt.
> + */
> + i915_gem_request_submit(request);
> +
> if (i915.enable_execlists)
> ret = engine->emit_request(request);
> else {
> @@ -2735,25 +2795,141 @@ static void i915_gem_request_free(struct fence *req_fence)
> struct drm_i915_gem_request *req;
>
> req = container_of(req_fence, typeof(*req), fence);
> +
> + WARN_ON(req->irq_enabled);
> +
> call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
> }
>
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request is about to be submitted to the hardware so add the fence to
> + * the list of signalable fences.
> + *
> + * NB: This does not necessarily enable interrupts yet. That only occurs on
> + * demand when the request is actually waited on. However, adding it to the
> + * list early ensures that there is no race condition where the interrupt
> + * could pop out prematurely and thus be completely lost. The race is merely
> + * that the interrupt must be manually checked for after being enabled.
> + */
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
> {
> - /* Interrupt driven fences are not implemented yet.*/
> - WARN(true, "This should not be called!");
> - return true;
> + /*
> + * Always enable signal processing for the request's fence object
> + * before that request is submitted to the hardware. Thus there is no
> + * race condition whereby the interrupt could pop out before the
> + * request has been added to the signal list. Hence no need to check
> + * for completion, undo the list add and return false.
> + */
> + i915_gem_request_reference(req);
> + spin_lock_irq(&req->engine->fence_lock);
> + WARN_ON(!list_empty(&req->signal_link));
> + list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
> + spin_unlock_irq(&req->engine->fence_lock);
> +
> + /*
> + * NB: Interrupts are only enabled on demand. Thus there is still a
> + * race where the request could complete before the interrupt has
> + * been enabled. Thus care must be taken at that point.
> + */
> +
> + /* Have interrupts already been requested? */
> + if (req->signal_requested)
> + i915_gem_request_enable_interrupt(req, false);
> +}
> +
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> + bool fence_locked)
> +{
> + struct drm_i915_private *dev_priv = to_i915(req->engine->dev);
> + const bool irq_test_in_progress =
> + ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
> + intel_engine_flag(req->engine);
> +
> + if (req->irq_enabled)
> + return;
> +
> + if (irq_test_in_progress)
> + return;
> +
> + if (!WARN_ON(!req->engine->irq_get(req->engine)))
> + req->irq_enabled = true;
> +
> + /*
> + * Because the interrupt is only enabled on demand, there is a race
> + * where the interrupt can fire before anyone is looking for it. So
> + * do an explicit check for missed interrupts.
> + */
> + i915_gem_request_notify(req->engine, fence_locked);
> }
>
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> {
> struct drm_i915_gem_request *req = container_of(req_fence,
> typeof(*req), fence);
> +
> + /*
> + * No need to actually enable interrupt based processing until the
> + * request has been submitted to the hardware. At which point
> + * 'i915_gem_request_submit()' is called. So only really enable
> + * signalling in there. Just set a flag to say that interrupts are
> + * wanted when the request is eventually submitted. On the other hand
> + * if the request has already been submitted then interrupts do need
> + * to be enabled now.
> + */
> +
> + req->signal_requested = true;
> +
> + if (!list_empty(&req->signal_link))
> + i915_gem_request_enable_interrupt(req, true);
> +
> + return true;
> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> +{
> + struct drm_i915_gem_request *req, *req_next;
> + unsigned long flags;
> u32 seqno;
>
> - seqno = req->engine->get_seqno(req->engine);
> + if (list_empty(&engine->fence_signal_list))
> + return;
> +
If you don't hold the lock then you should probably use list_empty_careful.
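I.e. something like this (untested sketch, keeping the unlocked early-out):

-	if (list_empty(&engine->fence_signal_list))
+	/*
+	 * list_empty_careful() looks at both next and prev, so this
+	 * unlocked early-out is safe against a concurrent
+	 * list_del_init() on another CPU.
+	 */
+	if (list_empty_careful(&engine->fence_signal_list))
 		return;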
~Maarten