[Intel-gfx] [RFC] drm/i915: Move execlists irq handler to a bottom half

Wed Mar 23 09:07:35 UTC 2016

On Tue, Mar 22, 2016 at 05:30:04PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> Doing a lot of work in the interrupt handler introduces huge
> latencies to the system as a whole.
> 
> Most dramatic effect can be seen by running an all engine
> stress test like igt/gem_exec_nop/all where, when the kernel
> config is lean enough, the whole system can be brought into
> multi-second periods of complete non-interactivty. That can
> look for example like this:
> 
>  NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
>  Modules linked in: [redacted for brevity]
>  CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
>  Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
>  Workqueue: i915 gen6_pm_rps_work [i915]
>  task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
>  RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
>  RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
>  RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
>  RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
>  RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
>  R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
>  R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
>  FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Stack:
>   042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
>   0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
>   0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
>  Call Trace:
>   <IRQ>
>   [<ffffffff8104a716>] irq_exit+0x86/0x90
>   [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
>   [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
>   <EOI>
>   [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
>   [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
>   [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
>   [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
>   [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
>   [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
>   [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
>   [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
>   [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
>   [<ffffffff8105ab29>] process_one_work+0x139/0x350
>   [<ffffffff8105b186>] worker_thread+0x126/0x490
>   [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
>   [<ffffffff8105fa64>] kthread+0xc4/0xe0
>   [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
>   [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
>   [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
> 
> I could not explain, or find a code path, which would explain
> a +20 second lockup, but from some instrumentation it was
> apparent the interrupts off proportion of time was between
> 10-25% under heavy load which is quite bad.
> 
> By moving the GT interrupt handling to a tasklet in a most
> simple way, the problem above disappears completely.
> 
> Also, gem_latency -n 100 shows 25% better throughput and CPU
> usage, and 14% better latencies.
> 
> I did not find any gains or regressions with Synmark2 or
> GLbench under light testing. More benchmarking is certainly
> required.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>

I thought tasklets are considered unpopular nowadays? They still steal cpu
time, just have the benefit of not also disabling hard interrupts. There
should be mitigation though to offload these softinterrupts to threads.
Have you tried to create a threaded interrupt thread just for these pins
instead? A bit of boilerplate, but not much using the genirq stuff iirc.

Anyway just an idea to play with/benchmark on top of this one here.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_irq.c         |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c        | 19 +++++++++++++------
>  drivers/gpu/drm/i915/intel_lrc.h        |  1 -
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  4 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 8f3e3309c3ab..e68134347007 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1324,7 +1324,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
>  	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift))
>  		notify_ring(engine);
>  	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift))
> -		intel_lrc_irq_handler(engine);
> +		tasklet_schedule(&engine->irq_tasklet);
>  }
>  
>  static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private *dev_priv,
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 6916991bdceb..283426c02f8b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -538,21 +538,23 @@ get_context_status(struct intel_engine_cs *engine, unsigned int read_pointer,
>  
>  /**
>   * intel_lrc_irq_handler() - handle Context Switch interrupts
> - * @ring: Engine Command Streamer to handle.
> + * @engine: Engine Command Streamer to handle.
>   *
>   * Check the unread Context Status Buffers and manage the submission of new
>   * contexts to the ELSP accordingly.
>   */
> -void intel_lrc_irq_handler(struct intel_engine_cs *engine)
> +void intel_lrc_irq_handler(unsigned long data)
>  {
> +	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
>  	struct drm_i915_private *dev_priv = engine->dev->dev_private;
>  	u32 status_pointer;
>  	unsigned int read_pointer, write_pointer;
>  	u32 csb[GEN8_CSB_ENTRIES][2];
>  	unsigned int csb_read = 0, i;
>  	unsigned int submit_contexts = 0;
> +	unsigned long flags;
>  
> -	spin_lock(&dev_priv->uncore.lock);
> +	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
>  	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
>  
>  	status_pointer = I915_READ_FW(RING_CONTEXT_STATUS_PTR(engine));
> @@ -579,9 +581,9 @@ void intel_lrc_irq_handler(struct intel_engine_cs *engine)
>  				    engine->next_context_status_buffer << 8));
>  
>  	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
> -	spin_unlock(&dev_priv->uncore.lock);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  
> -	spin_lock(&engine->execlist_lock);
> +	spin_lock_irqsave(&engine->execlist_lock, flags);
>  
>  	for (i = 0; i < csb_read; i++) {
>  		if (unlikely(csb[i][0] & GEN8_CTX_STATUS_PREEMPTED)) {
> @@ -604,7 +606,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs *engine)
>  			execlists_context_unqueue(engine);
>  	}
>  
> -	spin_unlock(&engine->execlist_lock);
> +	spin_unlock_irqrestore(&engine->execlist_lock, flags);
>  
>  	if (unlikely(submit_contexts > 2))
>  		DRM_ERROR("More than two context complete events?\n");
> @@ -2020,6 +2022,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>  	if (!intel_engine_initialized(engine))
>  		return;
>  
> +	tasklet_kill(&engine->irq_tasklet);
> +
>  	dev_priv = engine->dev->dev_private;
>  
>  	if (engine->buffer) {
> @@ -2093,6 +2097,9 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *engine)
>  	INIT_LIST_HEAD(&engine->execlist_retired_req_list);
>  	spin_lock_init(&engine->execlist_lock);
>  
> +	tasklet_init(&engine->irq_tasklet, intel_lrc_irq_handler,
> +		     (unsigned long)engine);
> +
>  	logical_ring_init_platform_invariants(engine);
>  
>  	ret = i915_cmd_parser_init_ring(engine);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index a17cb12221ba..0b0853eee91e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -118,7 +118,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
>  			       struct drm_i915_gem_execbuffer2 *args,
>  			       struct list_head *vmas);
>  
> -void intel_lrc_irq_handler(struct intel_engine_cs *engine);
>  void intel_execlists_retire_requests(struct intel_engine_cs *engine);
>  
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 221a94627aab..29810cba8a8c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -266,6 +266,7 @@ struct  intel_engine_cs {
>  	} semaphore;
>  
>  	/* Execlists */
> +	struct tasklet_struct irq_tasklet;
>  	spinlock_t execlist_lock;
>  	struct list_head execlist_queue;
>  	struct list_head execlist_retired_req_list;
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch