[Intel-gfx] [PATCH 6/7] drm/i915/gt: Split the breadcrumb spinlock between global and contexts

Chris Wilson chris at chris-wilson.co.uk
Fri Aug 7 08:35:56 UTC 2020


Quoting Chris Wilson (2020-08-07 09:32:55)
> As we funnel more and more contexts into the breadcrumbs on an engine,
> the hold time of b->irq_lock grows. As we may then contend with the
> b->irq_lock during request submission, this increases the burden upon
> the engine->active.lock and so directly impacts both our execution
> latency and client latency. If we split the b->irq_lock by introducing a
> per-context spinlock to manage the signalers within a context, we then
> only need the b->irq_lock for enabling/disabling the interrupt and can
> avoid taking the lock for walking the list of contexts within the signal
> worker. Even with the current setup, this greatly reduces the number of
> times we have to take and fight for b->irq_lock.
> 
> Furthermore, this closes the race between enabling the signaling context
> while it is in the process of being signaled and removed:
> 
> <4>[  416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870).
> <4>[  416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
> <4>[  416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
> <4>[  416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G     U            5.8.0-CI-CI_DRM_8852+ #1
> <4>[  416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
> <4>[  416.208627] RIP: 0010:__list_add_valid+0x4d/0x70
> <4>[  416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8
> <4>[  416.208633] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086
> <4>[  416.208636] RAX: 0000000000000000 RBX: ffff888250a44880 RCX: 0000000000000105
> <4>[  416.208639] RDX: 0000000000000105 RSI: ffffffff82320c5b RDI: 00000000ffffffff
> <4>[  416.208641] RBP: ffff8882781bb870 R08: 0000000000000000 R09: 0000000000000001
> <4>[  416.208643] R10: 00000000054d2957 R11: 000000006abbd991 R12: ffff8881951d58c8
> <4>[  416.208646] R13: ffff888286073880 R14: ffff888286073848 R15: ffff8881951d5910
> <4>[  416.208669] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
> <4>[  416.208671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[  416.208673] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0
> <4>[  416.208675] PKRU: 55555554
> <4>[  416.208677] Call Trace:
> <4>[  416.208679]  <IRQ>
> <4>[  416.208751]  i915_request_enable_breadcrumb+0x278/0x400 [i915]
> <4>[  416.208839]  __i915_request_submit+0xca/0x2a0 [i915]
> <4>[  416.208892]  __execlists_submission_tasklet+0x480/0x1830 [i915]
> <4>[  416.208942]  execlists_submission_tasklet+0xc4/0x130 [i915]
> <4>[  416.208947]  tasklet_action_common.isra.17+0x6c/0x1c0
> <4>[  416.208954]  __do_softirq+0xdf/0x498
> <4>[  416.208960]  ? handle_fasteoi_irq+0x150/0x150
> <4>[  416.208964]  asm_call_on_stack+0xf/0x20
> <4>[  416.208966]  </IRQ>
> <4>[  416.208969]  do_softirq_own_stack+0xa1/0xc0
> <4>[  416.208972]  irq_exit_rcu+0xb5/0xc0
> <4>[  416.208976]  common_interrupt+0xf7/0x260
> <4>[  416.208980]  asm_common_interrupt+0x1e/0x40
> <4>[  416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410
> <4>[  416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48
> <4>[  416.208989] RSP: 0018:ffffc90000143e70 EFLAGS: 00000206
> <4>[  416.208991] RAX: 0000000000000007 RBX: ffffe8ffffda8070 RCX: 0000000000000000
> <4>[  416.208993] RDX: 0000000000000000 RSI: ffffffff8238b4ee RDI: ffffffff8233184f
> <4>[  416.208995] RBP: ffffffff826b4e00 R08: 0000000000000000 R09: 0000000000000000
> <4>[  416.208997] R10: 0000000000000001 R11: 0000000000000000 R12: 00000060e7f24a8f
> <4>[  416.208998] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000003
> <4>[  416.209012]  cpuidle_enter+0x24/0x40
> <4>[  416.209016]  do_idle+0x22f/0x2d0
> <4>[  416.209022]  cpu_startup_entry+0x14/0x20
> <4>[  416.209025]  start_secondary+0x158/0x1a0
> <4>[  416.209030]  secondary_startup_64+0xa4/0xb0
> <4>[  416.209039] irq event stamp: 10186977
> <4>[  416.209042] hardirqs last  enabled at (10186976): [<ffffffff810b9363>] tasklet_action_common.isra.17+0xe3/0x1c0
> <4>[  416.209044] hardirqs last disabled at (10186977): [<ffffffff81a5e5ed>] _raw_spin_lock_irqsave+0xd/0x50
> <4>[  416.209047] softirqs last  enabled at (10186968): [<ffffffff810b9a1a>] irq_enter_rcu+0x6a/0x70
> <4>[  416.209049] softirqs last disabled at (10186969): [<ffffffff81c00f4f>] asm_call_on_stack+0xf/0x20
> 
> <4>[  416.209317] list_del corruption, ffff8882781bb870->next is LIST_POISON1 (dead000000000100)
> <4>[  416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
> <4>[  416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
> <4>[  416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G     U  W         5.8.0-CI-CI_DRM_8852+ #1
> <4>[  416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
> <4>[  416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90
> <4>[  416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b
> <4>[  416.209317] RSP: 0018:ffffc90000280de8 EFLAGS: 00010086
> <4>[  416.209317] RAX: 0000000000000000 RBX: ffff8882781bb848 RCX: 0000000000010104
> <4>[  416.209317] RDX: 0000000000010104 RSI: ffffffff8238b4ee RDI: 00000000ffffffff
> <4>[  416.209317] RBP: ffff8882781bb880 R08: 0000000000000000 R09: 0000000000000001
> <4>[  416.209317] R10: 000000009fb6666e R11: 00000000feca9427 R12: ffffc90000280e18
> <4>[  416.209317] R13: ffff8881951d5930 R14: dead0000000000d8 R15: ffff8882781bb880
> <4>[  416.209317] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
> <4>[  416.209317] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[  416.209317] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0
> <4>[  416.209317] PKRU: 55555554
> <4>[  416.209317] Call Trace:
> <4>[  416.209317]  <IRQ>
> <4>[  416.209317]  remove_signaling_context.isra.13+0xd/0x70 [i915]
> <4>[  416.209513]  signal_irq_work+0x1f7/0x4b0 [i915]

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276
> Fixes: bda4d4db6dd6 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
-Chris


More information about the Intel-gfx mailing list