[Intel-gfx] [PATCH 1/4] drm/i915: Verify the engine after acquiring the active.lock
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Sep 19 10:08:38 UTC 2019
On 18/09/2019 15:54, Chris Wilson wrote:
> When using virtual engines, rq->engine is not stable until we hold
> the engine->active.lock (as the virtual engine may be exchanged with
> its sibling). Since commit 22b7a426bbe1 ("drm/i915/execlists:
> Preempt-to-busy") we may retire a request concurrently with
> resubmitting it to HW, so we need to be extra careful to verify that
> we are holding the correct lock for the request's active list. This
> is similar to the issue we saw with rescheduling the virtual
> requests; see sched_lock_engine().
>
> Or else:
>
> <4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610).
> <4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
> <4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
> <4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G U 5.3.0-CI-CI_DRM_6898+ #1
> <4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
> <4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70
> <4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8
> <4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082
> <4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104
> <4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff
> <4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001
> <4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10
> <4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8
> <4> [876.736168] FS: 0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
> <4> [876.736169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
> <4> [876.736171] PKRU: 55555554
> <4> [876.736172] Call Trace:
> <4> [876.736226] __i915_request_submit+0x152/0x370 [i915]
> <4> [876.736263] __execlists_submission_tasklet+0x6da/0x1f50 [i915]
> <4> [876.736293] ? execlists_submission_tasklet+0x29/0x50 [i915]
> <4> [876.736321] execlists_submission_tasklet+0x34/0x50 [i915]
> <4> [876.736325] tasklet_action_common.isra.5+0x47/0xb0
> <4> [876.736328] __do_softirq+0xd8/0x4ae
> <4> [876.736332] ? smpboot_thread_fn+0x23/0x280
> <4> [876.736334] ? smpboot_thread_fn+0x6b/0x280
> <4> [876.736336] run_ksoftirqd+0x2b/0x50
> <4> [876.736338] smpboot_thread_fn+0x1d3/0x280
> <4> [876.736341] ? sort_range+0x20/0x20
> <4> [876.736343] kthread+0x119/0x130
> <4> [876.736345] ? kthread_park+0xa0/0xa0
> <4> [876.736347] ret_from_fork+0x24/0x50
> <4> [876.736353] irq event stamp: 2290145
> <4> [876.736356] hardirqs last enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500
> <4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50
> <4> [876.736360] softirqs last enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae
> <4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50
> <4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
> <4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]---
> <4> [876.736406] ------------[ cut here ]------------
> <4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730
> <4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
> <4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
> <4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G U W 5.3.0-CI-CI_DRM_6898+ #1
> <4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
> <4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90
> <4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3
> <4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086
> <4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002
> <4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff
> <4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001
> <4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8
> <4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001
> <4> [876.736447] FS: 00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
> <4> [876.736448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
> <4> [876.736450] PKRU: 55555554
> <4> [876.736451] Call Trace:
> <4> [876.736488] i915_request_retire+0x224/0x8e0 [i915]
> <4> [876.736521] i915_request_create+0x4b/0x1b0 [i915]
> <4> [876.736550] nop_virtual_engine+0x230/0x4d0 [i915]
>
> Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Matthew Auld <matthew.auld at intel.com>
> ---
> drivers/gpu/drm/i915/i915_request.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 3ecf92aa5fc1..606acde47fa0 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -195,6 +195,27 @@ static void free_capture_list(struct i915_request *request)
> }
> }
>
> +static void remove_from_engine(struct i915_request *rq)
> +{
> + struct intel_engine_cs *engine, *locked;
> +
> + /*
> + * Virtual engines complicate acquiring the engine timeline lock,
> + * as their rq->engine pointer is not stable until under that
> + * engine lock. The simple ploy we use is to take the lock then
> + * check that the rq still belongs to the newly locked engine.
> + */
> + locked = READ_ONCE(rq->engine);
> + spin_lock(&locked->active.lock);
> + while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> + spin_unlock(&locked->active.lock);
> + spin_lock(&engine->active.lock);
> + locked = engine;
> + }
> + list_del(&rq->sched.link);
> + spin_unlock(&locked->active.lock);
> +}
> +
> static bool i915_request_retire(struct i915_request *rq)
> {
> struct i915_active_request *active, *next;
> @@ -260,9 +281,7 @@ static bool i915_request_retire(struct i915_request *rq)
> * request that we have removed from the HW and put back on a run
> * queue.
> */
> - spin_lock(&rq->engine->active.lock);
> - list_del(&rq->sched.link);
> - spin_unlock(&rq->engine->active.lock);
> + remove_from_engine(rq);
>
> spin_lock(&rq->lock);
> i915_request_mark_complete(rq);
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko