[Intel-gfx] [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown
Chris Wilson
chris at chris-wilson.co.uk
Fri Jul 13 17:52:09 UTC 2018
Quoting Michal Wajdeczko (2018-07-13 18:48:05)
> On Fri, 13 Jul 2018 19:26:58 +0200, Chris Wilson
> <chris at chris-wilson.co.uk> wrote:
>
> > Hopefully the final hack to get guc fault-injection happy before we can
> > clean it up again, starting from a known good baseline...
> >
> > [ 383.017530] BUG: unable to handle kernel NULL pointer dereference at
> > 00000000000000a0
> > [ 383.017556] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 383.017566] CPU: 7 PID: 4725 Comm: drv_module_relo Tainted: G
> > U 4.18.0-rc4-CI-CI_DRM_4485+ #1
> > [ 383.017581] Hardware name: Micro-Star International Co., Ltd.
> > MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
> > [ 383.017664] RIP: 0010:guc_stage_desc_pool_destroy+0x17/0xe0 [i915]
> > [ 383.017674] Code: 59 a0 c6 05 02 59 18 00 01 e8 5e 01 c3 e0 eb b1 0f
> > 1f 00 53 48 89 fb 48 81 c7 90 02 00 00 e8 60 64 45 e1 48 8b 83 80 02 00
> > 00 <48> 8b 80 a0 00 00 00 48 8b 90 68 02 00 00 48 83 ea 01 48 81 fa ff
> > [ 383.017771] RSP: 0018:ffffc900004bbdd0 EFLAGS: 00010282
> > [ 383.017782] RAX: 0000000000000000 RBX: ffff88012ff41300 RCX:
> > 0000000000000000
> > [ 383.017794] RDX: 0000000000000000 RSI: ffffc900004bbd80 RDI:
> > 0000000000000000
> > [ 383.017805] RBP: ffff88012ff40000 R08: 00000000d876ee11 R09:
> > 0000000000000000
> > [ 383.017817] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffff88012ff47770
> > [ 383.017828] R13: ffff88012ff40068 R14: ffff880264392ef8 R15:
> > ffffffffa0639950
> > [ 383.017840] FS: 00007fb9c18c8980(0000) GS:ffff8802663c0000(0000)
> > knlGS:0000000000000000
> > [ 383.017853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 383.017864] CR2: 00000000000000a0 CR3: 00000001df6cc003 CR4:
> > 00000000003606e0
> > [ 383.017875] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 383.017887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [ 383.017898] Call Trace:
> > [ 383.017962] intel_uc_fini+0x34/0xd0 [i915]
> > [ 383.018020] i915_gem_fini+0x5c/0x100 [i915]
> > [ 383.018093] i915_driver_unload+0xd2/0x110 [i915]
> > [ 383.018150] i915_pci_remove+0x10/0x20 [i915]
> > [ 383.018165] pci_device_remove+0x36/0xb0
> > [ 383.018179] device_release_driver_internal+0x185/0x250
> > [ 383.018193] driver_detach+0x35/0x70
> > [ 383.018205] bus_remove_driver+0x53/0xd0
> > [ 383.018217] pci_unregister_driver+0x25/0xa0
> > [ 383.018232] __se_sys_delete_module+0x162/0x210
> > [ 383.018245] ? do_syscall_64+0xd/0x190
> > [ 383.018257] do_syscall_64+0x55/0x190
> > [ 383.018270] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [ 383.018282] RIP: 0033:0x7fb9c0f7c1b7
> > [ 383.018290] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83
> > c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f
> > 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
> > [ 383.018408] RSP: 002b:00007fffa01c2aa8 EFLAGS: 00000206 ORIG_RAX:
> > 00000000000000b0
> > [ 383.018425] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> > 00007fb9c0f7c1b7
> > [ 383.018440] RDX: 0000000000000000 RSI: 0000000000000800 RDI:
> > 0000560b96856d48
> > [ 383.018454] RBP: 0000560b96856ce0 R08: 0000560b96856d4c R09:
> > 00007fffa01c2ae8
> > [ 383.018468] R10: 00007fffa01c1aa4 R11: 0000000000000206 R12:
> > 0000560b954f7470
> >
> > Testcase: igt/drv_module_reload/basic-reload-inject
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: MichaĆ Winiarski <michal.winiarski at intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > ---
> > drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c
> > b/drivers/gpu/drm/i915/intel_guc_submission.c
> > index 22367131d6a1..cc444dc5f3ad 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> > @@ -1184,7 +1184,8 @@ void intel_guc_submission_fini(struct intel_guc
> > *guc)
> > guc_clients_destroy(guc);
> > WARN_ON(!guc_verify_doorbells(guc));
> > - guc_stage_desc_pool_destroy(guc);
> > + if (guc->stage_desc_pool)
> > + guc_stage_desc_pool_destroy(guc);
>
> As short-term hack this is probably ok, but maybe to avoid such case by
> case hacks we should add single flag at UC level (intel_uc_init) that we
> have completed our initialization and then use this flag at cleanup phase
> (intel_uc_fini) just once.
>
> Michal
>
> ps.
> I recall some earlier reviews saying that using "if" at fini is wrong ;)
It is so wrong, and I'm cutting every corner in order to get the test in
place for someone else to clean up. Judging by the number of bugs that
have swept by with the non-functional drv_module_reload fault-injection,
we need to get it back working.
-Chris
More information about the Intel-gfx
mailing list