[Intel-gfx] [PATCH] drm/i915/guc: Protect against no desc-pool on premature shutdown

Chris Wilson chris at chris-wilson.co.uk
Fri Jul 13 18:24:39 UTC 2018


Quoting Michal Wajdeczko (2018-07-13 18:57:33)
> On Fri, 13 Jul 2018 19:52:09 +0200, Chris Wilson  
> <chris at chris-wilson.co.uk> wrote:
> 
> > Quoting Michal Wajdeczko (2018-07-13 18:48:05)
> >> On Fri, 13 Jul 2018 19:26:58 +0200, Chris Wilson
> >> <chris at chris-wilson.co.uk> wrote:
> >>
> >> > Hopefully the final hack to get guc fault-injection happy before we  
> >> can
> >> > clean it up again, starting from a known good baseline...
> >> >
> >> > [  383.017530] BUG: unable to handle kernel NULL pointer dereference  
> >> at
> >> > 00000000000000a0
> >> > [  383.017556] Oops: 0000 [#1] PREEMPT SMP PTI
> >> > [  383.017566] CPU: 7 PID: 4725 Comm: drv_module_relo Tainted: G
> >> > U            4.18.0-rc4-CI-CI_DRM_4485+ #1
> >> > [  383.017581] Hardware name: Micro-Star International Co., Ltd.
> >> > MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
> >> > [  383.017664] RIP: 0010:guc_stage_desc_pool_destroy+0x17/0xe0 [i915]
> >> > [  383.017674] Code: 59 a0 c6 05 02 59 18 00 01 e8 5e 01 c3 e0 eb b1  
> >> 0f
> >> > 1f 00 53 48 89 fb 48 81 c7 90 02 00 00 e8 60 64 45 e1 48 8b 83 80 02  
> >> 00
> >> > 00 <48> 8b 80 a0 00 00 00 48 8b 90 68 02 00 00 48 83 ea 01 48 81 fa ff
> >> > [  383.017771] RSP: 0018:ffffc900004bbdd0 EFLAGS: 00010282
> >> > [  383.017782] RAX: 0000000000000000 RBX: ffff88012ff41300 RCX:
> >> > 0000000000000000
> >> > [  383.017794] RDX: 0000000000000000 RSI: ffffc900004bbd80 RDI:
> >> > 0000000000000000
> >> > [  383.017805] RBP: ffff88012ff40000 R08: 00000000d876ee11 R09:
> >> > 0000000000000000
> >> > [  383.017817] R10: 0000000000000000 R11: 0000000000000000 R12:
> >> > ffff88012ff47770
> >> > [  383.017828] R13: ffff88012ff40068 R14: ffff880264392ef8 R15:
> >> > ffffffffa0639950
> >> > [  383.017840] FS:  00007fb9c18c8980(0000) GS:ffff8802663c0000(0000)
> >> > knlGS:0000000000000000
> >> > [  383.017853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > [  383.017864] CR2: 00000000000000a0 CR3: 00000001df6cc003 CR4:
> >> > 00000000003606e0
> >> > [  383.017875] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> >> > 0000000000000000
> >> > [  383.017887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> >> > 0000000000000400
> >> > [  383.017898] Call Trace:
> >> > [  383.017962]  intel_uc_fini+0x34/0xd0 [i915]
> >> > [  383.018020]  i915_gem_fini+0x5c/0x100 [i915]
> >> > [  383.018093]  i915_driver_unload+0xd2/0x110 [i915]
> >> > [  383.018150]  i915_pci_remove+0x10/0x20 [i915]
> >> > [  383.018165]  pci_device_remove+0x36/0xb0
> >> > [  383.018179]  device_release_driver_internal+0x185/0x250
> >> > [  383.018193]  driver_detach+0x35/0x70
> >> > [  383.018205]  bus_remove_driver+0x53/0xd0
> >> > [  383.018217]  pci_unregister_driver+0x25/0xa0
> >> > [  383.018232]  __se_sys_delete_module+0x162/0x210
> >> > [  383.018245]  ? do_syscall_64+0xd/0x190
> >> > [  383.018257]  do_syscall_64+0x55/0x190
> >> > [  383.018270]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >> > [  383.018282] RIP: 0033:0x7fb9c0f7c1b7
> >> > [  383.018290] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48  
> >> 83
> >> > c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00  
> >> 0f
> >> > 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
> >> > [  383.018408] RSP: 002b:00007fffa01c2aa8 EFLAGS: 00000206 ORIG_RAX:
> >> > 00000000000000b0
> >> > [  383.018425] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> >> > 00007fb9c0f7c1b7
> >> > [  383.018440] RDX: 0000000000000000 RSI: 0000000000000800 RDI:
> >> > 0000560b96856d48
> >> > [  383.018454] RBP: 0000560b96856ce0 R08: 0000560b96856d4c R09:
> >> > 00007fffa01c2ae8
> >> > [  383.018468] R10: 00007fffa01c1aa4 R11: 0000000000000206 R12:
> >> > 0000560b954f7470
> >> >
> >> > Testcase: igt/drv_module_reload/basic-reload-inject
> >> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >> > Cc: MichaƂ Winiarski <michal.winiarski at intel.com>
> >> > Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> >> > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/intel_guc_submission.c | 3 ++-
> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c
> >> > b/drivers/gpu/drm/i915/intel_guc_submission.c
> >> > index 22367131d6a1..cc444dc5f3ad 100644
> >> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> >> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> >> > @@ -1184,7 +1184,8 @@ void intel_guc_submission_fini(struct intel_guc
> >> > *guc)
> >> >       guc_clients_destroy(guc);
> >> >       WARN_ON(!guc_verify_doorbells(guc));
> >> > -     guc_stage_desc_pool_destroy(guc);
> >> > +     if (guc->stage_desc_pool)
> >> > +             guc_stage_desc_pool_destroy(guc);
> >>
> >> As short-term hack this is probably ok, but maybe to avoid such case by
> >> case hacks we should add single flag at UC level (intel_uc_init) that we
> >> have completed our initialization and then use this flag at cleanup  
> >> phase
> >> (intel_uc_fini) just once.
> >>
> >> Michal
> >>
> >> ps.
> >> I recall some earlier reviews saying that using "if" at fini is wrong ;)
> >
> > It is so wrong, and I'm cutting every corner in order to get the test in
> > place for someone else to clean up. Judging by the number of bugs that
> > have swept by with the non-functional drv_module_reload fault-injection,
> > we need to get it back working.
> 
> ok, I hope 'someone' will find time to finish this cleanup, so

Let's have a game of musical chairs!

Pushed, now I just need to apply an igt patch and then can try again to
see if drv_module_reload is ready.
-Chris


More information about the Intel-gfx mailing list