[Intel-gfx] [PATCH v2] drm/i915: Protect guc_fini_wq() against module load abort

Michal Wajdeczko michal.wajdeczko at intel.com
Wed Jul 25 16:17:10 UTC 2018


On Tue, 24 Jul 2018 16:19:36 +0200, Chris Wilson  
<chris at chris-wilson.co.uk> wrote:

> Prevent
> [  397.873143] general protection fault: 0000 [#1] PREEMPT SMP PTI
> [  397.873154] CPU: 4 PID: 4799 Comm: drv_module_relo Tainted: G      
> U            4.18.0-rc6-CI-CI_DRM_4534+ #1
> [  397.873162] Hardware name: Micro-Star International Co., Ltd.  
> MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
> [  397.873175] RIP: 0010:__lock_acquire+0xf6/0x1b50
> [  397.873179] Code: 85 c0 4c 8b 9d 40 ff ff ff 8b 8d 38 ff ff ff 44 8b  
> 8d 30 ff ff ff 4c 8b 85 28 ff ff ff 44 8b 95 24 ff ff ff 0f 84 54 03 00  
> 00 <f0> ff 80 38 01 00 00 8b 15 45 8c 59 02 45 8b bc 24 70 08 00 00 85
> [  397.873240] RSP: 0018:ffffc90000497b40 EFLAGS: 00010002
> [  397.873246] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000001 RCX:  
> 0000000000000000
> [  397.873252] RDX: 0000000000000046 RSI: 0000000000000000 RDI:  
> 0000000000000000
> [  397.873258] RBP: ffffc90000497c20 R08: ffffffff810a25e9 R09:  
> 0000000000000000
> [  397.873264] R10: 0000000000000000 R11: ffff880255c63c28 R12:  
> ffff8801093b2840
> [  397.873270] R13: 0000000000000001 R14: 0000000000000001 R15:  
> 0000000000000246
> [  397.873277] FS:  00007faf88d71980(0000) GS:ffff880266300000(0000)  
> knlGS:0000000000000000
> [  397.873284] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  397.873289] CR2: 000055d866c9ca10 CR3: 000000025472e006 CR4:  
> 00000000003606e0
> [  397.873295] DR0: 0000000000000000 DR1: 0000000000000000 DR2:  
> 0000000000000000
> [  397.873301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:  
> 0000000000000400
> [  397.873308] Call Trace:
> [  397.873318]  ? lock_acquire+0xa6/0x210
> [  397.873323]  lock_acquire+0xa6/0x210
> [  397.873331]  ? drain_workqueue+0x19/0x180
> [  397.873339]  __mutex_lock+0x89/0x980
> [  397.873346]  ? drain_workqueue+0x19/0x180
> [  397.873352]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
> [  397.873359]  ? trace_hardirqs_on_caller+0xe0/0x1b0
> [  397.873365]  ? drain_workqueue+0x19/0x180
> [  397.873373]  ? debug_object_active_state+0x127/0x150
> [  397.873381]  ? drain_workqueue+0x19/0x180
> [  397.873387]  drain_workqueue+0x19/0x180
> [  397.873395]  destroy_workqueue+0x12/0x1f0
> [  397.873476]  intel_guc_fini_misc+0x36/0x90 [i915]
> [  397.873540]  i915_gem_fini+0x91/0x100 [i915]
> [  397.873588]  i915_driver_unload+0xd2/0x110 [i915]
> [  397.873638]  i915_pci_remove+0x19/0x30 [i915]
> [  397.873646]  pci_device_remove+0x36/0xb0
> [  397.873653]  device_release_driver_internal+0x185/0x250
> [  397.873660]  driver_detach+0x35/0x70
> [  397.873668]  bus_remove_driver+0x53/0xd0
> [  397.873675]  pci_unregister_driver+0x25/0xa0
> [  397.873683]  __se_sys_delete_module+0x162/0x210
> [  397.873691]  ? do_syscall_64+0xd/0x190
> [  397.873697]  do_syscall_64+0x55/0x190
> [  397.873704]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  397.873710] RIP: 0033:0x7faf884231b7
> [  397.873714] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83  
> c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f  
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
> [  397.873775] RSP: 002b:00007ffda4e98cf8 EFLAGS: 00000206 ORIG_RAX:  
> 00000000000000b0
> [  397.873784] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:  
> 00007faf884231b7
> [  397.873790] RDX: 0000000000000000 RSI: 0000000000000800 RDI:  
> 000055fbb18f1bd8
> [  397.873796] RBP: 000055fbb18f1b70 R08: 000055fbb18f1bdc R09:  
> 00007ffda4e98d38
> [  397.873802] R10: 00007ffda4e97cf4 R11: 0000000000000206 R12:  
> 000055fbb0d32470
> [  397.873808] R13: 00007ffda4e992e0 R14: 0000000000000000 R15:  
> 0000000000000000
>
> v2: It's use-after-free; not a NULL pointer.
>
> Testcase: igt/drv_module_reload/basic-reload-inject
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski at intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> ---
>  drivers/gpu/drm/i915/intel_guc.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
> index 846d693ecb53..3082d7670f05 100644
> --- a/drivers/gpu/drm/i915/intel_guc.c
> +++ b/drivers/gpu/drm/i915/intel_guc.c
> @@ -128,13 +128,15 @@ static int guc_init_wq(struct intel_guc *guc)
> static void guc_fini_wq(struct intel_guc *guc)
>  {
> -	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> +	struct workqueue_struct *wq;
> -	if (HAS_LOGICAL_RING_PREEMPTION(dev_priv) &&
> -	    USES_GUC_SUBMISSION(dev_priv))
> -		destroy_workqueue(guc->preempt_wq);
> +	wq = fetch_and_zero(&guc->preempt_wq);
> +	if (wq)
> +		destroy_workqueue(wq);
> -	destroy_workqueue(guc->log.relay.flush_wq);
> +	wq = fetch_and_zero(&guc->log.relay.flush_wq);
> +	if (wq)
> +		destroy_workqueue(wq);
>  }
> int intel_guc_init_misc(struct intel_guc *guc)

Instead of adding (already undesired) "if"s to the fini functions,
I would rather consider fixing i915_gem_init(), as in [1]:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a4031fa..7da9860 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5592,10 +5592,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
         mutex_unlock(&dev_priv->drm.struct_mutex);

  err_uc_misc:
-       intel_uc_fini_misc(dev_priv);
-
-       if (ret != -EIO)
+       if (ret != -EIO) {
+               intel_uc_fini_misc(dev_priv);
                 i915_gem_cleanup_userptr(dev_priv);
+       }

         if (ret == -EIO) {
                 /*

Thanks,
Michal

[1] https://patchwork.freedesktop.org/patch/205722/
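
For readers following the thread: the quoted patch makes guc_fini_wq()
safe to call twice by clearing each pointer before destroying the
workqueue, while the alternative above avoids the second call by
skipping intel_uc_fini_misc() on the -EIO path. Below is a minimal
user-space sketch of the first idea only. Everything in it (fake_wq,
fake_guc, take_wq, the fake_* helpers) is a hypothetical stand-in, not
the i915 or workqueue API, and the "abort then unload" sequence is an
assumption inferred from the call trace and the testcase name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel workqueue; not the real API. */
struct fake_wq {
	int id;
};

/* Counts how many queues have actually been destroyed. */
static int destroy_calls;

static struct fake_wq *fake_alloc_wq(int id)
{
	struct fake_wq *wq = malloc(sizeof(*wq));

	if (wq)
		wq->id = id;
	return wq;
}

static void fake_destroy_wq(struct fake_wq *wq)
{
	/* Callers must never pass a cleared pointer. */
	assert(wq != NULL);
	destroy_calls++;
	free(wq);
}

/* Hypothetical stand-in for the two workqueue pointers in the patch. */
struct fake_guc {
	struct fake_wq *preempt_wq;
	struct fake_wq *flush_wq;
};

/*
 * Function form of the fetch-and-zero idea: hand back the old pointer
 * and clear the slot, so a later call sees NULL instead of a stale
 * pointer to freed memory.
 */
static struct fake_wq *take_wq(struct fake_wq **slot)
{
	struct fake_wq *wq = *slot;

	*slot = NULL;
	return wq;
}

/* Idempotent teardown, mirroring the shape of the quoted patch. */
static void fake_guc_fini_wq(struct fake_guc *guc)
{
	struct fake_wq *wq;

	wq = take_wq(&guc->preempt_wq);
	if (wq)
		fake_destroy_wq(wq);

	wq = take_wq(&guc->flush_wq);
	if (wq)
		fake_destroy_wq(wq);
}

/*
 * Assumed sequence: the init error path tears down once, then module
 * unload tears down again. Returns the number of destroy calls made.
 */
int simulate_abort_then_unload(void)
{
	struct fake_guc guc = { fake_alloc_wq(1), fake_alloc_wq(2) };

	destroy_calls = 0;
	fake_guc_fini_wq(&guc);	/* init error path */
	fake_guc_fini_wq(&guc);	/* module unload: finds NULL, does nothing */
	return destroy_calls;
}
```

With both allocations succeeding, the second fake_guc_fini_wq() call
finds both slots cleared and destroys nothing, so each queue is freed
exactly once. The cost is the extra "if" per queue that the reply
objects to; skipping the fini call in the init error path avoids those
checks but relies on unload always running afterwards.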

