[Intel-gfx] ✗ Fi.CI.BAT: failure for Stop users from using the device on driver unbind

Janusz Krzysztofik janusz.krzysztofik at linux.intel.com
Mon Apr 8 08:19:11 UTC 2019


On Friday, April 5, 2019 7:37:04 PM CEST Chris Wilson wrote:
> Quoting Chris Wilson (2019-04-05 17:26:46)
> 
> > Quoting Patchwork (2019-04-05 17:20:39)
> > 
> > > == Series Details ==
> > > 
> > > Series: Stop users from using the device on driver unbind
> > > URL   : https://patchwork.freedesktop.org/series/59064/
> > > State : failure
> > > 
> > > == Summary ==
> > > 
> > > CI Bug Log - changes from CI_DRM_5881 -> Patchwork_12699
> > > ====================================================
> > > 
> > > Summary
> > > -------
> > > 
> > >   **FAILURE**
> > >   
> > >   Serious unknown changes coming with Patchwork_12699 absolutely need to
> > >   be verified manually.
> > >   
> > >   If you think the reported changes have nothing to do with the changes
> > >   introduced in Patchwork_12699, please notify your bug team to allow
> > >   them to document this new failure mode, which will reduce false
> > >   positives in CI.
> > >   
> > >   External URL:
> > >   https://patchwork.freedesktop.org/api/1.0/series/59064/revisions/1/mbox/
> > > 
> > > Possible new issues
> > > -------------------
> > > 
> > >   Here are the unknown changes that may have been introduced in
> > >   Patchwork_12699:
> > > ### IGT changes ###
> > > 
> > > #### Possible regressions ####
> > > 
> > >   * igt@i915_module_load@reload:
> > 2 issues, it appears:
> > 
> > <4> [271.799080] WARN_ON(dev_priv->mm.object_count)
> > <4> [271.799241] WARNING: CPU: 0 PID: 3288 at drivers/gpu/drm/i915/i915_gem.c:5145 i915_gem_cleanup_early+0x104/0x110 [i915]
> > <4> [271.799249] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915(-) mei_hdcp x86_pkg_temp_thermal btusb coretemp btrtl btbcm btintel bluetooth crct10dif_pclmul crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e ecdh_generic snd_pcm mei_me ptp prime_numbers pps_core mei [last unloaded: snd_hda_intel]
> > <4> [271.799302] CPU: 0 PID: 3288 Comm: i915_module_loa Tainted: G     U            5.1.0-rc3-CI-Patchwork_12699+ #1
> > <4> [271.799307] Hardware name:  /NUC6i7KYB, BIOS KYSKLi70.86A.0059.2018.1122.1431 11/22/2018
> > <4> [271.799406] RIP: 0010:i915_gem_cleanup_early+0x104/0x110 [i915]
> > <4> [271.799412] Code: 00 00 48 c7 c2 d0 6b 3d a0 48 c7 c7 ca 5c 2c a0 e8 c1 b5 ec e0 0f 0b 48 c7 c6 68 c0 3f a0 48 c7 c7 63 88 42 a0 e8 9c 77 de e0 <0f> 0b e9 40 ff ff ff 0f 1f 44 00 00 e8 5b 7e 00 00 31 c0 c3 0f 1f
> > <4> [271.799417] RSP: 0018:ffffc90000453dd0 EFLAGS: 00010282
> > <4> [271.799423] RAX: 0000000000000000 RBX: ffff88849afd0000 RCX: 0000000000000000
> > <4> [271.799428] RDX: 0000000000000006 RSI: ffff88849ee130b8 RDI: ffffffff8211dc4d
> > <4> [271.799432] RBP: ffff88849afd7630 R08: 00000000028bc995 R09: 0000000000000000
> > <4> [271.799436] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa04a81e0
> > <4> [271.799440] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa04a82d0
> > <4> [271.799446] FS:  00007f31e8cec980(0000) GS:ffff8884aee00000(0000) knlGS:0000000000000000
> > <4> [271.799451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4> [271.799455] CR2: 00007ffea58773d8 CR3: 000000044cfc6003 CR4: 00000000003606f0
> > <4> [271.799459] Call Trace:
> > <4> [271.799531]  i915_driver_cleanup_early+0x30/0x70 [i915]
> > <4> [271.799603]  i915_driver_release+0xa/0x30 [i915]
> > <4> [271.799672]  i915_driver_unload+0x6a/0x120 [i915]
> > <4> [271.799748]  i915_pci_remove+0x19/0x30 [i915]
> > <4> [271.799765]  pci_device_remove+0x36/0xb0
> 
> So this is the bizarre part. We end up in the final i915_driver_release
> because it appears that drm_dev_unplug() drops a reference. I couldn't
> see where...
> 
> [   24.960676] WARNING: CPU: 2 PID: 637 at drivers/gpu/drm/drm_drv.c:895 drm_dev_put+0x8/0x60
> [   24.960735] Modules linked in: nls_ascii nls_cp437 vfat fat crct10dif_pclmul crc32_pclmul crc32c_intel i915(-) aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_uncore intel_rapl_perf efivars i2c_i801 intel_gtt drm_kms_helper ahci libahci video button efivarfs
> [   24.960848] CPU: 2 PID: 637 Comm: i915_module_loa Tainted: G    BU            5.1.0-rc3+ #526
> [   24.960897] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
> [   24.960952] RIP: 0010:drm_dev_put+0x8/0x60
> [   24.960993] Code: 48 8d 7b 60 e8 d9 8b c7 ff 48 8b 7b 60 5b 5d e9 0e 4f c7 ff 48 89 df e8 06 c2 ff ff e9 3f ff ff ff 90 48 85 ff 75 01 c3 55 53 <0f> 0b f0 ff 4f 14 0f 88 64 b7 2d 00 74 03 5b 5d c3 48 89 fb 48 8d
> [   24.961066] RSP: 0018:ffff88872587fc80 EFLAGS: 00010286
> [   24.961107] RAX: 0000000000000000 RBX: ffff88873f020000 RCX: ffffffff81680444
> [   24.961151] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88873f020000
> [   24.961195] RBP: ffff88873f02ad88 R08: 0000000000000000 R09: fffffbfff04824c5
> [   24.961240] R10: fffffbfff04824c5 R11: ffffffff8241262b R12: ffff88881ab067a0
> [   24.961284] R13: ffffffffa0618c00 R14: ffff88881ab06660 R15: ffff88881ab06960
> [   24.961330] FS:  00007fdba43279c0(0000) GS:ffff88881f500000(0000) knlGS:0000000000000000
> [   24.961377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   24.961618] CR2: 00007fffc0229f80 CR3: 0000000726506001 CR4: 00000000001606e0
> [   24.961662] Call Trace:
> [   24.961773]  i915_driver_unload+0x72/0x130 [i915]
> [   24.961888]  i915_pci_remove+0x2a/0x50 [i915]
> [   24.961929]  pci_device_remove+0xaa/0x180
> [   24.961968]  ? pcibios_free_irq+0x10/0x10
> [   24.962005]  ? up_read+0xc2/0xe0
> [   24.962041]  device_release_driver_internal+0x12b/0x260
> [   24.962081]  driver_detach+0x6f/0xca
> [   24.962117]  bus_remove_driver+0xc4/0x141
> [   24.962157]  pci_unregister_driver+0x32/0xf0
> [   24.962274]  i915_exit+0x16/0x1c [i915]
> [   24.962312]  __x64_sys_delete_module+0x20e/0x2b0
> [   24.962351]  ? __ia32_sys_delete_module+0x2b0/0x2b0
> [   24.962390]  ? lockdep_hardirqs_on+0x11/0x250
> [   24.962428]  ? lockdep_hardirqs_off+0x1a/0x100
> [   24.962465]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [   24.962504]  do_syscall_64+0x5d/0x200
> [   24.962542]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   24.962581] RIP: 0033:0x7fdba6189137
> 
> which gdb insists is the drm_dev_unplug() call. Oh drm-tip, not dinq.
> Noralf has been playing.
> 
> We have
> 
> commit ba3bf37e150a99b51b13f5cebf89715685d21212
> Author: Noralf Trønnes <noralf at tronnes.org>
> Date:   Fri Feb 8 15:01:03 2019 +0100
> 
>     drm/drv: drm_dev_unplug(): Move out drm_dev_put() call
> 
>     This makes it possible to use drm_dev_unplug() with the upcoming
>     devm_drm_dev_init() which will do drm_dev_put() in its release callback.
> 
> but drm-tip has a mishmash of trees and a conflict that brings the
> drm_dev_put() here right back in.

Yeah, I can see the "drm: Fix drm_release() and device unplug" patch,
which is patch 1/2 of the series that introduced commit ba3bf37e150a
("drm/drv: drm_dev_unplug(): Move out drm_dev_put() call"), is applied
twice in drm-tip, one instance coming from drm-fixes as
commit 3f04e0a6cfeb, the other from drm-next as 1ee57d4d75fb.

(there were two).  Playing with it, I learned by experiment that it was
certainly too early for us to drop the reference at that point.  I took
advantage of drm_dev_put() finally having been moved out of
drm_dev_unplug(), so that we could use drm_dev_unplug() in place of
drm_dev_unregister().
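
To illustrate why a drm_dev_put() creeping back into drm_dev_unplug()
is harmful, here is a rough userspace toy, not kernel code.  Every
toy_* name below is hypothetical and exists nowhere in DRM; the model
only assumes that the driver holds a single reference, frees its
objects after unplug, and drops its own reference last.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a refcounted device; NOT struct drm_device. */
struct toy_dev {
	int refcount;               /* references currently held */
	int objects;                /* live objects, cf. mm.object_count */
	bool released;              /* final release callback has run */
	bool released_with_objects; /* release ran before objects were freed */
	bool underflow;             /* a put happened with no reference left */
};

/* Final release: only safe once all objects are gone. */
static void toy_release(struct toy_dev *dev)
{
	dev->released = true;
	if (dev->objects > 0)
		dev->released_with_objects = true;
}

/* Drop one reference; run the release callback on the last one. */
static void toy_put(struct toy_dev *dev)
{
	assert(dev);
	dev->refcount--;
	if (dev->refcount == 0)
		toy_release(dev);
	else if (dev->refcount < 0)
		dev->underflow = true;
}

/*
 * put_inside == true models the mis-merged tree, where the put is
 * back inside unplug even though the caller also does its own put.
 */
static void toy_unplug(struct toy_dev *dev, bool put_inside)
{
	if (put_inside)
		toy_put(dev);
}

/* Assumed shape of a driver remove path: unplug, teardown, put. */
struct toy_dev toy_remove(bool put_inside)
{
	struct toy_dev dev = { .refcount = 1, .objects = 1 };

	toy_unplug(&dev, put_inside);
	dev.objects = 0;	/* driver teardown frees its objects */
	toy_put(&dev);		/* driver drops its own reference last */
	return dev;
}
```

With toy_remove(false) the release runs once, after teardown.  With
toy_remove(true) the release already runs inside the unplug step while
an object is still alive, and the driver's own put then finds no
reference left to drop.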

Janusz

> -Chris




