[PATCH 0/6] Fix crash after reloading a driver using ttm
Karol Herbst
kherbst at redhat.com
Tue Apr 16 10:54:27 UTC 2019
On Tue, Apr 16, 2019 at 11:12 AM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
>
> Am 16.04.19 um 11:10 schrieb Karol Herbst:
> > On Tue, Apr 16, 2019 at 8:38 AM Christian König
> > <ckoenig.leichtzumerken at gmail.com> wrote:
> >> Am 16.04.19 um 02:35 schrieb Karol Herbst:
> >>> Kobjects are supposed to be dynamically allocated, but with recent changes
> >>> this rule was violated. Reverting those commits fixes crashes when a drm
> >>> driver using TTM gets loaded again.
> >>>
> >>> The object in question is "ttm_mem_glob" declared inside
> >>> "include/drm/ttm/ttm_memory.h" and instatiated inside
> >>> "drivers/gpu/drm/ttm/ttm_memory.c".
> >>>
> >>> from "Documentation/kobject.txt":
> >>> "Because kobjects are dynamic, they must not be declared statically or on
> >>> the stack, but instead, always allocated dynamically. Future versions of
> >>> the kernel will contain a run-time check for kobjects that are created
> >>> statically and will warn the developer of this improper usage."
> >>>
> >>> Unloading ttm before reloading the driver workarounds that crash, because
> >>> the memory backing the kobject member "kobj" is cleaned up. The kobject_del
> >>> and kobject_put function never free or clean up the kobject object leaving
> >>> it in an undefined state.
> >>>
> >>> I reverted a few more commits to make it less painful for me to rever this
> >>> rather big change.
> >> Well, NAK. By reverting those change you also re-introduced the problems
> >> we originally fixed with those patches.
> >>
> >> Please work on a proper fix instead,
> >> Christian.
> > And which problem was that besides duplicated code (or maybe even a
> > bit of memory consumption if multiple ttm driver were used)? If I had
> > to choose between duplicated code and a crash, I choose the former.
> >
> > Maybe I missed the real reason why those changes are made, but the
> > commit messages don't really seem to tell me.
>
> The old implementation crashed because different drivers tried to
> allocate the same kobj.
>
> Crashing in one way is not better than crashing in another way.
>
> Christian.
>
how can that old crash be triggered? By loading two ttm based drivers
at the same time or by other means?
> >
> >>> dmesg output:
> >>> [54758.418036] kobject (00000000687a067d): tried to init an initialized object, something is seriously wrong.
> >>> [54758.418040] CPU: 6 PID: 26746 Comm: insmod Tainted: G U OE 5.0.6-200.fc29.x86_64 #1
> >>> [54758.418041] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.12.1 10/02/2018
> >>> [54758.418041] Call Trace:
> >>> [54758.418049] dump_stack+0x5c/0x80
> >>> [54758.418054] kobject_init.cold.9+0x31/0x3f
> >>> [54758.418057] kobject_init_and_add+0x35/0xa0
> >>> [54758.418063] ttm_mem_global_init+0x8f/0x2b0 [ttm]
> >>> [54758.418067] ? __debugfs_create_file+0xe1/0x110
> >>> [54758.418071] ttm_bo_device_init+0x198/0x2a0 [ttm]
> >>> [54758.418144] nouveau_ttm_init+0xbf/0x340 [nouveau]
> >>> [54758.418206] nouveau_drm_device_init+0x125/0x7d0 [nouveau]
> >>> [54758.418210] ? pci_bus_read_config_word+0x49/0x70
> >>> [54758.418266] nouveau_drm_probe+0x26f/0x2c0 [nouveau]
> >>> [54758.418270] local_pci_probe+0x41/0x90
> >>> [54758.418272] pci_device_probe+0x118/0x1a0
> >>> [54758.418275] really_probe+0xf8/0x3b0
> >>> [54758.418277] driver_probe_device+0xb3/0xf0
> >>> [54758.418278] __driver_attach+0xdd/0x110
> >>> [54758.418280] ? driver_probe_device+0xf0/0xf0
> >>> [54758.418282] bus_for_each_dev+0x77/0xc0
> >>> [54758.418285] ? klist_add_tail+0x3b/0x60
> >>> [54758.418287] bus_add_driver+0x152/0x230
> >>> [54758.418288] ? 0xffffffffc1027000
> >>> [54758.418290] driver_register+0x6b/0xb0
> >>> [54758.418291] ? 0xffffffffc1027000
> >>> [54758.418294] do_one_initcall+0x46/0x1c3
> >>> [54758.418296] ? _cond_resched+0x15/0x30
> >>> [54758.418299] ? kmem_cache_alloc_trace+0x154/0x1d0
> >>> [54758.418302] do_init_module+0x5a/0x210
> >>> [54758.418304] load_module+0x2096/0x22d0
> >>> [54758.418308] ? ima_post_read_file+0xf4/0x100
> >>> [54758.418310] ? __do_sys_finit_module+0xa8/0x110
> >>> [54758.418312] __do_sys_finit_module+0xa8/0x110
> >>> [54758.418315] do_syscall_64+0x5b/0x160
> >>> [54758.418317] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [54758.418319] RIP: 0033:0x7fc0b38b6edd
> >>> [54758.418321] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7b 7f 0c 00 f7 d8 64 89 01 48
> >>> [54758.418322] RSP: 002b:00007ffc7f1620d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> >>> [54758.418323] RAX: ffffffffffffffda RBX: 00005629a3996ac0 RCX: 00007fc0b38b6edd
> >>> [54758.418324] RDX: 0000000000000000 RSI: 00005629a3996260 RDI: 0000000000000003
> >>> [54758.418325] RBP: 00005629a3996260 R08: 0000000000000000 R09: 0000000000000000
> >>> [54758.418326] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
> >>> [54758.418326] R13: 00005629a3996a80 R14: 0000000000000000 R15: 00005629a3996260
> >>> [54758.418346] BUG: unable to handle kernel paging request at 00000004ee194700
> >>> [54758.418348] #PF error: [WRITE]
> >>> [54758.418349] PGD 0 P4D 0
> >>> [54758.418352] Oops: 0002 [#1] SMP PTI
> >>> [54758.418354] CPU: 6 PID: 26746 Comm: insmod Tainted: G U OE 5.0.6-200.fc29.x86_64 #1
> >>> [54758.418355] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.12.1 10/02/2018
> >>> [54758.418360] RIP: 0010:ttm_mem_global_init+0x1fe/0x2b0 [ttm]
> >>> [54758.418361] Code: 00 00 00 48 89 5d 40 48 89 ab a0 00 00 00 e8 79 ba 95 f8 85 c0 0f 85 a2 00 00 00 8b 83 90 00 00 00 8d 50 01 89 93 90 00 00 00 <48> 89 ac c3 80 00 00 00 85 d2 0f 85 c6 99 00 00 48 8b 83 98 00 00
> >>> [54758.418363] RSP: 0018:ffffc24e53c1f988 EFLAGS: 00010246
> >>> [54758.418364] RAX: 00000000a5a2f300 RBX: ffffffffc101ae80 RCX: 0000000000000000
> >>> [54758.418366] RDX: 00000000a5a2f301 RSI: 0000000000000000 RDI: ffff9ebe78b67330
> >>> [54758.418367] RBP: ffff9ebe42c79d00 R08: 0000000000000044 R09: 0000000000000228
> >>> [54758.418368] R10: 0000000000000000 R11: ffff9ebb62720bb8 R12: 0000000000000000
> >>> [54758.418370] R13: ffff9ebe27d71640 R14: ffff9eba9f258600 R15: ffffffffc1453880
> >>> [54758.418371] FS: 00007fc0b379c280(0000) GS:ffff9ebeae380000(0000) knlGS:0000000000000000
> >>> [54758.418373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [54758.418374] CR2: 00000004ee194700 CR3: 000000045069e005 CR4: 00000000003606e0
> >>> [54758.418375] Call Trace:
> >>> [54758.418381] ttm_bo_device_init+0x198/0x2a0 [ttm]
> >>> [54758.418439] nouveau_ttm_init+0xbf/0x340 [nouveau]
> >>> [54758.418495] nouveau_drm_device_init+0x125/0x7d0 [nouveau]
> >>> [54758.418499] ? pci_bus_read_config_word+0x49/0x70
> >>> [54758.418552] nouveau_drm_probe+0x26f/0x2c0 [nouveau]
> >>> [54758.418556] local_pci_probe+0x41/0x90
> >>> [54758.418558] pci_device_probe+0x118/0x1a0
> >>> [54758.418561] really_probe+0xf8/0x3b0
> >>> [54758.418563] driver_probe_device+0xb3/0xf0
> >>> [54758.418565] __driver_attach+0xdd/0x110
> >>> [54758.418567] ? driver_probe_device+0xf0/0xf0
> >>> [54758.418570] bus_for_each_dev+0x77/0xc0
> >>> [54758.418573] ? klist_add_tail+0x3b/0x60
> >>> [54758.418574] bus_add_driver+0x152/0x230
> >>> [54758.418576] ? 0xffffffffc1027000
> >>> [54758.418578] driver_register+0x6b/0xb0
> >>> [54758.418580] ? 0xffffffffc1027000
> >>> [54758.418583] do_one_initcall+0x46/0x1c3
> >>> [54758.418585] ? _cond_resched+0x15/0x30
> >>> [54758.418587] ? kmem_cache_alloc_trace+0x154/0x1d0
> >>> [54758.418591] do_init_module+0x5a/0x210
> >>> [54758.418593] load_module+0x2096/0x22d0
> >>> [54758.418596] ? ima_post_read_file+0xf4/0x100
> >>> [54758.418599] ? __do_sys_finit_module+0xa8/0x110
> >>> [54758.418601] __do_sys_finit_module+0xa8/0x110
> >>> [54758.418604] do_syscall_64+0x5b/0x160
> >>> [54758.418607] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [54758.418609] RIP: 0033:0x7fc0b38b6edd
> >>> [54758.418610] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7b 7f 0c 00 f7 d8 64 89 01 48
> >>> [54758.418612] RSP: 002b:00007ffc7f1620d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> >>> [54758.418614] RAX: ffffffffffffffda RBX: 00005629a3996ac0 RCX: 00007fc0b38b6edd
> >>> [54758.418615] RDX: 0000000000000000 RSI: 00005629a3996260 RDI: 0000000000000003
> >>> [54758.418616] RBP: 00005629a3996260 R08: 0000000000000000 R09: 0000000000000000
> >>> [54758.418618] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
> >>> [54758.418619] R13: 00005629a3996a80 R14: 0000000000000000 R15: 00005629a3996260
> >>> [54758.418620] Modules linked in: nouveau(OE+) acpi_call(OE) ttm rfcomm ccm xt_CHECKSUM ipt_MASQUERADE tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc vfat fat btusb btrtl btbcm btintel uvcvideo videobuf2_vmalloc bluetooth videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media ecdh_generic arc4 snd_hda_codec_hdmi ath10k_pci ath10k_core snd_hda_codec_realtek iTCO_wdt mei_wdt iTCO_vendor_support snd_hda_codec_generic mac80211 intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec dell_laptop ledtrig_audio dell_smm_hwmon snd_hda_core kvm_intel snd_hwdep ath snd_seq snd_seq_device intel_cstate cfg80211 snd_pcm dell_wmi intel_uncore
> >>> [54758.418653] dell_smbios dcdbas intel_rapl_perf snd_timer joydev dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof snd idma64 soundcore i2c_i801 rfkill rtsx_pci_ms memstick mei_me mei processor_thermal_device intel_lpss_pci intel_soc_dts_iosf intel_pch_thermal intel_lpss int3400_thermal acpi_pad acpi_thermal_rel int3403_thermal intel_hid pcc_cpufreq int340x_thermal_zone sparse_keymap binfmt_misc zram dm_crypt hid_multitouch i915 kvmgt mdev vfio kvm irqbypass i2c_algo_bit crct10dif_pclmul drm_kms_helper rtsx_pci_sdmmc crc32_pclmul mmc_core crc32c_intel mxm_wmi drm nvme ghash_clmulni_intel nvme_core serio_raw rtsx_pci i2c_hid video wmi i2c_dev lz4 lz4_compress [last unloaded: nouveau]
> >>> [54758.418679] CR2: 00000004ee194700
> >>> [54758.418681] ---[ end trace c5175234e6efc034 ]---
> >>> [54758.418686] RIP: 0010:ttm_mem_global_init+0x1fe/0x2b0 [ttm]
> >>> [54758.418688] Code: 00 00 00 48 89 5d 40 48 89 ab a0 00 00 00 e8 79 ba 95 f8 85 c0 0f 85 a2 00 00 00 8b 83 90 00 00 00 8d 50 01 89 93 90 00 00 00 <48> 89 ac c3 80 00 00 00 85 d2 0f 85 c6 99 00 00 48 8b 83 98 00 00
> >>> [54758.418689] RSP: 0018:ffffc24e53c1f988 EFLAGS: 00010246
> >>> [54758.418690] RAX: 00000000a5a2f300 RBX: ffffffffc101ae80 RCX: 0000000000000000
> >>> [54758.418691] RDX: 00000000a5a2f301 RSI: 0000000000000000 RDI: ffff9ebe78b67330
> >>> [54758.418692] RBP: ffff9ebe42c79d00 R08: 0000000000000044 R09: 0000000000000228
> >>> [54758.418694] R10: 0000000000000000 R11: ffff9ebb62720bb8 R12: 0000000000000000
> >>> [54758.418695] R13: ffff9ebe27d71640 R14: ffff9eba9f258600 R15: ffffffffc1453880
> >>> [54758.418696] FS: 00007fc0b379c280(0000) GS:ffff9ebeae380000(0000) knlGS:0000000000000000
> >>> [54758.418698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [54758.418699] CR2: 00000004ee194700 CR3: 000000045069e005 CR4: 00000000003606e0
> >>>
> >>> Karol Herbst (6):
> >>> Revert "drm: Remove drm_global.{c,h} v2"
> >>> Revert "drm/ttm: initialize globals during device init (v2)"
> >>> Revert "drm/ttm: Fix bo_global and mem_global kfree error"
> >>> Revert "drm/ttm: use a static ttm_bo_global instance"
> >>> Revert "drm/ttm: make the device list mutex static"
> >>> Revert "drm/ttm: use a static ttm_mem_global instance"
> >>>
> >>> drivers/gpu/drm/Makefile | 2 +-
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 103 ++++++++++++-
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +
> >>> drivers/gpu/drm/ast/ast_drv.h | 2 +
> >>> drivers/gpu/drm/ast/ast_ttm.c | 64 ++++++++
> >>> drivers/gpu/drm/bochs/bochs.h | 2 +
> >>> drivers/gpu/drm/bochs/bochs_mm.c | 61 ++++++++
> >>> drivers/gpu/drm/cirrus/cirrus_drv.h | 2 +
> >>> drivers/gpu/drm/cirrus/cirrus_ttm.c | 64 ++++++++
> >>> drivers/gpu/drm/drm_drv.c | 2 +
> >>> drivers/gpu/drm/drm_global.c | 137 ++++++++++++++++++
> >>> .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h | 2 +
> >>> drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c | 57 ++++++++
> >>> drivers/gpu/drm/mgag200/mgag200_drv.h | 2 +
> >>> drivers/gpu/drm/mgag200/mgag200_ttm.c | 64 ++++++++
> >>> drivers/gpu/drm/nouveau/nouveau_drv.h | 2 +
> >>> drivers/gpu/drm/nouveau/nouveau_ttm.c | 67 +++++++++
> >>> drivers/gpu/drm/qxl/qxl_drv.h | 3 +
> >>> drivers/gpu/drm/qxl/qxl_ttm.c | 61 ++++++++
> >>> drivers/gpu/drm/radeon/radeon.h | 3 +
> >>> drivers/gpu/drm/radeon/radeon_ttm.c | 65 +++++++++
> >>> drivers/gpu/drm/ttm/ttm_bo.c | 67 +++------
> >>> drivers/gpu/drm/ttm/ttm_memory.c | 14 +-
> >>> drivers/gpu/drm/virtio/virtgpu_drv.h | 3 +
> >>> drivers/gpu/drm/virtio/virtgpu_ttm.c | 62 ++++++++
> >>> drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 15 +-
> >>> drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 6 +-
> >>> drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 54 +++++++
> >>> drivers/staging/vboxvideo/vbox_drv.h | 2 +
> >>> drivers/staging/vboxvideo/vbox_ttm.c | 65 ++++++++-
> >>> include/drm/drmP.h | 1 +
> >>> include/drm/drm_global.h | 53 +++++++
> >>> include/drm/ttm/ttm_bo_driver.h | 54 ++++++-
> >>> include/drm/ttm/ttm_memory.h | 4 +-
> >>> 34 files changed, 1102 insertions(+), 65 deletions(-)
> >>> create mode 100644 drivers/gpu/drm/drm_global.c
> >>> create mode 100644 include/drm/drm_global.h
> >>>
> >>
>
>
More information about the dri-devel
mailing list